A library to compile regex verification in circom. It is explained in our blog post. You can use regex to specify how to parse an email in a ZK Email proof when defining a new pattern on the ZK Email SDK registry. Noir coming soon.
This library provides a compiler for regex that lets you specify public and private parts, then generates circom circuits that enable you to prove that an input string matches the regex while revealing only the public parts.
This is a Rust adaptation of the Python regex-to-circom work done by sampriti and yush_g, along with sorasue's and Shreyas + Bisht13's rewrite in Rust to support more characters. You can play with the old V1 compiler and DFA visualizations via our no-code tool zkregex.com. Note that zkregex.com does NOT support all syntax from the V2, only the highly restricted set of syntax from V1.
In addition to the original work, this library also supports the following features:
You can define a regex to be proved and its substring patterns to be revealed. Specifically, there are two ways to define them:
To understand the theory behind the regex circuit compiler, please check out our main explanation post, or this older blog post. To understand how it ties into the original zk email work, you can also read the brief regex overview in the original zk-email blog post.
The regular expressions supported by our compiler version 2.1.1 are audited by zksecurity, and have the following limitations:
Note that all international characters are supported.
If you want to use this circuit in practice, we strongly recommend using AssertZero on the bytes before and after your match. This is because you likely shift the output via an unconstrained index passed in as a witness to represent the start of the regex match. Since that value can be arbitrarily manipulated, you need to manually constrain that there are no extra matches that can be used to exploit the circuit. You can see how we do this in zk-email here.
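To make the reason for this check concrete, here is a minimal Python model. It is not the circom AssertZero template and not how zk-email implements the constraint; check_zero_outside is a hypothetical helper, and the byte arrays are made-up data. It only illustrates why every byte outside the claimed match window should be forced to zero when the start index is chosen by the prover.

```python
def check_zero_outside(reveal, start, length):
    """Return True when every entry outside [start, start + length) is zero.

    `reveal` models a masked byte array; `start` and `length` model the
    prover-supplied (unconstrained) window. Hypothetical helper for illustration.
    """
    return all(
        b == 0
        for i, b in enumerate(reveal)
        if not (start <= i < start + length)
    )


# One match only ("ab" at index 2): the window covers all non-zero bytes.
single_match = [0, 0, 97, 98, 0, 0, 0, 0]
honest = check_zero_outside(single_match, 2, 2)

# Two matches ("ab" and "c"): pointing the window at the second match
# leaves non-zero bytes outside it, so the check fails.
two_matches = [0, 0, 97, 98, 0, 0, 99, 0]
exploit = check_zero_outside(two_matches, 6, 1)
```

In this model the honest window passes, while a window aimed at a second match is rejected because the first match is still visible outside it.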
Install yarn v1, or run yarn set version classic to set the version.
Also make sure that circom is installed.
Clone the repo and install the dependencies:
yarn install
zk-regex is a CLI to compile a user-defined regex to the corresponding regex circuit.
It provides two commands: raw and decomposed.
zk-regex decomposed -d <DECOMPOSED_REGEX_PATH> -c <CIRCOM_FILE_PATH> -t <TEMPLATE_NAME> -g <GEN_SUBSTRS (true/false)>
This command generates a regex circom from a decomposed regex definition.
For example, if you want to verify the regex email was meant for @(a|b|c|d|e|f|g|h|i|j|k|l|m|n|o|p|q|r|s|t|u|v|w|x|y|z)+. and reveal the letters after @, you can define the decomposed regex as follows.
{
  "parts": [
    {
      "is_public": false,
      "regex_def": "email was meant for @"
    },
    {
      "is_public": true,
      "regex_def": "[a-z]+"
    },
    {
      "is_public": false,
      "regex_def": "."
    }
  ]
}
Note that the is_public field in the second part is true since it is a substring to be revealed.
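As a rough sketch, the decomposed definition can also be produced programmatically. The Python below is not part of zk-regex; build_decomposed and combined_regex are hypothetical helper names, and treating the full regex as the plain concatenation of the parts is an assumption based on the example above.

```python
import json


def build_decomposed(parts):
    """Build the decomposed-regex structure from (is_public, regex_def) pairs."""
    return {
        "parts": [
            {"is_public": is_public, "regex_def": regex_def}
            for is_public, regex_def in parts
        ]
    }


def combined_regex(decomposed):
    """Concatenate the parts (assumption: the full regex is their concatenation)."""
    return "".join(part["regex_def"] for part in decomposed["parts"])


decomposed = build_decomposed(
    [
        (False, "email was meant for @"),  # private prefix
        (True, "[a-z]+"),                  # public part to be revealed
        (False, "."),                      # private suffix
    ]
)

# Serialize in the same shape as the JSON example above.
decomposed_json = json.dumps(decomposed, indent=2)
```

Writing decomposed_json to a file such as ./simple_regex_decomposed.json gives an input for the decomposed command.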
You can generate its regex circom as follows.
Save the JSON above at ./simple_regex_decomposed.json.
Run zk-regex decomposed -d ./simple_regex_decomposed.json -c ./simple_regex.circom -t SimpleRegex -g true.
It outputs a circom file at ./simple_regex.circom that has a SimpleRegex template.
zk-regex raw -r <RAW_REGEX> -s <SUBSTRS_JSON_PATH> -c <CIRCOM_FILE_PATH> -t <TEMPLATE_NAME> -g <GEN_SUBSTRS (true/false)>
This command generates a regex circom from a raw string of the regex definition and a json file that defines state transitions in DFA to be revealed.
For example, to verify the regex 1=(a|b) (2=(b|c)+ )+d and reveal its letters, the DFA state transitions to be revealed are 2->3 for the letters after 1=, 6->7 and 7->7 for those after 2=, and 8->9 for d.
Make a JSON file at ./simple_regex_substrs.json that defines the state transitions. For example,
{
  "transitions": [
    [[2, 3]],
    [[6, 7], [7, 7]],
    [[8, 9]]
  ]
}
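The nesting in this file is easy to get wrong: the outer list has one entry per revealed substring, and each entry is a list of [from, to] state pairs. A minimal Python sketch that builds the same structure is shown here; to_substrs is a hypothetical helper, not part of zk-regex, and the state numbers are the ones from the example above.

```python
import json


def to_substrs(groups):
    """Convert groups of (from_state, to_state) tuples into the substrs structure.

    Each group corresponds to one revealed substring.
    """
    return {"transitions": [[list(pair) for pair in group] for group in groups]}


substrs = to_substrs(
    [
        [(2, 3)],          # letters after "1="
        [(6, 7), (7, 7)],  # letters after "2="
        [(8, 9)],          # the final "d"
    ]
)

# Serialize in the same shape as the JSON example above.
substrs_json = json.dumps(substrs, indent=2)
```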
Run zk-regex raw -r "1=(a|b) (2=(b|c)+ )+d" -s ./simple_regex_substrs.json -c ./simple_regex.circom -t SimpleRegex -g true.
It outputs a circom file at ./simple_regex.circom that has a SimpleRegex template.
The generated circuit has
msg_bytes: the number of characters for the input string.
msg[msg_bytes]: the input message to match against.
out: a bit flag to identify whether the substring of the input string matches with the defined regex.
reveal(i)[msg_bytes]: the masked version of msg[msg_bytes]. Each character in msg[msg_bytes] is turned to zero in reveal(i)[msg_bytes] if it does not belong to the i-th substring pattern.
For more examples in action, please check out the test cases in the ./packages/circom/circuits/common folder.
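The meaning of out and reveal(i) can be modeled outside the circuit. The Python sketch below is only a reference model under stated assumptions: it uses Python's re module instead of the generated DFA, it treats the trailing "." of the earlier decomposed example as a literal dot, and reveal_mask is a hypothetical helper name.

```python
import re


def reveal_mask(msg, start, end):
    """Keep bytes in [start, end) and turn every other byte into zero."""
    return [b if start <= i < end else 0 for i, b in enumerate(msg)]


msg = b"email was meant for @alice."

# Assumption: the private suffix "." is matched as a literal dot here.
match = re.search(rb"email was meant for @([a-z]+)\.", msg)

# `out` models the bit flag: 1 when a substring of msg matches the regex.
out = 1 if match else 0

# `reveal0` models reveal(0)[msg_bytes]: same length as msg, zero outside
# the public substring (the letters after "@").
reveal0 = reveal_mask(msg, match.start(1), match.end(1))
revealed_text = bytes(b for b in reveal0 if b != 0)
```

The masked array keeps the original positions of the revealed bytes, which is why a later step that extracts the substring typically needs a start index.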
A package in ./packages/apis provides Node.js/Rust APIs helpful to generate inputs of the regex circuits.
We welcome any questions, suggestions, or PRs!
You will need to have bun installed:
curl -fsSL https://bun.sh/install | bash
yarn test
Use this bibtex citation.
@misc{zk-regex,
author = {Gupta, Aayush and Panda, Sampriti and Suegami, Sora},
title = {zk-regex},
year = {2022},
publisher = {GitHub},
journal = {GitHub repository},
howpublished = {\url{https://github.com/zkemail/zk-regex/}},
}
Some email providers put not only the sender's email address but also their username in the From field. Although the concrete format differs among email providers, our FromAddrRegex template assumes that the email address appears at the end of the From field. If this assumption does not hold, i.e., the username appears after the email address, an adversary can make that template output an arbitrary email address by including a dummy email address in the username.