Closed zaccherinij closed 9 months ago
Could you provide me the time frame,I will be working on it after two weeks; will that be okay?
Could you provide me the time frame,I will be working on it after two weeks; will that be okay?
Deadline for submission is 17th December (you could see it with the associated milestone
I have a few questions:
Note that the above functions will not literally trim blocks off of the provided FheAsciiString but rather will zero out the correct blocks to make the null terminated string the trimmed version as appropriate.
this makes sense for me with trim_end, but I'm not sure about trim or trim_start... is it ok to just treat 0-byte as empty anywhere in the string (rather than just at its end / null-termination), so ClientKey's decryption would just filter out these 0s to get a string? Or should it still respect C-style and what to do in that case? Should it use a special value to mark empty places (e.g. 256 outside of u8's range) before 0-termination?
@aquint-zama one follow-up question depending on the answer to 4. about function outputs:
repeat
: how should the encrypted number of repetitions be handled? Again, based on my newbie understanding, one cannot know with a ServerKey whether a given string should be repeated 1-time or 1000000000-times... is it in that situation ok to return a "lazy" repeated string iterator (which will most likely return two encrypted values: a flag whether the number of repetitions was reached and the repeated encrypted string constructed up to that point)? ClientKey can then get the final string by iterating and checking the decrypted flags?hello @tomtau
Thanks @IceTDrinker . And for "replace" / "replacen" functions, should the "to" argument be FheString or a plain string/slice?
I believe both are doable, with a complication/performance degradation if "to" is also encrypted
@aquint-zama Some You wrote about the main : "This executable will encrypt the first string and run all the available APIs on the input string using the pattern when necessary" Should the pattern be encrypted or not, or both ? This would change the speed of the computation so it is important every one does the same for fairness. Thanks
Hello @Lcressot submissions providing both clear and encrypted patterns would be rated better than a solution with only clear patterns, if some cases look impossible (see encrypted n for repeatn for large integer types which does not look like it can realistically work beyond u8 (and even then will be most likely very slow)) then obviously we would take that into account and in that case the relative speed/technical solutions would be evaluated. If you manage to do something clever where other submissions just ignore the difficulty by marking it as Ā« wonāt implement Ā» then the submission proposing a solution will likely be rated better
@IceTDrinker Ok great thanks, so we need to measure and display the computation time for each operation and not just globally
Absolutely @Lcressot
Hi, should the encrypted pattern also have a padding ? I think using padding to the pattern would lead to a huge additional complexity to some algorithms. Maybe we can implement both padding and no padding functions.
This is a good question, in the general case the pattern length can leak some amount of information about what users would be searching for.
With that said the performance could be abysmal indeed, potential ideas, we will be discussing on our end at Zama but if you have some inputs on that please feel free to share :
Hello, thanks for answering. My current opinion on this:
This is a good question, in the general case the pattern length can leak some amount of information about what users would be searching for.
With that said the performance could be abysmal indeed, potential ideas, we will be discussing on our end at Zama but if you have some inputs on that please feel free to share :
- also have the length of the string as an encrypted value in the FheString struct (we can make it a 32 bits integer), not sure if that helps or notNot sure the encrypted length would help because because we would have to deal with all possible length value as well. The padding zeros already contain de encrypted length information in my opinion.
- have an enum in the FheString, to know if the encrypted string is padded with 0s at the end or not allowing to match on both the encrypted string and the pattern to use better algorithms if they exist in the various implementations (less error prone for the users as this enum would be set during encryption)I used a boolean has_padding that is set at encryption. I consider it a trade-off between security and performance for the user to decide.
- have different functions to manage all the cases, I'm not a fan of this option as it would clutter the API with a lot of function variations in addition to being error prone for the users
I think the algorithms could integrate an if has_padding for the pattern, maybe panic if there is padding and they donāt work in this case, and keep the same for the string (wether it has padding or not) considering it has an ending /0 in all cases.
LoĆÆcĀ
So after discussion @Lcressot and @tomtau it is acceptable to have a flag indicating if the string is padded with zeros at the end or if it has a single 0, meaning the string length can be deduced by the vector it stores.
Having better algorithms for when there is no padding is accepted, submissions which cover the most difficult cases (e.g. padded input and padded pattern) would likely be rated better as the other algorithms should be more straightforward
Instead of duplicating all functions for both types FheString and String (encrypted or clear pattern), can we make template function that take FheString
having generic functions would be a nice touch I would say, they can then dispatch to the proper algorithm
[ADDED 23.10.2023] A null character "\0" may mark the early end of a string but is not required (as was first asked in the bounty) as it can make some algorithms more complicated and slower.
Regarding this note: Is it OK if the FheString
has a cleartext flag isZeroPadded
(as suggested in https://github.com/zama-ai/bounty-program/issues/80#issuecomment-1765966098) to signal whether it is padded with zeros or not? Without a cleartext flag, I don't think we can hope for much algorithmic improvement, as we can't branch on encrypted values. @IceTDrinker
@matthiasgeihs a flag indicating the padding status is fine yes, or an enum works as well
strip_prefix
and strip_suffix
return the string without the matched pattern wrapped in Some, but if thereās no match they return None. Should we just return the string without prefix/suffix (if thereās match) and the original string (if there isnāt)? If we returned special characters representing None we would probably need to decrypt the output to handle a potential None before continuing with subsequent FHE operations.
In my opinion you can return the result along with a boolean telling wether the prefix pattern was found or not
From reading the std docs to emulate the None you have two choices, the boolean flag and returning the original string, doing a cmux on the first character setting it to the null byte (early string termination means empty string here) with said boolean flag while still encrypted, given one probably wants to chain operations, that latter option can be interesting but is slightly more costly
Do you mean treating the None as a empty string?
Yes thatās a possibility, maybe there is something Iām missing but it does not seem like such a bad idea from a usability perspective
How would one distinguish between Some("")
(a result of when prefix/suffix matched the entire string input) and None
? Or is it ok to divert from the std
behaviour?
Or two variants of that function, one matching the std 1:1 with a boolean flag, another returning an empty string
I really can't say both have advantages and disadvantages, as long as this is documented/explained I don't think it would make or break a bounty
I would say operating with str functions over a None doesnāt make much sense and users would probably want to work with either the string with pattern removed or the original string in the None case. Returning the boolean flag along with the correct string makes more sense to me. Or if they only wanted to get that boolean they can use starts_with
and ends_with
.
sounds perfectly reasonable
A clarification regarding the cli arguments main.rs
should expect. From
The code for a command line executable that will take an input string to be encrypted and a pattern for string functions which require it. This executable will encrypt the first string and run all the available APIs on the input string using the pattern when necessary. The results will be compared to the clear APIs provided by rustās std::str, the ouptuts will be nicely formatted for the user to see that the results match (if there are errors then the bounty would not be considered valid) with timing information for the FHE version.
It looks like only 2 arguments should be provided. But to cover all cases we also need an n
, from
and to
off the top of my head for the specific algorithms that require them (e.g. repeat requires n and replace requires from and to). It makes sense to include those as well in the required cli arguments of main.rs but asking here for clarification/sanity check. Thank you in advance.
Of course, any missing arguments are ok to take on the command line, as we did not implement the bounty ourselves there are some stuff we have not covered/thought about this thread is here for that !
Another clarification regarding the added note.
A null character "\0" may mark the early end of a string but is not required (as was first asked in the bounty) as it can make some algorithms more complicated and slower. The option to have encrypted strings with a padding made with 0s is still required (to leave the option to the user to obfuscate the string's length), differentiating between the two kind of strings (padded or not) can and should be used to chose better algorithms when possible.
I am unsure whether both options are required or not. The first sentence suggests that padded strings are not required as they may make some algorithms slower. The second one suggests that a user should be able to choose whether to use padded strings or not. In my solution for example all algorithms are with padded strings except for 2. Is the point to make it explicitly clear to the user which version of strings they should use? Or should we implement both options for all algorithms? Thanks!
@MakisChristou the user must have the option to pad the input string, algorithms can use \0 to indicate the early end of a string, however a string is valid even if it does not have a \0
So we can have algorithms that don't work with padded strings, but I presume they will be less favourable than their padded counterpart?
What matters at the end of the day for padded inputs and non padded inputs will be the runtime, so having an algorithm working on padded strings will most likely be more generic but slower, meaning you likely want both variants (and one likely is a derivative of the other with stronger assumptions/invariants)
As for the submission do we just make our repo public after we add the link? Also I have cloned the state of tfhe-rs as it was 2 or so months ago and added my tfhe_strings
folder. Is that the expectation?
As for the submission do we just make our repo public after we add the link? Also I have cloned the state of tfhe-rs as it was 2 or so months ago and added my
tfhe_strings
folder. Is that the expectation?
Forking is indeed the way to go, you could keep the repository private until the deadline (17th of December) but still submit it through the submission form. Then, on the 18th of December you turn it public.
@MakisChristou does your code work with tfhe 0.4.1 ?
@IceTDrinker Its currently using version 0.4.0 but I could upgrade it. The thing is, last I checked I could not fork and then private the repo with a free Github account. @aquint-zama.
that's a good point, 0.4 is fine I think you should be able to upgrade to 0.4.1 easily.
For the repo forking and private setting it is indeed a problem, so I guess you can fork for the moment as a public repo and have something on your work/dev machine without pushing to git yet if you are afraid your code will leak/be used by others
Hello @MakisChristou so for the version we would recommend rebasing on top of the main commit https://github.com/zama-ai/tfhe-rs/commit/ad41fdf5a5060c0a981cd0c35bf998feafe68e02 (or later) the reason being that main has had improvements in low level performance and some other algorithms which should improve the runtime of your solution
Cheers
So should we use the BooleanBlock
instead of BaseRadixCiphertext<Ciphertext>
? This is what I was currently using as an inner to the FheAsciiChar
implementation.
So should we use the
BooleanBlock
instead ofBaseRadixCiphertext<Ciphertext>
? This is what I was currently using as an inner to theFheAsciiChar
implementation.
@MakisChristou No BooleanBlock is just a type we introduced that you can easily convert to Radix, the new type is to convey that some values encode a boolean value (i.e. a 0 or a 1) and should make some API usage less error prone, I think some functions may have had optimizations linked to BooleanBlock but don't take my word for it.
@MakisChristou Though for some functions if it makes sense to return a Boolean then feel free to return a BooleanBlock !
I see, is the convertion one way or 2 way? Like can I convert a BaseRadixCiphertext<Ciphertext>
to a BooleanBlock
? My if_then_else
implementation requires it for example unless I will make changes to the API.
So you should be able to convert the boolean block to a radix with into_radix
, you can use convert
on a RadixCiphertext to a BooleanBlock, this will do the C-like conversion where
bool my_bool = some_int != 0;
for further support please use the FHE.org Discord channel for TFHE-rs :)
So this has been somewhat discussed before but wanted to make sure I am getting the requirements right. For example in many functions like repeat
or trim_start
we have 2 options after applying the algorithm in the padded case. Either shuffle zeroes at the end of the string or carefully place some special character such that the client can trivially get the correct result at decryption time. Just wanted to make sure that option 2 is considered valid since it's a lot faster than the former.
For example we can use a non ascii byte as a placeholder for bytes to be removed by the client. The resulting string will be the valid unpadded result.
well the special character of option 2 is a \0
right ?
well the special character of option 2 is a
\0
right ?
Not nessesarily. Lets say we run repeat on āhello/0ā 2 times. Instead of for example having āhello/0hello/0ā and applying shuffling to get āhellohello/0/0ā we can do āhelloxhelloxā where x can be 255u8 or something out of the ASCII range. Then the client can trivially remove the 255u8s getting āhellohelloā. In my code /0 marks the end of the string whereas this special char can mark āremove this byte on the client sideā.
can you call another of the API on the string with the special character and have a correct result ? š¤ feels like a \0
with extra steps as other algorithms will need to handle that anyways ? Maybe I'm wrong there
Winners
š„ 1st place: A submission by JoseSK999 š„ 2nd place A submission by Tomtau š„ 3rd place : A submission by M-Bln
Overview
This TFHE-rs bounty looks to provide a set of APIs working over encrypted string reproducing APIs from the rust std library str type (see documentation at https://doc.rust-lang.org/std/primitive.str.html)
This document will specify the expected structure of the example that you will produce and specify various constraints on the types that will be introduced and the APIs that are expected on the primitives.
How to participate?
1ļøā£ Register here. 2ļøā£ When ready, submit your code here. šļø Submission deadline: December 17, 2023.
Description
String encoding
The input string encoding is expected to be ASCII.
To avoid leaking the length of the input string we may want to encrypt zeros after the string actual content, to easily mark the end of the string we will make the
FheString
null terminated, i.e., it must always end by the value0u8
encrypted, like C Strings are null terminated.Functions to implement
Your submission should at least implement, the following method for an encrypted string:
contains
with clear / encrypted patternends_with
with clear pattern / encrypted patterneq_ignore_case
find
with clear pattern / encrypted patternis_empty
len
repeat
with clear / encrypted number of repetitionsreplace
with clear pattern / encrypted patternreplacen
with clear pattern / encrypted patternrfind
with clear pattern / encrypted patternrsplit
with clear pattern / encrypted patternrsplit_once
with clear pattern / encrypted patternrsplitn
with clear pattern / encrypted patternrsplit_terminator
with clear pattern / encrypted patternsplit
with clear pattern / encrypted patternsplit_ascii_whitespace
split_inclusive
with clear pattern / encrypted patternsplit_terminator
with clear pattern / encrypted patternsplitn
with clear pattern / encrypted patternstarts_with
with clear pattern / encrypted patternstrip_prefix
with clear pattern / encrypted patternstrip_suffix
with clear pattern / encrypted patternto_lowercase
to_uppercase
trim
trim_end
trim_start
+
(concatenation)>=
,<=
,!=
,==
API to use for the bounty
This bounty should make use of the integer API from TFHE-rs, more precisely we expect you to use
RadixCiphertexts
with the default parallelized functions available in the crate.Structure of the example directory
You will put your code in the directory tfhe/src/examples/fhe_strings
This directory will contain the following:
main.rs:
The code for a command line executable that will take an input string to be encrypted and a pattern for string functions which require it. This executable will encrypt the first string and run all the available APIs on the input string using the pattern when necessary. The results will be compared to the clear APIs provided by rustās std::str, the ouptuts will be nicely formatted for the user to see that the results match (if there are errors then the bounty would not be considered valid) with timing information for the FHE version.
Use clap (the version pinned in TFHE-rs) to build the command line.
ciphertext.rs:
This module will contain:
FheAsciiChar
: a wrapper type that will hold aRadixCiphertext
from integer which must be able to store at least 8 bits of data to be able to fit a single ASCII char;FheString
: a wrapper type around aVec<FheAsciiChar>
, the lastFheAsciiChar
of the string always encrypts a 0u8, it is possible to have 0u8 earlier than the last char, this would allow the user to hide the actual length of the string that is encrypted, accessors should be made to be able to iterate easily on the inner vec both mutably and immutably, do not provide a from_blocks primitives as it would be easy to misuse, anew
function should be enough to construct the type at encryption time with a client key, see for exampletfhe/src/integer/ciphertext/mod.rs
to see how IntegerRadixCiphertext
are built to give access to their content for use by algorithms.client_key.rs:
Provide a ClientKey type that can be built from
shortint
parameters (that are also used by the integer API) or from anIntegerClientKey
directly, realistically this ClientKey type will wrap the IntegerClientKey and provide primitives to encrypt a rust str (validating before encryption that the str is a valid ASCII str), decrypt it in a fresh String. There should be an encryption primitive that allows the user to specify how many āpaddingā encryption of zeros should be appended after the 0-terminated string that will be encrypted from the provided input string.ClientKey must derive
serde::Serialize
,serde::Deserialize
andClone
.directory server_key:
mod.rs: Provide a ServerKey that wraps an Integer ServerKey that can be built from a ClientKey from client_key.rs or directly from an Integer ServerKey.
ServerKey must derive
serde::Serialize
,serde::Deserialize
andClone
.Create files for families of functions that share similar algorithmic needs, e.g., IF IT MAKES SENSE (here we are speculating, this is just an example supposing similar functions will require similar algorithms)
split.rs
Will contain an
impl
Block for the aboveServerKey
dedicated to split functions likersplit
rsplit_terminator
split
split_inclusive
split_terminator
and all relevant helper functions. If some functions are needed everywhere then those functions can be put in the mod.rs file from the directory.
trim.rs
Will contain an
impl
Block for the aboveServerKey
dedicated to trim functions liketrim
trim_end
trim_start
Note that the above functions will not literally trim blocks off of the provided
FheAsciiString
but rather will zero out the correct blocks to make the null terminated string the trimmed version as appropriate.Also, it could make sense to separate functions depending on the type of the pattern (i.e., encrypted or clear) in separate files.
Other requirements
Each function should have a docstring describing what it does (those can be adapted from the rust std docs) with a doctest demonstrating the usage on simple hard coded cases. We expect standard algorithms if it makes sense (with links to the papers describing them or publicly available content on the web like a Wikipedia page for example), if some tricks are used proper comments are expected to be provided in the code.
A Makefile target (see Makefile for the template of our targets) for running the tests of the example is expected. A command to run the binary easily from the terminal is also requested. The code must compile on the latest stable rust. No #[allow(clippy::...)] are authorized unless absolutely necessary but that should never be required. The code must pass the make pcc (pre commit check) available from our Makefile, if you get errors then the code needs to be fixed to satisfy the pcc checks. Good luck!
Reward
š„Best submission: up to ā¬10,000.
To be considered best submission, a solution must be efficient, effective and demonstrate a deep understanding of the core problem. Alongside the technical correctness, it should also be submitted with a clean code, clear explanations and a complete documentation.
š„Second-best submission: up to ā¬3,500.
For a solution to be considered the second best submission, it should be both efficient and effective. The code should be neat and readable, while its documentation might not be as exhaustive as the best submission, it should cover the key aspects of the solution.
š„Third-best submission: up to ā¬1,500.
The third best submission is one that presents a solution that effectively tackles the challenge at hand, even if it may have certain areas of improvement in terms of efficiency or depth of understanding. Documentation should be present, covering the essential components of the solution.
Reward amounts are decided based on code quality and speed performance on a m6i.metal AWS server.
Related links and references
Submission
Apply directly to this bounty by opening an application here.
Questions?
Do you have a specific question about this bounty? Join the live conversation on the FHE.org discord server here. You can also send us an email at: bounty@zama.ai