vlmutolo opened 3 years ago
I'll take a closer look at this ASAP, but for now I very much agree with leaving AsyncRead/AsyncWrite as an aside.
AEAD example
This looks very much like the high-level API to me at this point (except of course for the traits). Would we just add support for Read/Write so that the AEAD API simply supports these traits? Don't we want to consider things like the streaming API for AEAD, which could provide the same benefits when encrypting files as the hash example does?
Hash example
The approach for using the streaming interface seems good to me. I am wondering, though, if you intend to have update() take a Read argument, or if you want to only provide a from_reader method, which automatically sets the buffer size, BLAKE2b output size, etc.? In the latter case, incremental hashing of files, for example, would happen transparently to the user. However, some users might want to decide the buffer size themselves, which is only possible with the former.
Don't we want to consider things like the streaming API for AEAD, which could provide the same benefits when encrypting files as the hash example does?
Yes, I think this would be good to support. I'm unclear, though, how to directly convert between the streaming API and the Read and Write traits. Read and Write both work on streams of bytes broken up into chunks of indeterminate size, while it seems like the AEAD streaming API relies on having precisely sized and separated messages to encode/decode. Is there a standard libsodium serialization of these messages into a byte stream?
I am wondering … if you want to only provide a from_reader method, which automatically sets the buffer size, BLAKE2b output size etc?… users might want to decide buffer size themselves
For the output size, this isn't something that Orion currently exposes in the safe API. And if users want to set it, then they can grab the Blake2b type directly and use its Write implementation. The from_reader function would basically just be a safe wrapper around creating a Blake2b type with sane parameters, calling Blake2b's Write impl, and calling the std::io::copy function.
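A minimal sketch of that shape, using a stand-in hasher type since the exact Blake2b constructor and finalization aren't pinned down here:

```rust
use std::io::{self, Read, Write};

// Stand-in for the hazardous Blake2b state; a real impl would forward to update().
struct Hasher;

impl Write for Hasher {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        // Feed buf into the hashing state here.
        Ok(buf.len())
    }
    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

// The wrapper itself: pick sane defaults, then let std::io::copy pump the reader
// into the hasher's Write impl.
fn from_reader(mut reader: impl Read) -> io::Result<Vec<u8>> {
    let mut hasher = Hasher;
    io::copy(&mut reader, &mut hasher)?;
    Ok(Vec::new()) // a real impl would return the finalized digest instead
}
```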
For the "buffer", I'm not sure which buffer you mean. If users want to buffer the reader, they can use a BufReader
and then pass that to the from_reader
function.
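For example, buffering is then just a matter of wrapping the reader before handing it over (the file name and buffer size are only illustrative here):

```rust
let file = File::open("file.txt")?;
let reader = BufReader::with_capacity(64 * 1024, file); // caller-chosen buffer size
let digest = from_reader(reader)?;
```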
For the "buffer", I'm not sure which buffer you mean. If users want to buffer the reader, they can use a BufReader and then pass that to the from_reader function.
I think I understand now; I initially misunderstood the purpose of from_reader().
Is there a standard libsodium serialization of these messages into a byte stream?
I see your point, in that this can be tricky. Especially since streaming writes directly to an output and has no internal state. Perhaps, in that case, each call to seal_chunk()/open_chunk() would just need a reader and a writer, as the current API suggestion does for non-streaming AEAD.
I'll have to think about this one a bit more. If we don't find a solution to this, I definitely think that we should start with adding the integration to the hashing primitives and non-streaming AEAD, and then see how that turns out. WDYT?
I was also thinking that perhaps the naming of the functions for non-streaming AEAD could be made a bit less generic. How about sealing_read()/opening_write() or seal_read()/seal_write()?
In any case, I really like the idea behind the hashing primitives so far. This should definitely go in. Regarding another io feature, I don't think this is necessary. If we just gate it behind safe-api, that should be fine, just like we do with the std::error::Error trait.
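To make the gating concrete, a small sketch (from_reader is just the name under discussion, not an existing function):

```rust
// Sits behind the existing safe-api feature, just like the std::error::Error impls.
#[cfg(feature = "safe-api")]
pub fn from_reader(reader: impl std::io::Read) -> std::io::Result<()> {
    let _ = reader; // placeholder body; the real function would return the digest
    Ok(())
}
```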
each call to seal_chunk()/open_chunk() would just need a reader and writer
Oh, interesting. Yeah, this probably would work. Instead of having a single call to e.g. sealing_read() handle all the individual streaming chunks, the user would call sealing_read once per chunk. That sounds good to me.
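A hypothetical sketch of that per-chunk shape; sealing_read here is a placeholder with an assumed signature, not an existing orion function:

```rust
use std::io::{self, Read, Write};

// Placeholder for the per-chunk call being discussed: encrypt one chunk from the
// reader and write the sealed output to the writer.
fn sealing_read(plaintext_chunk: impl Read, sealed_out: impl Write) -> io::Result<()> {
    let _ = (plaintext_chunk, sealed_out); // real encryption would happen here
    Ok(())
}

// The caller drives the loop: read a chunk, seal it, repeat until EOF.
fn seal_in_chunks(mut input: impl Read, mut output: impl Write) -> io::Result<()> {
    let mut chunk = vec![0u8; 64 * 1024]; // chunk size picked by the caller
    loop {
        let n = input.read(&mut chunk)?;
        if n == 0 {
            break;
        }
        sealing_read(&chunk[..n], &mut output)?;
    }
    Ok(())
}
```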
If we don't find a solution to this, I definitely think that we should start with adding the integration to the hashing primitives and non-streaming AEAD
Yes, agreed. Hashing is pretty straightforward here. Basically just consume any reader and output the digest. Though I do see where the user may want to control the size of the buffer we read into (which I see now is probably what you meant above). Maybe we could do something like internally allocate a Vec<u8> to read into and grow it dynamically until reads don't fill the whole buffer. Kind of like TCP slow-start.
How about sealing_read()/opening_write() or seal_read()/seal_write()?
I like these names. Not sure about sealing_ vs seal_. They're both good. I think I like sealing_read and sealing_write better, but not by much. That's what I'll go with unless you want to switch them.
For the hash names, are you still good with from_reader or should we switch to something again less generic? Maybe digest_from_reader or consume_to_digest? I think digesting_read sounds a bit odd haha.
Though I do see where the user may want to control the size of the buffer we read into (which I see now is probably what you meant above). Maybe we could do something like internally allocate a Vec<u8> to read into and grow it dynamically until reads don't fill the whole buffer. Kind of like TCP slow-start.
Right now I'm kind of lost in all these buffers and read/write directions/sources tbh. Do you perhaps have a small example?
That's what I'll go with unless you want to switch them.
Fine by me. The only downside I see is that seal_read/open_write would be slightly more consistent with the existing seal/open method naming. I'm open to both and will let you decide.
For the hash names, are you still good with from_reader or should we switch to something again less generic?
Good point. I very much like the digest_from_reader name. Plays well with the current digest function IMO.
I think digesting_read sounds a bit odd haha.
Almost tempting haha.
Do you perhaps have a small example?
Here's what the API might look like.
```rust
let reader = File::open("file.txt")?;
let digest = orion::hash::digest_from_reader(reader)?;
```
Nice and easy. Internal to the digest_from_reader function, we could do something like the following to achieve somewhat of an "ideal" buffer size.
```rust
fn digest_from_reader(mut reader: impl Read) -> Result<Digest> {
    // Start with a buffer of 1KB (not sure what this initial value should be).
    let mut read_buffer = vec![0; 1024];

    loop {
        let bytes_read: usize = reader.read(&mut read_buffer)?;
        if bytes_read == 0 {
            break;
        }
        if bytes_read == read_buffer.len() {
            // Double the size of the buffer if we filled it up last time.
            read_buffer.extend(std::iter::repeat(0).take(read_buffer.len()));
        }
        // Hash the slice given by: &read_buffer[..bytes_read]
    }

    // Finalize and return the digest here.
}
```
That is a very straightforward API indeed. Thanks for the example! I think this is a good approach. We can figure out a good buffer size and document it clearly. Assuming users of digest_from_reader() typically pass readers with large amounts of data (an assumption which should probably also be documented if taken), we could take a buffer size of, say, 128 or 256 KiB? The reason behind this suggestion stems mostly from the fact that these are the two largest sizes used in benchmarks.
Edit: If we take such large buffers, we probably don't want to resize them dynamically though.
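In that case the earlier sketch would collapse to a single fixed allocation, something like the following (256 KiB used purely as the figure floated above):

```rust
use std::io::Read;

fn digest_from_reader(mut reader: impl Read) -> std::io::Result<()> {
    // Fixed 256 KiB buffer, never resized.
    let mut read_buffer = vec![0u8; 256 * 1024];
    loop {
        let bytes_read = reader.read(&mut read_buffer)?;
        if bytes_read == 0 {
            break;
        }
        // Hash the slice given by: &read_buffer[..bytes_read]
    }
    // Finalize and return the digest here.
    Ok(())
}
```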
Allocating a buffer of 256 KiB seems reasonable. Very few applications would be affected by that use of memory, and those that are can use the hazardous module and Blake2b's implementation of Read. And no, we probably wouldn't want to keep growing that buffer.
implementation of Read
Just making sure, but this is the Write impl, right? I saw no Read impl in the draft PR AFAIR.
Regardless, sounds like a good plan.
Yes, exactly. I meant Write.
Summary
The Rust ecosystem heavily integrates with std's Read and Write traits. It would be helpful to interoperate with these traits, especially in the context of the aead and hash modules.
Example
Hashing
AEAD
I'm definitely open to suggestions on the API described here. Particularly, I'm not sure if we'd want convenience wrappers around the *_copy functions for the AEAD case. The copy functions are fairly general, and will allow people to allocate buffers how they want. But there's probably a case to be made to provide a simpler type that just outputs a Vec of encrypted/decrypted data.
In any case, the real advantage of implementing these APIs is that someone could do, for example, the following:
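A rough sketch of that kind of usage; the hasher's construction is left out since it depends on the final API, and it only needs to implement Write:

```rust
use std::fs::File;
use std::io::{self, Write};

// `hasher` is any hash state exposing io::Write (the integration proposed here);
// how it is constructed is intentionally left out.
fn hash_file(mut hasher: impl Write) -> io::Result<()> {
    let mut file = File::open("file.txt")?;
    io::copy(&mut file, &mut hasher)?;
    Ok(())
}
```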
And it would work as expected. The big deal here is that large files should be read and hashed in pieces, which currently requires reaching into the hazardous module and using the "streaming interface" (the update method); this looks something like the sketch below. It's longer, has a little more room for user error, and in general we probably just want to support the existing IO traits.
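A rough sketch of that streaming-interface usage; the State type and its update/finalize signatures are stand-ins here, not the actual hazardous API:

```rust
use std::fs::File;
use std::io::Read;

// Stand-in for the hazardous hashing state; update()/finalize() mirror the streaming
// interface mentioned above, with assumed signatures.
struct State;
impl State {
    fn update(&mut self, _data: &[u8]) -> std::io::Result<()> { Ok(()) }
    fn finalize(&mut self) -> std::io::Result<[u8; 32]> { Ok([0u8; 32]) }
}

fn hash_file_streaming() -> std::io::Result<[u8; 32]> {
    let mut file = File::open("file.txt")?;
    let mut state = State;
    let mut buf = [0u8; 8192];
    loop {
        let n = file.read(&mut buf)?;
        if n == 0 {
            break;
        }
        // Feed each chunk into the streaming interface.
        state.update(&buf[..n])?;
    }
    state.finalize()
}
```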
I already started the implementation, and I'll post a draft PR soon.
Also, I considered making AsyncRead and AsyncWrite part of this issue/PR, but that seems like it should be its own thing. There's talk about integrating those traits into the standard library some time soon, so maybe we should hold off until then.