Something is wrong with TlsStream's Write implementation leading to data corruption

e00E commented 7 years ago

I noticed data corruption when sending data over https with reqwest and started investigating: https://github.com/seanmonstar/reqwest/issues/164

First I saw that sometimes TlsStream's encrypt method would never receive the last byte of my input. Then I looked at TlsStream's Write implementation at https://github.com/steffengy/schannel-rs/blob/master/src/tls_stream.rs#L798 and noticed the following behavior:

write is called with a buffer of length 106, this buffer has at its end the second to last byte of my input
write enters the if and calls encrypt on this buffer
write calls write_out and forwards the resulting WouldBlock
write is called twice with a buffer of length107, now also containing the very last byte of my input at its end
both times write does not call encrypt but calls write_out and forwards the resulting WouldBlock
write is called a third time again with that buffer, it again does not call encrypt but callswrite_outsuccessfully (which returnedOk(135)) and returns Ok(107)

I dont know what exactly is going wrong yet or how to fix it but this definitely looks like a bug inschannel-rs to me. TlsStream is receiving a buffer to write and responds that it has written the buffer while not having ever encrypted or sent the last byte of that buffer.

e00E commented 7 years ago

I think this comment in write describes a erroneous assumption:

if we have pending output data, it must have been because a previous attempt to send this data ran into an error. Specifically in the case of WouldBlock errors, we expect another call to write with the same data.

This is not guaranteed and caller has no way of knowing that, they just sees a standard Write (https://doc.rust-lang.org/std/io/trait.Write.html#tymethod.write) implementation. This is also what goes wrong in my example above. There is a WouldBlock error and in the next call the input buffer has changed, it has only changed but by one additional byte in this case but really it could be anything. Changing input buffers between calls seem to cause a disconnect of the contents of out_buf and the actual input the to write.

e00E commented 7 years ago

I think this rule from the Write trait is also violated A call to write represents at most one attempt to write to any wrapped object. by repeatedly writing to self.stream in a loop in write_out.

e00E commented 7 years ago

Also couldn't write_out enter an infinite loop if self.stream.write started returning 0.

sfackler commented 7 years ago

Why is more data being added to the buffer being passed to write in between calls?

e00E commented 7 years ago

I dont know. Reqwests uses hyper so probably there? My point was that it doesnt matter why more data is added because that is a valid thing to do if the caller wants to, unless Im understanding something?!

sfackler commented 7 years ago

Yeah - I think we just need to track the size of the pending write and change the requirement on Write that the buffer has to start with the data you initially called write with.

steffengy commented 7 years ago

@sfackler

and change the requirement on Write that the buffer has to start with the data you initially called write with

Is it realistic, that all crates using this will comply with that?

How does openssl handle that? Just throw away the outbuf & reencrypt? (I guess this is for performance-reasons, to prevent several reencryptions on several WouldBlocks)

e00E commented 7 years ago

I personally strongly believe that it is not a good idea to change the meaning of trait methods. It will be a thing people using the crate stumble on because a trait doesnt do what they think it does and you would need to inform everyone using the crate already of this. Instead you could add a different method like performant_read which does what you propose, and then people can opt in to use that one. Maybe a good way to implement that new method would be to return the encrypted data in case of WouldBlock so it does not go to waste and make the method take something like an

Enum {
    new_data(&[u8]),
    pre_encrypted_data(&[u8])
}

This way the caller can store the encrypted data on WouldBlock and give it back when they try again. I believe that is in line with this api guideline.

sfackler commented 7 years ago

OpenSSL has the same requirement - in fact, it by default checks that the address of the buffer you pass in is consistent across all calls as a quick heuristic.

sfackler commented 7 years ago

@e00E I don't really understand what the issue is here - in what context would people be changing the data they're writing after a WouldBlock?

steffengy commented 7 years ago

To rephrase: what is it doing to prevent that this doesn't happen when using it? Simply checking if the passed address & length match?

(Since I cannot currently think of a usecase where you would want to retry a write after WOULDBLOCK with less or other data in that range [..last_end], doing it should be safe I can understand the issue here, that you might attempt to write MORE data, if available)

sfackler commented 7 years ago

Yeah - check the description of SSL_MODE_ACCEPT_MOVING_WRITE_BUFFER here: https://www.openssl.org/docs/man1.0.2/ssl/SSL_CTX_set_mode.html.

We actually turn that off in rust-openssl.

steffengy commented 7 years ago

We actually turn that off in rust-openssl

Where/When do you turn it off, I can only spot turning SSL_MODE_ACCEPT_MOVING_WRITE_BUFFER in connector.rs? (Assuming this is a typo, but just making sure I'm not missing somehing)

sfackler commented 7 years ago

Yeah, ACCEPT_MOVING_WRITE_BUFFER is the toggle for the behavior.

steffengy commented 7 years ago

Okay.

I still don't see how this issue wouldn't apply to rust-openssl/openssl. OpenSSL doesn't seem to check the length when ACCEPT_MOVING_WRITE_BUFFER is passed, so would this issue also exist there, or am I just missing something?

sfackler commented 7 years ago

I'm not sure about the length, actually. It may track it internally or have the same issue seen here.

steffengy commented 7 years ago

https://github.com/openssl/openssl/blob/0e1e4045c469f03294e33c0344d882e71dbd0d07/ssl/record/rec_layer_s3.c#L1065

This sounds to me like it would error, when the length has changed? I'm still fairly conflicted on what a good approach here is.

sfackler commented 7 years ago

It looks like that would allow longer buffers in later calls, just not shorter ones.

steffengy / schannel-rs

Something is wrong with TlsStream's Write implementation leading to data corruption #34