spotify / snakebite

A pure python HDFS client
Apache License 2.0

Support hadoop.rpc.protection=privacy #185

Open dacjames opened 8 years ago

dacjames commented 8 years ago

Based on our testing, snakebite does not function when RPC encryption (aka privacy mode, aka auth-conf) is enabled. More specifically, the SASL connection succeeds and snakebite does not throw errors for simple operations, but it fails to return any useful results (no items for ls, always True for test, etc.).

Investigation showed that protobuf was not deserializing any fields because the data was still encrypted when snakebite tried to read it. In most cases, this did not produce an error because the fields were optional.

Is it possible to add support for RPC encryption?

wouterdebie commented 8 years ago

I personally have zero clue about Kerberos or SASL. Maybe @bolkedebruin, who did the Kerberos implementation, knows?

bolkedebruin commented 8 years ago

Encryption requires fundamental changes to snakebite, as we would then need to wrap/unwrap the protocol. We are not using it in our environment, which makes it a bit hard to test. To be honest, it is not very high on my priority list currently.
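
To illustrate what "wrap/unwrap the protocol" could look like, here is a minimal sketch and not snakebite's or Hadoop's actual code. The `WrappedStream` class, the 4-byte length-prefixed framing, and the `toy_wrap`/`toy_unwrap` functions are all assumptions made for illustration; a real implementation would call the wrap/unwrap operations of the negotiated SASL mechanism and follow Hadoop's own wire format.

```python
import io
import struct


class WrappedStream:
    """Hypothetical transport layer: wrap() on send, unwrap() on receive."""

    def __init__(self, raw, wrap, unwrap):
        self.raw = raw        # file-like object exposing read() and write()
        self.wrap = wrap      # bytes -> bytes (assumed to come from a SASL library)
        self.unwrap = unwrap  # bytes -> bytes, the inverse of wrap

    def send(self, payload):
        token = self.wrap(payload)
        # Assumed framing: 4-byte big-endian length, then the wrapped token.
        self.raw.write(struct.pack(">I", len(token)) + token)

    def recv(self):
        (length,) = struct.unpack(">I", self._read_exact(4))
        return self.unwrap(self._read_exact(length))

    def _read_exact(self, n):
        # Keep reading until exactly n bytes have arrived.
        buf = b""
        while len(buf) < n:
            chunk = self.raw.read(n - len(buf))
            if not chunk:
                raise EOFError("connection closed mid-frame")
            buf += chunk
        return buf


def toy_wrap(data):
    # Stand-in only: a reversible XOR so the example runs. NOT encryption.
    return bytes(b ^ 0x5A for b in data)


toy_unwrap = toy_wrap  # XOR with a constant is its own inverse
```

The point of the sketch is that the protobuf layer would only ever see the output of `recv()`, so every read and write path in the client has to go through such a layer once a protected QOP is negotiated.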

dacjames commented 8 years ago

Would you be interested in merging patches that added this feature if we worked on it?

wouterdebie commented 8 years ago

Absolutely!

// Wouter

On Jan 27, 2016, at 20:49, Daniel Collins notifications@github.com wrote:

Would you be interested in merging patches that added this feature if we worked on it?

tristanwietsma commented 8 years ago

@bolkedebruin, I'm seeking a little wisdom on this workflow...

From looking at the current implementation, the negotiation process is not encrypted. I'm guessing this wrap/unwrap only applies to a data payload since the auth part can be safely viewed over the wire, but I'm having a little trouble grokking where that encrypted payload is. Is it a field in the protobuf or an entire buffer?

cc @chenbensong

dacjames commented 8 years ago

@tristanwietsma, based on our testing, the entire payload is encrypted. The negotiation process includes a Quality of Protection (qop) setting that you read to determine whether encryption (and/or a MAC for integrity protection) should be used. I believe the full details are described in this RFC.
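
As a rough sketch of the qop check described above (the function names are hypothetical and nothing here is taken from snakebite's code), the client could map the `hadoop.rpc.protection` setting to a SASL qop token and decide from the negotiated value whether payloads need to be wrapped:

```python
# hadoop.rpc.protection values and their SASL qop tokens.
PROTECTION_TO_QOP = {
    "authentication": "auth",   # authentication only, payloads sent as-is
    "integrity": "auth-int",    # payloads carry a MAC
    "privacy": "auth-conf",     # payloads are encrypted
}


def parse_qop_options(value):
    """Split a comma-separated qop list such as 'auth,auth-conf'."""
    return [q.strip() for q in value.split(",") if q.strip()]


def needs_wrapping(negotiated_qop):
    """Return True when RPC payloads must pass through wrap/unwrap."""
    return negotiated_qop in ("auth-int", "auth-conf")
```

For example, `needs_wrapping(PROTECTION_TO_QOP["privacy"])` is true, which is the case this issue is about, while plain `auth` leaves the payload untouched after negotiation.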