base64 encoding produced by Python xmlrpc.client contains newlines

mats-swissphone commented 1 year ago

Hi. Great library! We'd like to use it for our internal projects.

We have stumbled upon an error when sending binary (bytes/base64) values via Python xmlrpc.client, which encodes base64 with newlines.

The following quickfix helped in our case:

diff --git a/dxr/src/values/ser_de.rs b/dxr/src/values/ser_de.rs
index 21c2466..26da001 100644
--- a/dxr/src/values/ser_de.rs
+++ b/dxr/src/values/ser_de.rs
@@ -216,7 +216,8 @@ pub(crate) mod value {
                             .map_err(de::Error::custom)
                     },
                     Field::Base64 => {
-                        let string: String = map.next_value()?;
+                        let mut string: String = map.next_value()?;
+                        string.retain(|c| c != '\n');
                         super::base64::from_str(&string)
                             .map(Value::base64)
                             .map_err(de::Error::custom)

A proper upstream fix would be much appreciated!

Hope this helps & Cheers

(Edit: fixed the link)

decathorpe commented 1 year ago

Thanks for the report! Compatibility with other implementations is important to me, so I'll try fix this ASAP and publish a new release.

I haven't ever used binary / base64 data with Python's xmlrpc module, so to make sure I correctly understand the problem: When encoding binary data as base64, the Python implementation inserts newlines after multiples of 76 characters - but no other whitespace, or carriage return characters? The RFC linked from the Python docs indicates that CRLF should used ... so, does the Python implementation not even follow the rules from the RFC it's linking to? :)

In this case, I agree, normalizing the input string by stripping newline characters (\n, and maybe \c for good measure) before attempting to decode would be a good idea.

mats-swissphone commented 1 year ago

Thanks for the quick response!

You are probably referring to the following in the RFC:

In particular, text line breaks must be converted into CRLF sequences prior to base64 encoding.

I believe this applies to encoding textual data as base64, because it says prior to encoding. So this does not apply in our case here.

The following paragraph however is important:

The encoded output stream must be represented in lines of no more than 76 characters each. All line breaks or other characters not found in Table 1 must be ignored by decoding software. In base64 data, characters other than those in Table 1, line breaks, and other white space probably indicate a transmission error, about which a warning message or even a message rejection might be appropriate under some circumstances.

As I interpret this, the decoder should therefore:

Either ignore line breaks (presumably \n and/or \r) and other whitespace (ASCII only, or even Unicode?),
or ignore any character that is not in the base64 alphabet (this is probably not such a good idea).

Python will indeed insert newlines \n every 76 output bytes: source, detailed docs.

So ignoring just \n would work for the Python client, but the RFC suggests to ignore more.

The decision is yours :)

decathorpe commented 1 year ago

Ok, that sounds like dropping \n, \r, and ASCII whitespace (i.e. ` and\t`) would be a good middle ground between "not handling line breaks at all" and "stripping everything that's not in the base64 alphabet" (since the latter would likely cause invalid input to become valid). I'll look into implementing that tomorrow.

decathorpe commented 1 year ago

I pushed a fix for this and released v0.6.1: https://crates.io/crates/dxr/0.6.1

mats-swissphone commented 1 year ago

@decathorpe

Thanks! Works.

ironthree / dxr

base64 encoding produced by Python xmlrpc.client contains newlines #17