base85 zero padding issue

TLINDEN commented 10 years ago

The function b85encode() appends a number of \0s to the input in order to make it divisable by 4. However, b85decode() doesn't remove the \0s (as I understand it). This patch fixes it:

diff --git a/pbp/utils.py b/pbp/utils.py
index e3d1106..37d12a6 100755
--- a/pbp/utils.py
+++ b/pbp/utils.py
@@ -89,7 +89,14 @@ def b85decode(text):
     if cl:
         out = out[:-(5 - cl)]

-    return out
+    # Unpad previously zero padded input, if any
+    unpadlen = len(out)
+    for i in reversed(range(unpadlen)):
+        if out[i] != '\0' and i < unpadlen:
+            unpadlen = i + 1
+            break
+
+    return out[:unpadlen]

 _MCL_CURRENT = 1
 _MCL_FUTURE = 2

I'm not putting it into a pull request, because I seem to be unable to make separate pull requests. Git just adds all subsequent commits I make to the first pull request which I pushed earlier. Since I'm not sure if this patch is ok, I'm posting just the patch.

Tom

TLINDEN commented 10 years ago

Besides, this padding scheme is wrong anyway. Because if you do it that way you can't tell which zeroes of the input are intended. Let's say someone wants to encode something which has a couple of zeroes at the end. Then the padding in b85encode() would add more zeroes and the patch above would then remove ALL zeroes, which leads to invalid output.

Instead the input data should have a leading int which contains the number of added zeroes (like I do it with z85 in pcp), like:

count | data | zeroes

Tom

stef commented 10 years ago

how about pkcs#7 http://tools.ietf.org/html/rfc5652#section-6.3 for padding?

TLINDEN commented 10 years ago

Ok, I read the section in the rfc, but I'm not sure if it helps (or I misunderstand it). Let me recap: according to the rfc every input is padded with blocksize - (inputlen % blocksize), which is what already happens in utils.py, I think. In the case here, blocksize is 4 (line 30+31 in utils.py), and it appends the number or zeroes to make the input inputlen % 4 == 0.

So, but the rfc doesn't state, how the decryptor knows how many octets of the result are padding.

Maybe the issue is pure artificial anyway since nacl crypto or sign operations don't return data containing zeroes (while being valid). So, perhaps just removing all zeroes might be ok. I'm not sure...

Tom

PS: the same applies to z85 as well, should you decide to switch using it.

stef commented 10 years ago

integrated your fine patch.thx!

stef commented 10 years ago

moved over to z85 in https://github.com/stef/pbp/commit/9fa45f6a108ba910f41e863405c5527af8d70e84

stef / pbp

base85 zero padding issue #10