rust-num / num-bigint

Big integer types for Rust
Apache License 2.0

to_bytes_le on BigUint strips trailing zero(es) #201

Open aknarts opened 3 years ago

aknarts commented 3 years ago

I found a strange edge case with a number I was working with (a binary-encoded SHA-1 hash, for those curious), and it is throwing off everything I am doing. I'm providing Python 3 example code below to show what the value should be.

Basically, this should result in a vector of length 20, but it ends up being length 19 because the last element is 0. I suspect this line is the issue: https://github.com/rust-num/num-bigint/blob/master/src/biguint/convert.rs#L598. I have no inner knowledge of how you store your data, so I don't know how or why the check is done this way.

Reproduction

use num_bigint::BigUint;

let num = BigUint::parse_bytes(b"4888659412758205060988492900023565086307197678", 10).unwrap();
println!("{:?}", num.to_bytes_le());

[238, 34, 129, 137, 3, 17, 254, 16, 242, 218, 149, 34, 154, 39, 150, 206, 14, 55, 219]

Working example in python

(4888659412758205060988492900023565086307197678).to_bytes(160//8, 'little')

b'\xee"\x81\x89\x03\x11\xfe\x10\xf2\xda\x95"\x9a\'\x96\xce\x0e7\xdb\x00'

cuviper commented 3 years ago

BigUint always trims zeros from the most-significant digits, both for the internal representation and for converted output. One reason to justify this on conversions is that we don't want different output on targets with different internal digit sizes -- such zero-padding may be different for 32 or 64-bit digits. For your 160-bit hash, that will internally have either 5 32-bit digits (exactly 160 bits) or 3 64-bit digits (192 bits) -- so if we didn't trim zeros, you would get 24 bytes out of the latter with 5 zeros.
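For illustration, a minimal sketch (assuming num-bigint's public BigUint API: bits(), to_u32_digits(), to_u64_digits()) showing that converted output is always trimmed to the value's actual width, regardless of the internal digit size:

use num_bigint::BigUint;

fn main() {
    let num = BigUint::parse_bytes(b"4888659412758205060988492900023565086307197678", 10).unwrap();
    // This particular value is only 152 bits wide, i.e. 19 significant bytes.
    assert_eq!(num.bits(), 152);
    // Converted digit vectors are likewise trimmed to the minimum length,
    // independent of whether the internal digits are 32- or 64-bit:
    assert_eq!(num.to_u32_digits().len(), 5); // ceil(152 / 32)
    assert_eq!(num.to_u64_digits().len(), 3); // ceil(152 / 64)
    assert_eq!(num.to_bytes_le().len(), 19);  // ceil(152 / 8)
}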

Python can even tell you that your number's bit_length() is 152, which is exactly 19 bytes. It only gave you 20 because you explicitly requested it with 160//8, but it's also happy if you pass 19 there.

You can pad your Rust byte vector to the desired size with bytes.resize(160 / 8, 0).
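For anyone landing here later, a minimal sketch of that padding approach; it uses only std's Vec::resize plus the existing to_bytes_le, and the 160 / 8 target width is specific to a SHA-1 digest:

use num_bigint::BigUint;

fn main() {
    let num = BigUint::parse_bytes(b"4888659412758205060988492900023565086307197678", 10).unwrap();

    // to_bytes_le() returns only the 19 significant bytes for this value.
    let mut bytes = num.to_bytes_le();
    assert_eq!(bytes.len(), 19);

    // Zero-extend to the fixed 20-byte (160-bit) width of a SHA-1 digest.
    // In little-endian order the padding zeros belong at the end, so Vec::resize works directly.
    bytes.resize(160 / 8, 0);
    assert_eq!(bytes.len(), 20);
    assert_eq!(bytes[19], 0);
}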

aknarts commented 3 years ago

Thanks for the quick reply. I ended up fixing it by padding the vector, just as you said. This caught me off guard because I couldn't find any mention of this behavior in the docs, though I'm not sure where it would belong. If this is expected behavior, feel free to close the issue.

benma commented 3 years ago

It would be great if this library added functions to_bytes_le_padded() and to_bytes_be_padded() that take a length argument and pad the vector with zeros. There are different options for what to do if the result is bigger than the requested size, but any of them would be helpful, so one does not have to pad the output manually.
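To make the request concrete, a hypothetical sketch of what such helpers could look like when written outside the library today; the names match the proposal above, and the behavior when the value is wider than the requested length is just one of the possible policies mentioned:

use num_bigint::BigUint;

/// Hypothetical helper: little-endian bytes, zero-padded at the end to `len`.
/// If the value needs more than `len` bytes, this sketch simply returns the
/// longer vector; a real API would have to pick a policy (truncate, error, ...).
fn to_bytes_le_padded(n: &BigUint, len: usize) -> Vec<u8> {
    let mut bytes = n.to_bytes_le();
    if bytes.len() < len {
        bytes.resize(len, 0);
    }
    bytes
}

/// Hypothetical helper: big-endian bytes, zero-padded at the front to `len`.
fn to_bytes_be_padded(n: &BigUint, len: usize) -> Vec<u8> {
    let mut bytes = n.to_bytes_be();
    if bytes.len() < len {
        let mut padded = vec![0u8; len - bytes.len()];
        padded.append(&mut bytes);
        return padded;
    }
    bytes
}

fn main() {
    let num = BigUint::parse_bytes(b"4888659412758205060988492900023565086307197678", 10).unwrap();
    assert_eq!(to_bytes_le_padded(&num, 20).len(), 20);
    assert_eq!(to_bytes_be_padded(&num, 20)[0], 0); // leading zero byte in big-endian order
}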