tafia / quick-protobuf

A rust implementation of protobuf parser
MIT License
452 stars 87 forks source link

negative int32 sign-extended too much? #220

Closed sollyucko closed 1 year ago

sollyucko commented 2 years ago

-...i32 as u64 results in sign-extending all the way to 64 bits, which I assume is incorrect according to the protobuf spec. Changing it to as u32 as u64 should sign-extend to 32 bits and then zero-extend from there to 64 bits, resulting in what I assume is the correct behavior.

Demo: https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=dc9845eb3a846f6658029234ab801478

fn main() {
    println!("{}", (-1i32) as u64);
    println!("{}", (-1i32) as u32 as u64);
}
18446744073709551615
4294967295
snproj commented 2 years ago

Hi; thanks for the PR! However, to the best of my own knowledge, negative numbers should always be 10-byte varints (which decodes to a 64 bit result). This is from generally looking online (such as here) as well as the official documentation (which I think now has rephrased the section on negative numbers after the previous links quoted it):

The intN types encode negative numbers as two's complement, which means that, as unsigned, 64-bit integers, they have their highest bit set. As a result, this means that all ten bytes must be used.

I've tested it with the following pseudo-fuzzing code snippet, which passes with the current code but doesn't with as u32 as u64.

Code ```rust #[test] fn test_get_size() { fn assert_get_size_equals_actual_msg_size(i: i32) { let mut message = TestTypesSingular::default(); message.double_field = Some(i as f64); message.float_field = Some(i as f32); message.int32_field = Some(i); message.int64_field = Some(i as i64); message.uint32_field = Some(i as u32); message.uint64_field = Some(i as u64); message.sint32_field = Some(i); message.sint64_field = Some(i as i64); message.fixed32_field = Some(i as u32); message.fixed64_field = Some(i as u64); message.sfixed32_field = Some(i); message.sfixed64_field = Some(i as i64); message.bool_field = Some(i % 2 == 0); message.string_field = Some(format!("body once told me {}", i).into()); message.bytes_field = Some(vec![33u8, 34, i as u8].into()); message.enum_field = Some(TestEnumDescriptor::BLUE); let get_size_res = message.get_size(); let mut serialized = Vec::new(); let mut writer = Writer::new(&mut serialized); message.write_message(&mut writer).unwrap(); assert_eq!(get_size_res, serialized.len()); } for i in (i32::MIN..i32::MAX).step_by((i32::MAX / 10000) as usize) { assert_get_size_equals_actual_msg_size(i); } for i in ((i32::MIN / 10000)..(i32::MAX / 10000)).step_by((713) as usize) { assert_get_size_equals_actual_msg_size(i); } for i in -100..100 { assert_get_size_equals_actual_msg_size(i); } } ```

Do let me know if I've missed anything / misunderstood your PR though!