zefchain / serde-reflection

Rust libraries and tools to help with interoperability and testing of serialization formats based on Serde.
Apache License 2.0
138 stars 26 forks source link

[Bug] Bincode deserialization of tuples containing enums is broken in c++ #31

Closed weiznich closed 1 year ago

weiznich commented 1 year ago

🐛 Bug

To reproduce

Code snippet to reproduce


#[derive(Clone, Debug, Eq, PartialEq, PartialOrd, Hash, Serialize, Deserialize)]
pub enum Key {
    Id,
    Custom(String),
}

#[derive(Debug, Clone, PartialEq, Serialize, Deserialize)]
pub enum Value {
    NotSet,
    Integer(i32),
    String(String),
}

#[derive(Debug, Clone, PartialEq, Serialize, Deserialize)]
pub struct MyStruct {
    pub a: (Key, Value),
    pub o: Key,
}

fn make_attribute() -> MyStruct {
    MyStruct {
        a: (Key::Custom("abc".into()), Value::String("bar".into())),
        o: Key::Id,
    }
}

#[no_mangle]
pub unsafe extern "C" fn test_in(data: *const u8, len: usize) {
    if !data.is_null() {
        let slice = std::slice::from_raw_parts(data, len);
        let data = bincode::deserialize::<MyStruct>(slice).unwrap();
        assert_eq!(data, make_attribute());
    }
}

#[no_mangle]
pub unsafe extern "C" fn test_out(data: *mut *mut u8) -> usize {
    let p = make_attribute();

    let binary = bincode::serialize(&p).unwrap();
    let b = binary.into_boxed_slice();
    let len = b.len();
    let out: &mut *mut u8 = &mut *data;
    *out = Box::into_raw(b) as *mut u8;
    len
}
#include <iostream>
#include "include/testing.hpp"

using namespace std;

extern "C" size_t test_out(unsigned char** out_ptr);
extern "C" void test_in(const unsigned char* data, size_t len);

int main(int argc, char *argv[]) {

    uint8_t* out_ptr1 = nullptr;
    size_t len1 = test_out(&out_ptr1);
    test_in(out_ptr1, len1);
    std::cout << "Just in and out works fine" << std::endl;

    uint8_t* out_ptr = nullptr;
    size_t len = test_out(&out_ptr);

    std::vector<uint8_t> serialized_result(out_ptr, out_ptr + len);
    auto res = testing::MyStruct::bincodeDeserialize(serialized_result);

    auto serializer = serde::BincodeSerializer();
    serde::Serializable<testing::MyStruct>::serialize(res, serializer);
    std::vector<uint8_t> bytes = std::move(serializer).bytes();

    test_in(bytes.data(), bytes.size());

    std::cout << "Everything is fine" << std::endl;

    return 0;
}

(I'm aware that this leaks memory and unwinds through the ffi boundary.)

Stack trace/error message

terminate called after throwing an instance of 'serde::deserialization_error'
  what():  Unknown variant index for enum

See here for an example project reproducing the error. Assuming a linux environment with g++ installed you only need to clone that repository and execute make to run all required steps to see this error message

Expected Behavior

I would expect that the c++ code executes without throwing an exception. That would imply that it prints the following messages:

Just in and out works fine
Everything is fine

System information

Please complete the following information: serde-reflection: 0.3.6 serde-generate: 0.25.0

rustc: rustc 1.67.0 (fc594f156 2023-01-24)

Additional context

Slightly changing the layout of the MyStruct type and/or the Value enum changes the generated error message. For example removing this line removes the c++ exception but causes this assert to fail with the following message:

thread '<unnamed>' panicked at 'assertion failed: `(left == right)`
  left: `MyStruct { a: (Custom("bar"), String("abc")), o: Id }`,
 right: `MyStruct { a: (Custom("abc"), String("bar")), o: Id }`', src/lib.rs:35:9

Which implies that there is something wrong with how enum variants are handled inside of a tuple.

weiznich commented 1 year ago

@ma2bd Any ideas how to fix that?

ma2bd commented 1 year ago

Oh I just saw this. Let me have a look

ma2bd commented 1 year ago

Thanks @weiznich for the high quality bug report. I cloned your project and running make actually works fine on my Macbook, with g++ = "Apple clang version 13.1.6".

I'm a bit puzzled.

ma2bd commented 1 year ago

oops didn't mean to close the issue just yet

ma2bd commented 1 year ago

@weiznich Can you verify the bytes computed by your test_out (aka the Rust bincode::serialize)? This is what I have: 1 0 0 0 3 0 0 0 0 0 0 0 97 98 99 2 0 0 0 3 0 0 0 0 0 0 0 98 97 114 0 0 0 0 (and this looks correct) From your report, I understand that on your machine the C++ bincode deserializer is failing on this input?

ma2bd commented 1 year ago

FWIW using g++-12 from homebrew's gcc also works on my laptop. IMHO we need to know more about the configuration needed to reproduce the issue at this point

weiznich commented 1 year ago

This is what I have: 1 0 0 0 3 0 0 0 0 0 0 0 97 98 99 2 0 0 0 3 0 0 0 0 0 0 0 98 97 114 0 0 0 0 (and this looks correct)

That's also the byte output for me.

IMHO we need to know more about the configuration needed to reproduce the issue at this point

That's happening for me with the following compiler:

gcc (GCC) 12.2.1 20221121 (Red Hat 12.2.1-4)

using fedora 37.

I cannot reproduce that locally using clang:

clang version 15.0.7 (Fedora 15.0.7-1.fc37)

Given these points: It sounds like there is someplace that relays on the evaluation order of function arguments. That's unspecified in c++ as far as I remember and gcc/clang implement in different ways.

EDIT: Backtrace:

#0  0x00007ffff7aafe5c in __pthread_kill_implementation () from /lib64/libc.so.6
#1  0x00007ffff7a5fa76 in raise () from /lib64/libc.so.6
#2  0x00007ffff7a497fc in abort () from /lib64/libc.so.6
#3  0x00007ffff7ca2b97 in __gnu_cxx::__verbose_terminate_handler() [clone .cold] () from /lib64/libstdc++.so.6
#4  0x00007ffff7cae48c in __cxxabiv1::__terminate(void (*)()) () from /lib64/libstdc++.so.6
#5  0x00007ffff7cae4f7 in std::terminate() () from /lib64/libstdc++.so.6
#6  0x00007ffff7cae758 in __cxa_throw () from /lib64/libstdc++.so.6
#7  0x000000000040b228 in std::variant<testing::Key::Id, testing::Key::Custom> serde::Deserializable<std::variant<testing::Key::Id, testing::Key::Custom> >::deserialize<serde::BincodeDeserializer>(serde::BincodeDeserializer&) ()
#8  0x000000000040a3b9 in testing::Key serde::Deserializable<testing::Key>::deserialize<serde::BincodeDeserializer>(serde::BincodeDeserializer&) ()
#9  0x000000000040a6e4 in testing::MyStruct serde::Deserializable<testing::MyStruct>::deserialize<serde::BincodeDeserializer>(serde::BincodeDeserializer&) ()
#10 0x0000000000409aba in testing::MyStruct::bincodeDeserialize(std::vector<unsigned char, std::allocator<unsigned char> >)
    ()
#11 0x0000000000408fc5 in main ()
weiznich commented 1 year ago

Given the fact that it seems to switch key/value with the following patch to the test case:

--- a/src/lib.rs
+++ b/src/lib.rs
@@ -10,20 +10,20 @@ pub enum Key {
 #[derive(Debug, Clone, PartialEq, Serialize, Deserialize)]
 pub enum Value {
     NotSet,
-    Integer(i32),
+//    Integer(i32),
     String(String),
 }

 #[derive(Debug, Clone, PartialEq, Serialize, Deserialize)]
 pub struct MyStruct {
     pub a: (Key, Value),
-    pub o: Key,
+    //pub o: Key,
 }

 fn make_attribute() -> MyStruct {
     MyStruct {
         a: (Key::Custom("abc".into()), Value::String("bar".into())),
-        o: Key::Id,
+        //o: Key::Id,
     }
 }

results in

thread '<unnamed>' panicked at 'assertion failed: `(left == right)`
  left: `MyStruct { a: (Custom("bar"), String("abc")) }`,
 right: `MyStruct { a: (Custom("abc"), String("bar")) }`', src/lib.rs:35:9

I guess that the issue might be here: https://github.com/zefchain/serde-reflection/blob/7facc0412d060a7cf20d1da17d75b014b1218fab/serde-generate/runtime/cpp/serde.hpp#L613 ?

weiznich commented 1 year ago

@ma2bd I've opened #32 to fix this.

ma2bd commented 1 year ago

@weiznich Thanks for the fix! I suspected an issue at this line but couldn't find the right explanation (because it's so counter-intuitive as a Rust engineer :)).

ma2bd commented 1 year ago

Fix was released in serde-generate v0.25.1