mre / hyperjson

🐍 A hyper-fast Python module for reading/writing JSON data using Rust's serde-json.
Apache License 2.0

Speed up boolean encoding/decoding #68

Open mre opened 5 years ago

mre commented 5 years ago

Our benchmarks show that we are consistently slower than every other library when serializing/deserializing boolean values. We should fix that.

orjson is using an unsafe block to create a reference to a boolean: https://github.com/ijl/orjson/blob/03d55e99a953ce93cedc05f03e4b63b0bcbbcc7a/src/decode.rs#L81-L96

This avoids additional allocations. For comparison, this is our code at the moment:

https://github.com/mre/hyperjson/blob/ded13b4100638aa32fe19dc477f5cfe3e704893c/src/lib.rs#L475-L480
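To make the allocation point concrete without pulling in pyo3, here is a std-only Rust analogy. It assumes `Rc<bool>` as a stand-in for a Python object and the thread-local `TRUE_OBJ`/`FALSE_OBJ` as stand-ins for CPython's `Py_True`/`Py_False`. All names are hypothetical; this is not pyo3, orjson or hyperjson code, just a sketch contrasting "allocate per value" with "hand out a new reference to a shared singleton".

```rust
use std::rc::Rc;

thread_local! {
    // Hypothetical stand-ins for the interpreter-owned Py_True / Py_False singletons.
    static TRUE_OBJ: Rc<bool> = Rc::new(true);
    static FALSE_OBJ: Rc<bool> = Rc::new(false);
}

/// Allocates a fresh object for every value (the pattern to avoid).
fn bool_to_object_alloc(b: bool) -> Rc<bool> {
    Rc::new(b)
}

/// Returns a new reference to a shared singleton: a refcount bump, no allocation.
fn bool_to_object_singleton(b: bool) -> Rc<bool> {
    if b {
        TRUE_OBJ.with(|obj| Rc::clone(obj))
    } else {
        FALSE_OBJ.with(|obj| Rc::clone(obj))
    }
}

fn main() {
    let a = bool_to_object_singleton(true);
    let b = bool_to_object_singleton(true);
    // Both handles point at the same object.
    assert!(Rc::ptr_eq(&a, &b));

    let c = bool_to_object_alloc(true);
    let d = bool_to_object_alloc(true);
    // Each call produced a separate heap allocation.
    assert!(!Rc::ptr_eq(&c, &d));

    // One reference held by the thread-local, plus `a` and `b`.
    TRUE_OBJ.with(|obj| assert_eq!(Rc::strong_count(obj), 3));
}
```

In the singleton version the per-value cost is a branch plus an increment, which is the property the orjson code relies on.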

I wonder if we could achieve comparable performance without using unsafe. @konstin, any idea? Maybe there was a recent development in pyo3 that we could leverage here?

konstin commented 5 years ago

It's weird that this is slower than what orjson does. This is the implementation of ToObject for bool:

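```rust
unsafe {
    // Py_True()/Py_False() return borrowed pointers to the interpreter's bool
    // singletons, so no new object is allocated here.
    PyObject::from_borrowed_ptr(
        py,
        if *self {
            ffi::Py_True()
        } else {
            ffi::Py_False()
        },
    )
}
```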

And that's the implementation of from_borrowed_ptr:

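```rust
// Debug builds only: the pointer must be non-null and point to a live object.
debug_assert!(
    !ptr.is_null() && ffi::Py_REFCNT(ptr) > 0,
    format!("REFCNT: {:?} - {:?}", ptr, ffi::Py_REFCNT(ptr))
);
// Turn the borrowed pointer into an owned reference by bumping the refcount.
ffi::Py_INCREF(ptr);
// No second null check; the assertion above covers debug builds.
PyObject(NonNull::new_unchecked(ptr))
```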
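Since pyo3 can't be pulled into a standalone snippet, here is a std-only model of the two quoted pieces combined: pick a singleton, then check, incref and wrap it in a `NonNull`. `RawObject`, `OwnedRef` and `Interpreter` are hypothetical stand-ins, not pyo3 or CPython types, and `Drop` plays the role of `Py_DECREF`. It is a sketch of the mechanics, not the real implementation.

```rust
use std::cell::Cell;
use std::ptr::NonNull;

/// Hypothetical stand-in for a CPython object: a reference count plus a payload.
struct RawObject {
    refcnt: Cell<usize>,
    value: bool,
}

/// Owning handle, in the role of pyo3's `PyObject(NonNull<...>)` wrapper.
struct OwnedRef(NonNull<RawObject>);

impl OwnedRef {
    /// Same three steps as `from_borrowed_ptr`: sanity-check, incref, wrap.
    /// Safety: `ptr` must point to a live `RawObject`.
    unsafe fn from_borrowed_ptr(ptr: *mut RawObject) -> OwnedRef {
        debug_assert!(!ptr.is_null() && (*ptr).refcnt.get() > 0);
        (*ptr).refcnt.set((*ptr).refcnt.get() + 1);
        OwnedRef(NonNull::new_unchecked(ptr))
    }

    fn value(&self) -> bool {
        unsafe { self.0.as_ref().value }
    }
}

impl Drop for OwnedRef {
    /// Releasing the handle decrements the count, like Py_DECREF.
    fn drop(&mut self) {
        let obj = unsafe { self.0.as_ref() };
        obj.refcnt.set(obj.refcnt.get() - 1);
    }
}

/// Hypothetical interpreter state owning the two boolean singletons.
struct Interpreter {
    py_true: *mut RawObject,
    py_false: *mut RawObject,
}

impl Interpreter {
    fn new() -> Interpreter {
        // Leaked boxes are never freed, like immortal singletons; the initial
        // count of 1 is the interpreter's own reference.
        let py_true: *mut RawObject =
            Box::leak(Box::new(RawObject { refcnt: Cell::new(1), value: true }));
        let py_false: *mut RawObject =
            Box::leak(Box::new(RawObject { refcnt: Cell::new(1), value: false }));
        Interpreter { py_true, py_false }
    }

    /// Shape of the quoted ToObject impl: choose a singleton, return a new reference.
    fn bool_to_object(&self, b: bool) -> OwnedRef {
        unsafe { OwnedRef::from_borrowed_ptr(if b { self.py_true } else { self.py_false }) }
    }

    /// Current reference count of the chosen singleton.
    fn refcnt(&self, b: bool) -> usize {
        let ptr = if b { self.py_true } else { self.py_false };
        unsafe { (*ptr).refcnt.get() }
    }
}

fn main() {
    let interp = Interpreter::new();
    let a = interp.bool_to_object(true);
    let b = interp.bool_to_object(true);
    assert!(a.0 == b.0); // same singleton, no allocation
    assert!(a.value() && b.value());
    assert_eq!(interp.refcnt(true), 3); // interpreter + a + b
    drop(a);
    drop(b);
    assert_eq!(interp.refcnt(true), 1);
}
```

In release builds the `debug_assert!` disappears, leaving one branch and one increment per boolean, which is why the pyo3 path should, in theory, cost the same as a hand-written unsafe block.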

So in theory this should compile down to the same code as orjson's. You could try using `PyBool::new` and see if that makes a difference, but otherwise I don't know enough about inspecting assembly to debug this.