zefchain / serde-reflection

Rust libraries and tools to help with interoperability and testing of serialization formats based on Serde.
Apache License 2.0

Add config option to not cut tracing at recursion points #41

Open juntyr opened 5 months ago

juntyr commented 5 months ago

Summary

So far, the tracer makes conservative assumptions to guarantee that tracing terminates even for infinitely expanding containers. Sometimes, however, the user knows that their data type is finite, so these conservative assumptions can safely be relaxed.

This PR adds four config options, one per recursive tracing cut-off point, to disable the safeguards and trace a type exhaustively.
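To illustrate why a cut-off can hide information, here is a minimal, dependency-free toy model. It does not use the serde-reflection API: `ToyConfig`, `cut_options`, `ToyTracer`, and the `Mode` variants are all hypothetical names, and the sketch assumes the conservative behaviour is to stop descending into an `Option` once its inner format is known. It models a public `struct Config { mode: Option<Mode> }` where `Mode` is a private enum that can only be reached by tracing `Config`.

```rust
use std::collections::BTreeSet;

/// Hypothetical stand-in for a tracer config; `cut_options` is an
/// illustrative name, not one of the options added by this PR.
pub struct ToyConfig {
    pub cut_options: bool,
}

/// Variants of an imagined private `enum Mode` nested inside
/// `pub struct Config { mode: Option<Mode> }`.
static MODE_VARIANTS: [&str; 3] = ["Fast", "Safe", "Debug"];

/// Toy tracer state (an assumption, not the crate's real data structures).
#[derive(Default)]
pub struct ToyTracer {
    /// Variants of `Mode` discovered so far.
    seen_variants: BTreeSet<&'static str>,
    /// Whether the inner format of the `Option` field is already known.
    option_inner_known: bool,
}

impl ToyTracer {
    /// One tracing pass over the imagined `Config` struct.
    pub fn trace_config(&mut self, config: &ToyConfig) {
        // Conservative rule: once the inner format is known, behave as if
        // the field were `None` and do not descend again.
        if config.cut_options && self.option_inner_known {
            return;
        }
        self.option_inner_known = true;

        // Descending into the enum explores at most one new variant per pass.
        let next = MODE_VARIANTS
            .iter()
            .copied()
            .find(|v| !self.seen_variants.contains(v));
        if let Some(variant) = next {
            self.seen_variants.insert(variant);
        }
    }
}

/// Repeats tracing passes until no new variant is discovered and
/// returns how many variants of `Mode` were found.
pub fn trace_until_stable(config: &ToyConfig) -> usize {
    let mut tracer = ToyTracer::default();
    loop {
        let before = tracer.seen_variants.len();
        tracer.trace_config(config);
        if tracer.seen_variants.len() == before {
            return before;
        }
    }
}
```

In this toy model, `trace_until_stable(&ToyConfig { cut_options: true })` finds only 1 of the 3 variants, because the second pass never reaches the enum again, while `cut_options: false` finds all 3. Disabling the cut-off is only safe here because the modelled type is finite.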

Test Plan

I'm happy to add tests where needed :)

ma2bd commented 5 months ago

Do you have an example where this is useful?

juntyr commented 5 months ago

> Do you have an example where this is useful?

Yes - I’m using it to translate Rust types into Python types at ~runtime. In particular, I have a large Rust library that exposes config through deep serde data structures. I’m also developing a Python wrapper for this library. Translating all types over manually is not scalable. While serialising and deserialising from/to duck-typed Python data is possible using the pythonizer crate, I also want to generate nice Python documentation.

Here is where serde-reflection comes in. My Python extension module, written in Rust, traces all config data structures exhaustively (which is needed since they contain private enums inside options inside publicly exposed structs). From the trace I generate Python types that are exported in the Python modules, and an auto doc step then picks them up to generate native Python docs.

ma2bd commented 5 months ago

> > Do you have an example where this is useful?
>
> Yes - I’m using it to translate Rust types into Python types at ~runtime. In particular, I have a large Rust library that exposes config through deep serde data structures. I’m also developing a Python wrapper for this library. Translating all types over manually is not scalable. While serialising and deserialising from/to duck-typed Python data is possible using the pythonizer crate, I also want to generate nice Python documentation.
>
> Here is where serde-reflection comes in. My Python extension module written in Rust traces all config data structures exhaustively (which is needed since they contain private enums inside options inside publicly exposed structs), from which I generate Python types that are exported in the Python modules, which are then picked up during an auto doc step to generate native Python docs.

Thanks for sharing. Actually, I meant a minimal sample of Rust code that shows why this feature is useful.

ma2bd commented 5 months ago

Mmm I guess your answer is the private enums.

juntyr commented 4 months ago

@ma2bd What are your current thoughts on this? In my opinion, it's a good small step forward that simply exposes more current functionality. Any substantive improvements to the actual tracing could be done in future PRs but should not block this minor change.

juntyr commented 4 months ago

gentle ping

juntyr commented 3 months ago

@ma2bd Do you have any thoughts?

ma2bd commented 3 months ago

> @ma2bd Do you have any thoughts?

@juntyr Thanks for your patience, so here is a path forward:

ma2bd commented 3 months ago

On a different note, I don't think further improving the tracing algorithm is going to be easy. It will probably cause massive complications and limitations (like a thread-unsafe global state). Personally, what I'd love for the Rust community is a simpler serde crate specialized in binary formats (no JSON hacks) and where format tracing (for various purposes) is officially supported.

juntyr commented 3 months ago

> On a different note, I don't think further improving the tracing algorithm is going to be easy. It will probably cause massive complications and limitations (like a thread-unsafe global state). Personally, what I'd love for the Rust community is a simpler serde crate specialized in binary formats (no JSON hacks) and where format tracing (for various purposes) is officially supported.

Yes, I absolutely agree with you on that. I’m a maintainer of ron. Last year I added fuzzing support for serde attributes, and 99% of its findings are about serde and its limitations for any data format that isn’t like JSON. I think a serde 2.0 could provide per-type shallow type information (since it’s the only library with enough ecosystem penetration to gain proc macro access to most type definitions), which a tracer could then combine into a deep layout.

But I think that may unfortunately be wishful thinking for a while. Until then, this crate provides the best solution I’ve come across (huge props to you for that).

juntyr commented 3 months ago

I’ll look at and implement your suggestions in the next few days :)