sugyan / atrium

Rust libraries for Bluesky's AT Protocol services.
MIT License
188 stars 20 forks source link

Deserializing embed messages from subscriptions #17

Closed sugyan closed 8 months ago

sugyan commented 1 year ago

ref: https://github.com/sugyan/atrium/pull/15#issuecomment-1556188237

timfpark commented 9 months ago

Let me know if there is an outline of what this work entails and if it is a suitable first time issue for newcomer. If so I would be happy to help!

monaqa commented 9 months ago

Facing the same problem. I suspect that the following issue of ciborium crate is related.

https://github.com/enarx/ciborium/issues/71

sugyan commented 9 months ago

I'm currently working on resolving this issue and expect to fix it by applying #96. I believe serde_ipld_dagcbor is more appropriate than ciborium for deserializing data in DAG-CBOR format, including cid, and I have submitted a pull request for one issue there https://github.com/ipld/serde_ipld_dagcbor/pull/21 I think that once these are resolved, we can deserialize all messages correctly. Please wait a bit until the new version comes out.

sugyan commented 9 months ago

I believe that with the #96 and #98 update, records containing embedded images can now be deserialized correctly and the error should no longer occur.

str4d commented 8 months ago

Tested using latest atrium-api (plus #121), and patching in ipld/serde_ipld_dagcbor@ffae32959bd154f73614dac43b7e4826bb106399 (which includes https://github.com/ipld/serde_ipld_dagcbor/pull/23), using the following code to deserialize a post containing an embedded image:

let post: app::bsky::feed::post::Record = serde_ipld_dagcbor::from_reader(&data[..])?;

When no additional atrium-api features are enabled, I get the following error:

Error: Msg("data did not match any variant of untagged enum BlobRef")

When I enable the dag-cbor feature of atrium-api, the post parses correctly.

And for completeness, if I enable the dag-cbor feature of atrium-api but don't patch serde_ipld_dagcbor, I get this error:

Error: Msg("Only bytes can be deserialized into a CID")
sugyan commented 8 months ago

@str4d Yes, that's right! The CidLink deserialization uses a slightly odd method because the schema is different for JSON format and for DAG-CBOR format: to accommodate both formats, the deserialize method converts once to Ipld to determine the format. But this seemed to have some overhead.

Benchmark for checking it: https://github.com/sugyan/atrium/blob/main/atrium-api/benches/cid-link.rs (Looks like we need libipld_core in dev-dependencies...)

So I implemented CidLink for DAG-CBOR as an additional feature, because I thought it is better to use only simple JSON objects if the user knows that the user will only use JSON format. Sorry for the lack of documentation.

...But now we should probably use types::string::Cid for the link string of cid_link_json as well, and then it would be useless to worry about the overhead anymore, so maybe we can make it the default implementation.

sugyan commented 8 months ago

This is resolved at #98