sugyan / atrium

Rust libraries for Bluesky's AT Protocol services.
MIT License

MST and repository parsing APIs #167

Open str4d opened 5 months ago

str4d commented 5 months ago

As part of #118, and to enable consuming #commit events from the firehose, we need APIs for parsing and interacting with MSTs (Merkle Search Trees) and repository CAR files.
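For context, here is a minimal sketch of how a repository's records map onto MST keys. It assumes the atproto convention that each key is a repo path of the form `<collection>/<rkey>`, and that the tree keeps keys in sorted byte order. A `BTreeMap` stands in for the MST here (it is not an MST), so listing one collection becomes a prefix range scan. `mst_key` and `list_collection` are hypothetical helpers, and the record keys and values are placeholders.

```rust
use std::collections::BTreeMap;

/// Builds an MST key (repo path) from a collection NSID and a record key.
fn mst_key(collection: &str, rkey: &str) -> String {
    format!("{collection}/{rkey}")
}

/// Returns the `(rkey, value)` pairs stored under `collection`, in key order.
/// Because keys are sorted, one collection is one contiguous run of keys
/// starting at `"<collection>/"`. (A BTreeMap stands in for the MST.)
fn list_collection<'a, V>(
    tree: &'a BTreeMap<String, V>,
    collection: &str,
) -> Vec<(&'a str, &'a V)> {
    let prefix = format!("{collection}/");
    tree.range(prefix.clone()..)
        .take_while(|(key, _)| key.starts_with(&prefix))
        .map(|(key, value)| (&key[prefix.len()..], value))
        .collect()
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn lists_one_collection() {
        let mut tree = BTreeMap::new();
        tree.insert(mst_key("app.bsky.feed.post", "3k2a"), "post-1");
        tree.insert(mst_key("app.bsky.feed.postgate", "3k2a"), "gate-1");
        tree.insert(mst_key("app.bsky.feed.like", "3k2b"), "like-1");
        tree.insert(mst_key("app.bsky.feed.post", "3k2c"), "post-2");
        // "postgate" shares a string prefix with "post", but not "post/".
        let posts = list_collection(&tree, "app.bsky.feed.post");
        assert_eq!(posts, vec![("3k2a", &"post-1"), ("3k2c", &"post-2")]);
    }
}
```

The trailing `/` in the prefix matters. Without it, a collection such as `app.bsky.feed.postgate` would be picked up when listing `app.bsky.feed.post`.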

str4d commented 5 months ago

I've written code for MST and repo parsing myself for a CAR viewer project, on top of atrium-api and libipld (but will be migrating it to ipld-core soon). I'd be happy to upstream it here if we can decide where it should go.

The current APIs I have are:

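```rust
// API sketch: `{ .. }` marks elided bodies and `_` marks elided error types.
// `Did`, `RecordKey`, and `Collection` come from atrium-api; `Cid` is the IPLD CID type.
use futures::Stream;
use tokio::io::{AsyncRead, AsyncSeek};

struct Repository<R: AsyncRead + AsyncSeek> { .. }

impl<R: AsyncRead + AsyncSeek + Unpin + Send> Repository<R> {
    /// Reads a repository CAR from `reader`.
    async fn load(reader: R) -> Result<Self, _> { .. }

    /// DID of the account that owns the repository.
    fn did(&self) -> &Did { .. }

    /// Streams every MST key (`<collection>/<rkey>`) in the repository.
    fn keys<'a>(&'a mut self) -> impl Stream<Item = Result<String, _>> + 'a { .. }

    /// Streams the records of collection `C` with their record keys, in key order.
    fn get_collection<'a, C: Collection + 'a>(
        &'a mut self,
    ) -> impl Stream<Item = Result<(RecordKey, C::Record), _>> + 'a { .. }

    /// Like `get_collection`, but in reverse key order.
    fn get_collection_reversed<'a, C: Collection + 'a>(
        &'a mut self,
    ) -> impl Stream<Item = Result<(RecordKey, C::Record), _>> + 'a { .. }

    /// Fetches a single record of collection `C` by record key.
    async fn get<C: Collection>(
        &mut self,
        rkey: &RecordKey,
    ) -> Result<Option<C::Record>, _> { .. }
}

mod mst {
    /// Result of a lookup within one node: either the entry itself, or the
    /// CID of a subtree to descend into.
    enum Located<E> {
        Entry(E),
        InSubtree(Cid),
    }

    struct Node { .. }

    impl Node {
        /// Parses a DAG-CBOR-encoded MST node.
        fn parse(bytes: &[u8]) -> Result<Option<Self>, _> { .. }

        /// Looks up `key` within this node.
        fn get(&self, key: &[u8]) -> Option<Located<Cid>> { .. }

        /// Entries (or subtrees that may hold entries) whose keys start with `prefix`.
        fn entries_with_prefix<'a>(
            &'a self,
            prefix: &'a [u8],
        ) -> impl Iterator<Item = Located<(&[u8], Cid)>> + 'a { .. }

        /// Like `entries_with_prefix`, but in reverse key order.
        fn reversed_entries_with_prefix<'a>(
            &'a self,
            prefix: &'a [u8],
        ) -> impl Iterator<Item = Located<(&[u8], Cid)>> + 'a { .. }
    }
}
```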
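The node-level `get` and `Located` resolve only one node at a time. A whole-tree lookup would presumably start at the root and keep following `InSubtree` CIDs until it reaches an `Entry` or runs out of subtrees. The following self-contained sketch shows that loop under heavy simplifications, all of them assumptions:

- `Cid` is a plain `u64`.
- A `HashMap` plays the role of the repository's block store.
- `Node` and `TreeEntry` are hypothetical stand-ins. They loosely mirror the atproto node layout (a leftmost subtree pointer, then sorted entries that each carry an optional right-hand subtree), but keys are stored whole rather than prefix-compressed, and there is no CBOR parsing.
- `get_in_tree` is a hypothetical helper, not part of the proposed API.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a real CID.
type Cid = u64;

/// Same shape as the proposed `mst::Located`.
enum Located<E> {
    Entry(E),
    InSubtree(Cid),
}

/// Simplified entry: full key, value CID, and the subtree holding keys
/// between this entry and the next one.
struct TreeEntry {
    key: Vec<u8>,
    value: Cid,
    right: Option<Cid>,
}

/// Simplified node: subtree of keys before the first entry, then sorted entries.
struct Node {
    left: Option<Cid>,
    entries: Vec<TreeEntry>,
}

impl Node {
    /// Looks up `key` within this node only.
    fn get(&self, key: &[u8]) -> Option<Located<Cid>> {
        // Index of the first entry whose key is >= `key`.
        let idx = self.entries.partition_point(|e| e.key.as_slice() < key);
        if let Some(e) = self.entries.get(idx) {
            if e.key == key {
                return Some(Located::Entry(e.value));
            }
        }
        // Otherwise the key can only live in the subtree just before `idx`.
        let subtree = if idx == 0 { self.left } else { self.entries[idx - 1].right };
        subtree.map(Located::InSubtree)
    }
}

/// Follows `InSubtree` pointers from the root until the key is found
/// (returns its value CID) or ruled out (returns None).
fn get_in_tree(blocks: &HashMap<Cid, Node>, root: Cid, key: &[u8]) -> Option<Cid> {
    let mut node = blocks.get(&root)?;
    loop {
        match node.get(key)? {
            Located::Entry(value) => return Some(value),
            Located::InSubtree(cid) => node = blocks.get(&cid)?,
        }
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn follows_subtrees() {
        let leaf = |key: Vec<u8>, value: Cid| Node {
            left: None,
            entries: vec![TreeEntry { key, value, right: None }],
        };
        // Root (1) holds "b"; its left subtree (2) holds "a"; right of "b" (3) holds "c".
        let mut blocks = HashMap::new();
        blocks.insert(2, leaf(b"a".to_vec(), 100));
        blocks.insert(3, leaf(b"c".to_vec(), 300));
        let root_entry = TreeEntry { key: b"b".to_vec(), value: 200, right: Some(3) };
        blocks.insert(1, Node { left: Some(2), entries: vec![root_entry] });
        assert_eq!(get_in_tree(&blocks, 1, b"a"), Some(100));
        assert_eq!(get_in_tree(&blocks, 1, b"d"), None);
    }
}
```

Keeping `Node::get` node-local lets the caller decide how blocks are fetched, whether from a seekable CAR reader or an in-memory map.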

I went with async APIs because I'm reading a CAR file from disk. For firehose subscribers, sync APIs might be sufficient, but since the crates in this repo already expose async APIs, I figure this is a fine starting point.
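For the firehose case, each `#commit` event's `blocks` field already carries a CAR file as in-memory bytes, so a synchronous reader over a byte slice is enough to walk it. Below is a minimal std-only sketch of that framing. It assumes CAR v1 with CIDv1 block keys, which is what atproto uses. `read_varint`, `read_frame`, `split_cid_v1`, and `car_blocks` are hypothetical helpers, not atrium or ipld-core APIs. The DAG-CBOR header and block contents are left undecoded.

```rust
/// Decodes an unsigned LEB128 varint; returns (value, bytes consumed).
fn read_varint(buf: &[u8]) -> Option<(u64, usize)> {
    let mut value: u64 = 0;
    for (i, &byte) in buf.iter().enumerate().take(10) {
        value |= u64::from(byte & 0x7f) << (7 * i);
        if byte & 0x80 == 0 {
            return Some((value, i + 1));
        }
    }
    None // truncated, or longer than a u64 varint can be
}

/// Splits one varint-length-prefixed frame off the front; returns (frame, rest).
fn read_frame(buf: &[u8]) -> Option<(&[u8], &[u8])> {
    let (len, n) = read_varint(buf)?;
    let len = usize::try_from(len).ok()?;
    let rest = &buf[n..];
    if rest.len() < len {
        return None; // truncated frame
    }
    Some(rest.split_at(len))
}

/// Splits a CIDv1 (version, codec, multihash code, digest length, digest)
/// off the front of a section; returns (cid_bytes, block_bytes).
fn split_cid_v1(section: &[u8]) -> Option<(&[u8], &[u8])> {
    let (version, mut pos) = read_varint(section)?;
    if version != 1 {
        return None; // CIDv0 is not handled in this sketch
    }
    for _ in 0..2 {
        // Skip the codec and the multihash function code.
        let (_, n) = read_varint(&section[pos..])?;
        pos += n;
    }
    let (digest_len, n) = read_varint(&section[pos..])?;
    pos += n;
    let digest_len = usize::try_from(digest_len).ok()?;
    if section.len() - pos < digest_len {
        return None; // truncated digest
    }
    Some(section.split_at(pos + digest_len))
}

/// Returns the raw (undecoded) header and every (cid, block) pair in a CAR v1 slice.
fn car_blocks(car: &[u8]) -> Option<(&[u8], Vec<(&[u8], &[u8])>)> {
    let (header, mut rest) = read_frame(car)?;
    let mut blocks = Vec::new();
    while !rest.is_empty() {
        let (section, next) = read_frame(rest)?;
        blocks.push(split_cid_v1(section)?);
        rest = next;
    }
    Some((header, blocks))
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn splits_one_block() {
        // A fake 2-byte digest keeps this short; real atproto CIDs use 32-byte SHA-256 digests.
        let car: &[u8] = &[
            0x02, 0xaa, 0xbb, // header frame (contents not decoded here)
            0x09, // section length
            0x01, 0x71, 0x12, 0x02, 0xde, 0xad, // CIDv1: version, dag-cbor, sha2-256, len, digest
            0x01, 0x02, 0x03, // block bytes
        ];
        let (header, blocks) = car_blocks(car).unwrap();
        assert_eq!(header, &[0xaa, 0xbb]);
        assert_eq!(blocks[0].1, &[0x01, 0x02, 0x03]);
    }
}
```

Because everything borrows from the input slice, no copying or I/O is needed. That is one argument for also offering a sync entry point alongside the async, `AsyncRead`-based one.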

sugyan commented 5 months ago

Thanks for the suggestion! I hadn't considered implementing anything for that yet, but if you would like to add it, I'd be very happy to merge it.

Would it be better to add it as a new package, such as atrium-repo (named after @atproto/repo, the corresponding package in the original TypeScript implementation)? Also, as you may have noticed, we are trying to add some new libraries in atrium-libs in #166, but that is still a draft. I am adding implementations little by little and may eventually split each into a separate package.

str4d commented 5 months ago

Sure, atrium-repo sounds good. I'll open a PR with the initial implementation.