Find links without creating an entire IPLD AST

rklaehn commented 1 year ago

Currently, links are extracted from a piece of content by deserializing it into a full IPLD AST. This is very inefficient compared to scanning the data for links directly.

There is a mechanism in libipld to extract links without building a temporary AST. We (Actyx) built it for ipfs-embed, but I think efficient link extraction only exists for CBOR, and we mostly care about protobuf / unixfs for now.
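
To illustrate what scanning could look like for the protobuf case, here is a minimal std-only sketch. It assumes the standard dag-pb layout (`PBNode.Links` is field 2, `PBLink.Hash` is field 1, and only wire types 0 and 2 occur). The names `read_varint`, `for_each_bytes_field` and `scan_pb_links` are hypothetical and not part of libipld; the function returns the raw CID bytes of each link instead of parsed `Cid` values, so that it needs no external crates.

```rust
/// Decodes a protobuf varint at `pos`; returns (value, position after it).
fn read_varint(data: &[u8], mut pos: usize) -> Option<(u64, usize)> {
    let mut value = 0u64;
    for shift in (0..64).step_by(7) {
        let byte = *data.get(pos)?;
        pos += 1;
        value |= u64::from(byte & 0x7f) << shift;
        if byte & 0x80 == 0 {
            return Some((value, pos));
        }
    }
    None // more than 10 bytes: not a valid varint
}

/// Walks the fields of one protobuf message and calls `f(field_number, payload)`
/// for every length-delimited field. Varint fields are skipped.
fn for_each_bytes_field<'a>(
    data: &'a [u8],
    mut f: impl FnMut(u64, &'a [u8]) -> Option<()>,
) -> Option<()> {
    let mut pos = 0;
    while pos < data.len() {
        let (key, next) = read_varint(data, pos)?;
        pos = next;
        match key & 7 {
            // Wire type 0 (varint), e.g. PBLink.Tsize: skip the value.
            0 => pos = read_varint(data, pos)?.1,
            // Wire type 2 (length-delimited): hand the payload to the callback.
            2 => {
                let (len, start) = read_varint(data, pos)?;
                // Bounds check in u64 space so the cast below cannot truncate.
                if len > (data.len() - start) as u64 {
                    return None;
                }
                let end = start + len as usize;
                f(key >> 3, &data[start..end])?;
                pos = end;
            }
            // Assumption: dag-pb uses no other wire types.
            _ => return None,
        }
    }
    Some(())
}

/// Collects the raw CID bytes of every link in a dag-pb block without
/// materializing the node: only the Links/Hash fields are looked at.
pub fn scan_pb_links(block: &[u8]) -> Option<Vec<&[u8]>> {
    let mut links = Vec::new();
    for_each_bytes_field(block, |field, link| {
        if field != 2 {
            return Some(()); // not PBNode.Links (e.g. Data): ignore
        }
        for_each_bytes_field(link, |f, hash| {
            if f == 1 {
                links.push(hash); // PBLink.Hash
            }
            Some(())
        })
    })?;
    Some(links)
}
```

The `Data` payload is never copied or interpreted here, which is where most of the savings for large unixfs blocks would come from. Turning each returned slice into a `Cid` is left to the caller.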

In any case, this should be fixed.


/// Extract links from the given content.
pub fn parse_links(cid: &Cid, bytes: &[u8]) -> Result<Vec<Cid>> {
    let codec = Codec::try_from(cid.codec()).context("unknown codec")?;
    let codec = match codec {
        Codec::DagPb => IpldCodec::DagPb,
        Codec::DagCbor => IpldCodec::DagCbor,
        Codec::DagJson => IpldCodec::DagJson,
        Codec::Raw => IpldCodec::Raw,
        _ => bail!("unsupported codec {:?}", codec),
    };

    // AVOID THIS: decoding builds a full AST. Just scan the data for links instead.
    let decoded: Ipld = Ipld::decode(codec, &mut std::io::Cursor::new(bytes))?;
    let mut links = Vec::new();
    decoded.references(&mut links);

    Ok(links)
}

This would require some additional code in libipld. See https://github.com/ipld/libipld/issues/154
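
As a rough idea of what such scanning code does for dag-cbor, here is a std-only sketch. It relies on the fact that DAG-CBOR encodes links as tag 42 wrapping a byte string whose first byte is `0x00`, and that DAG-CBOR forbids indefinite-length items. The names `read_head`, `take` and `scan_cbor_links` are hypothetical and not the libipld API; the function returns raw CID bytes rather than parsed `Cid` values and ignores any trailing bytes after the root item.

```rust
/// Reads one CBOR item head at `pos`: (major type, argument, position after head).
fn read_head(data: &[u8], pos: usize) -> Option<(u8, u64, usize)> {
    let first = *data.get(pos)?;
    let info = first & 0x1f;
    // Number of big-endian argument bytes that follow the initial byte.
    let extra = match info {
        0..=23 => 0,
        24 => 1,
        25 => 2,
        26 => 4,
        27 => 8,
        _ => return None, // reserved or indefinite length: not valid DAG-CBOR
    };
    let arg = if extra == 0 {
        u64::from(info)
    } else {
        data.get(pos + 1..pos + 1 + extra)?
            .iter()
            .fold(0u64, |acc, b| (acc << 8) | u64::from(*b))
    };
    Some((first >> 5, arg, pos + 1 + extra))
}

/// Returns the `len` bytes starting at `pos`, or None if they are out of bounds.
fn take(data: &[u8], pos: usize, len: u64) -> Option<&[u8]> {
    let rest = data.get(pos..)?;
    if len > rest.len() as u64 {
        return None;
    }
    Some(&rest[..len as usize])
}

/// Collects the raw CID bytes of every link in a DAG-CBOR block by walking
/// item heads only; no tree of values is built.
pub fn scan_cbor_links(data: &[u8]) -> Option<Vec<&[u8]>> {
    let mut links = Vec::new();
    let mut pos = 0;
    // Number of data items still to visit; nesting depth does not matter.
    let mut remaining: u64 = 1;
    while remaining > 0 {
        let (major, arg, next) = read_head(data, pos)?;
        pos = next;
        remaining -= 1;
        match major {
            // Byte or text string: skip the payload.
            2 | 3 => pos += take(data, pos, arg)?.len(),
            // Array of `arg` items.
            4 => remaining = remaining.checked_add(arg)?,
            // Map of `arg` key/value pairs.
            5 => remaining = remaining.checked_add(arg.checked_mul(2)?)?,
            // Tag 42: a link, encoded as a byte string with a 0x00 prefix.
            6 if arg == 42 => {
                let (inner, len, start) = read_head(data, pos)?;
                let payload = take(data, start, len)?;
                let (prefix, cid) = payload.split_first()?;
                if inner != 2 || *prefix != 0 {
                    return None;
                }
                links.push(cid);
                pos = start + payload.len();
            }
            // Any other tag: the tagged item follows.
            6 => remaining = remaining.checked_add(1)?,
            // Integers, simple values, floats: fully consumed by the head.
            _ => {}
        }
    }
    Some(links)
}
```

Because the scan keeps only a counter of outstanding items, it needs no recursion and no allocation beyond the result vector.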

Solved for

[x] dag-cbor
[x] dag-pb
[ ] dag-json

RangerMauve commented 1 year ago

Lazy decoding of IPLD data would be nice to have in the Rust ecosystem in general. I'm very interested to see how this goes.

rklaehn commented 1 year ago

@RangerMauve I am afraid we will just use the built-in functionality for the link scraping special case in rust libp2p. What you are after would be some way to traverse any flavour of IPLD (dag-pb, dag-cbor, dag-json, ...) without building an AST, using a generic streaming parser, right? I'm not sure whether something like this is on the roadmap for rust-libipld.

n0-computer / beetle

Find links without creating an entire IPLD AST #162