protocol / launchpad

The curriculum for the Launchpad program
Apache License 2.0

Add a section about network indexer to IPFS or Filecoin #428

Open walkerlj0 opened 1 year ago

walkerlj0 commented 1 year ago

Problem Description

Purpose

Describe what need this project is filling, or what problem it is solving, and who the intended audience or end user is

Proposed Solution

What features, user stories, or product/content you would like as an output

Where can we find out more about this topic?
Torfinn Olson & Ivan Schansy
https://filecoin.io/blog/posts/introducing-the-network-indexer/
https://github.com/ipni/specs/blob/main/IPNI.md
https://www.youtube.com/watch?v=sunA7JO4rHQ&list=PLuhRWgmPaHtSF3oIY3TzrM-Nq5IU_RTXb&index=11
https://www.youtube.com/watch?v=g7iwPIpeSIo&list=PLuhRWgmPaHtSF3oIY3TzrM-Nq5IU_RTXb&index=8
https://docs.cid.contact/filecoin-network-indexer/overview
https://github.com/ipni/storetheindex

Additional context: The network indexer uses both Graphsync and Bitswap, and makes interoperability between IPFS and Filecoin possible.

Milestones (Optional)

1) Milestone Name Text

2) Milestone Name Text

Acceptance Criteria

Describe the output, and the criteria that output must meet for this task to be considered complete

walkerlj0 commented 1 year ago

Info dump from launchpad-coloweek-v7

@Walker Ford had a lot of REALLY good questions about some of the more functional aspects of the network indexer during my presentation, and it took me a little time to put responses together. I figured everyone should benefit from the answers, so I've decided to share them here instead of leaving them lodged within the presentation.

Q: Why does the Indexer watch the SP announcement chain rather than watching the blockchain directly? I'm confused about the benefits of the announcement/advertisement chain.
A: Capturing advertisements in situ, as data is stored as a result of a Filecoin deal, is simply the lowest-overhead (most efficient) way to capture this information, as opposed to querying it by crawling the entire chain. Advertisement chain chunks give us a very fast way to find multihashes and return them. Crawling the chain would be very slow and compute-intensive in comparison. Consider the difference in the number of hops.
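To make the "number of hops" point concrete, here is a minimal sketch of how an indexer could catch up on one provider's advertisement chain. It assumes each advertisement links back to the one before it; the names `Advertisement`, `previous_id`, and `new_advertisements`, and the in-memory `store`, are hypothetical stand-ins and not the storetheindex API.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Advertisement:
    ad_id: str                  # stand-in for the advertisement's CID
    previous_id: Optional[str]  # link to the prior advertisement, None for the first
    multihashes: List[str]      # stand-in for the linked Entries data


def new_advertisements(
    store: Dict[str, Advertisement],
    head: str,
    last_seen: Optional[str],
) -> List[Advertisement]:
    """Walk back from the provider's head until the last processed ad.

    Returns the unseen advertisements oldest-first, so they can be
    applied in the order the provider published them.
    """
    pending: List[Advertisement] = []
    cursor: Optional[str] = head
    while cursor is not None and cursor != last_seen:
        ad = store[cursor]
        pending.append(ad)
        cursor = ad.previous_id
    pending.reverse()
    return pending
```

The work is proportional to the number of advertisements published since the last sync, not to the length of the blockchain.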

Q: How big is the Indexer database?
A: See Disk space utilization. It is currently ~3TiB. It was once dramatically larger; the current size is the result of recent improvements that have dramatically optimized data storage practices.

Q: Are there plans to shard the Indexer database and distribute it across more nodes?
A: The plan to shard the indexer database across more nodes is described in the Indexer scaling plan.

Q: What kinds of information does the Indexer have that the DHT does not have?
A: The simple answer is 'metadata': who has what, and over which protocols it is retrievable. The DHT stores IPNS records, which IPNI does not, although there is a spec for making that happen sometime in the future. See: Naam, a naming system powered by IPNI.

The technical answer: I recommend reading the Ingestion design doc. An advertisement has the fields (Provider, ContextID, ProviderID, Metadata, Signature, Entries):
Provider: The peer.ID of the libp2p host providing the content.
Addresses: Multiaddrs to provide to clients in order to connect to the provider.
Entries: Link to a data structure that contains the advertised multihashes. It can be an interlinked chain of EntryChunk nodes, or an IPLD HAMT ADL where the keys in the map represent the multihashes and the values are set to true.
ContextID: Identifier used to subsequently update or delete an advertisement.
ProviderID:
Metadata: Additional data returned in client query responses for any of the CIDs in this advertisement. It is expected to start with a varint indicating the format of the remaining metadata. It is recommended to keep it below 100 bytes. The reference provider currently supports the Bitswap, Filecoin (graphsync), and HTTP protocols, defined in the library.
Signature: Signed by the provider's private key.
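The "starts with a varint" rule for Metadata can be illustrated with a short sketch that decodes the leading unsigned varint and splits off the remaining bytes. The transport codes in `KNOWN_TRANSPORTS` are my reading of the multicodec table and should be checked against it; `read_uvarint` and `split_metadata` are hypothetical helper names, not functions from the indexer libraries.

```python
from typing import Tuple

# Assumed multicodec transport codes; verify against the multicodec table.
KNOWN_TRANSPORTS = {
    0x0900: "bitswap",
    0x0910: "graphsync-filecoinv1",
    0x0920: "ipfs-gateway-http",
}


def read_uvarint(data: bytes) -> Tuple[int, int]:
    """Decode an unsigned LEB128 varint; return (value, bytes consumed)."""
    value = 0
    shift = 0
    for index, byte in enumerate(data):
        value |= (byte & 0x7F) << shift
        if byte & 0x80 == 0:  # high bit clear marks the last byte
            return value, index + 1
        shift += 7
    raise ValueError("truncated varint")


def split_metadata(metadata: bytes) -> Tuple[str, bytes]:
    """Return (transport name, protocol-specific payload) for a metadata blob."""
    code, used = read_uvarint(metadata)
    name = KNOWN_TRANSPORTS.get(code, f"unknown(0x{code:x})")
    return name, metadata[used:]
```

Under these assumptions, the two bytes `0x80 0x12` decode to `0x0900`, so a Bitswap advertisement can carry metadata with an empty payload, comfortably under the recommended 100 bytes.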

Q: Where do/did the IPFS Hydras fit in with the indexer?
A: The Hydra nodes acted as a lookup for the DHT, which previously acted as a sync with the Indexer as a post-lookup activity. Hydras are presently performing a bridging function, which is the path IPFS gateways would have to leverage in order to pull data from the network indexer. IPFS gateways now have the option of querying the indexer directly via what we're calling HTTP delegated routing. In the future we will have ambient discovery of content routers, which will greatly improve the speed and efficiency of this process while reducing the IPFS network's sole reliance on the Kademlia DHT as a source of content lookups. Read more about ambient content routing:
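As a rough illustration of HTTP delegated routing, the sketch below builds a provider-lookup URL and parses a response body. The `/routing/v1/providers/{cid}` path and the `Providers`, `ID`, and `Addrs` field names follow my understanding of the IPFS HTTP routing spec and may differ between versions; the helper names and every value in `SAMPLE_RESPONSE` are made up for the example. No network call is made here.

```python
import json
from typing import List, Tuple
from urllib.parse import quote


def providers_url(base_url: str, cid: str) -> str:
    """Build the lookup URL for a CID (path is an assumption, see above)."""
    return f"{base_url.rstrip('/')}/routing/v1/providers/{quote(cid, safe='')}"


def parse_providers(body: str) -> List[Tuple[str, List[str]]]:
    """Extract (peer ID, multiaddrs) pairs from a JSON response body."""
    doc = json.loads(body)
    results: List[Tuple[str, List[str]]] = []
    for record in doc.get("Providers") or []:
        results.append((record.get("ID", ""), list(record.get("Addrs") or [])))
    return results


# Invented sample in the assumed response shape, for illustration only.
SAMPLE_RESPONSE = """
{
  "Providers": [
    {
      "Schema": "peer",
      "ID": "12D3KooWExamplePeer",
      "Addrs": ["/ip4/203.0.113.7/tcp/4001"],
      "Protocols": ["transport-bitswap"]
    }
  ]
}
"""
```

A gateway could fetch the URL from `providers_url` with any HTTP client and hand the body to `parse_providers`, then dial the returned addresses instead of walking the DHT.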

Alternatively, ANY of you are welcome to join the Content Routing Workgroup I formed and participate in these discussions. We currently host biweekly meetings between the IPFS stewards team, Pro-lab, and the Network Indexer team, as well as interested stakeholders. Read more on the Content routing workgroup page.
@Lindsay Walker I'm still promising you some IPNI launchpad content, and I think we can use this as a good FAQ start for that.

Additionally I'd recommend reading more here: https://docs.cid.contact/filecoin-network-indexer/overview