rpcpool / yellowstone-faithful

Project Yellowstone Old Faithful is the project to make all of Solana's history accessible, content addressable and available via a variety of means.
https://old-faithful.net/
GNU Affero General Public License v3.0

Resplit car files based on subsets #107

Open anjor opened 2 weeks ago

anjor commented 2 weeks ago

Today we generate one large CAR file per epoch, about 600GB in size. We then split it using carlet. Carlet splits naively: it takes one block (in the IPLD sense of the word) at a time until the output reaches the desired size. As a result, blocks that belong to the same DAG and are "connected" to each other can end up in separate CAR files, and therefore in different Filecoin deals.

This means that when we try to retrieve data, only a retrieval protocol that fetches one block at a time, i.e. bitswap, works. All the split CAR files therefore need to be stored with SPs that serve data over bitswap.
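To make the problem concrete, here is a toy model of size-based splitting. It is a sketch only, not carlet's actual code: the `IpldBlock` type, the `dag` label, and the byte sizes are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class IpldBlock:
    cid: str   # hypothetical identifier, not a real CID
    dag: str   # toy label for the DAG this block belongs to
    size: int  # size in bytes


def naive_split(blocks, target_size):
    """Fill each output file one block at a time, ignoring DAG boundaries."""
    files, current, current_size = [], [], 0
    for block in blocks:
        # Start a new file once the next block would exceed the target size.
        if current and current_size + block.size > target_size:
            files.append(current)
            current, current_size = [], 0
        current.append(block)
        current_size += block.size
    if current:
        files.append(current)
    return files


def dags_spanning_files(files):
    """Return the DAG labels whose blocks landed in more than one file."""
    seen = {}
    for index, file_blocks in enumerate(files):
        for block in file_blocks:
            seen.setdefault(block.dag, set()).add(index)
    return sorted(dag for dag, indexes in seen.items() if len(indexes) > 1)


blocks = [IpldBlock(f"{dag.lower()}{i}", dag, 40) for dag in "AB" for i in (1, 2, 3)]
files = naive_split(blocks, target_size=100)
print([[b.cid for b in f] for f in files])
print(dags_spanning_files(files))
```

With these toy sizes, both DAGs end up spread over two files, so neither file alone contains a complete DAG.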

However, we have already introduced the concept of a subset, which collects a bunch of Blocks (in the Solana sense of the word) together. Since we control which Blocks go in a subset, we could instead split by subset, such that each subset is <32GB and fits in a Filecoin sector. This way all the data for a subset is in a single deal and is retrievable via bitswap as well as graphsync.
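A minimal sketch of what subset-based splitting could look like, assuming we know the total encoded size of each Solana Block (i.e. of all the IPLD blocks under it) up front. The function name, the `(slot, size)` input shape, and the limit value are assumptions for illustration. CAR header overhead is not modelled, and the real usable capacity of a sector may differ from the constant used here.

```python
# Assumed limit for illustration only; the real usable sector capacity may differ.
SECTOR_LIMIT = 32 * 1024**3


def split_into_subsets(block_sizes, limit=SECTOR_LIMIT):
    """Group consecutive (slot, size) pairs into subsets that stay below limit.

    A Solana Block is never divided, so everything under it stays in one subset.
    """
    subsets, current, current_size = [], [], 0
    for slot, size in block_sizes:
        if size >= limit:
            raise ValueError(f"Block at slot {slot} cannot fit in any subset")
        # Close the current subset before it would reach the limit.
        if current and current_size + size >= limit:
            subsets.append(current)
            current, current_size = [], 0
        current.append(slot)
        current_size += size
    if current:
        subsets.append(current)
    return subsets


# Toy numbers: five Blocks of 10 bytes each, with a 25-byte limit.
example = [(slot, 10) for slot in range(1, 6)]
print(split_into_subsets(example, limit=25))
```

Because the split points fall only on Solana Block boundaries, each resulting subset is a self-contained unit that could be packed into its own CAR file and deal.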

This would require the following work:

anjor commented 18 hours ago

https://github.com/rpcpool/yellowstone-faithful/pull/116 closes this