rpcpool / yellowstone-faithful

Project Yellowstone Old Faithful is the project to make all of Solana's history accessible, content addressable and available via a variety of means.
https://old-faithful.net/
GNU Affero General Public License v3.0

Config file format changes (what to do about deals.csv, metadata.yml) #145

Open linuskendall opened 2 months ago

linuskendall commented 2 months ago

problem statement

storing and utilising deals.csv and metadata.yaml is a bit of a hack, and we would like to remove this.

current state

currently we use deals.csv to do a lookup from Piece CID to SP. I.e. given a specific piece, which SP has it stored.

currently we use metadata.yaml to map a byte offset in the "full car file" to:

  a) a specific piece CID, i.e. given offset X in the full car file, which piece does this offset fall within
  b) an offset within this piece, i.e. given offset X in the full car file, what's the offset within that piece that this offset corresponds to

Pieces have some header data and padding, and that's why b is needed.
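A minimal sketch of the offset lookup described above, to make a) and b) concrete. The actual layout of metadata.yaml is not shown in this issue, so the `PieceSpan` fields, the placeholder piece CIDs and the sizes are all assumptions for illustration only. It assumes the spans are sorted by their start offset in the full car file.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class PieceSpan:
    # All field names are hypothetical; metadata.yaml may be laid out differently.
    piece_cid: str    # placeholder piece CID
    car_start: int    # offset in the full car file where this piece's payload starts
    length: int       # number of full-car-file bytes held by this piece
    header_size: int  # header bytes inside the piece before the payload


def locate(spans, car_offset):
    """Map an offset in the full car file to (piece_cid, offset_in_piece).

    `spans` must be sorted by `car_start`.
    """
    starts = [s.car_start for s in spans]
    i = bisect_right(starts, car_offset) - 1  # last span starting at or before the offset
    if i < 0:
        raise ValueError(f"offset {car_offset} is before the first piece")
    span = spans[i]
    rel = car_offset - span.car_start
    if rel >= span.length:
        raise ValueError(f"offset {car_offset} is not covered by any piece")
    # b) skip the piece's own header to get the offset within the piece
    return span.piece_cid, span.header_size + rel


# Made-up example data: two pieces of 1000 payload bytes, each with a 59-byte header.
SPANS = [
    PieceSpan("baga-piece-0", car_start=0, length=1000, header_size=59),
    PieceSpan("baga-piece-1", car_start=1000, length=1000, header_size=59),
]
```

With the made-up data above, `locate(SPANS, 1500)` picks the second piece and adds its header size to the relative offset. Padding is not modelled here beyond the `length` bound.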

suggested improvement

If we implement #122 then we can make this a lot simpler. instead of deals.csv and metadata.yaml we could just use the following config:

pieces:
   - subset: <subset_cid>
     pcid: <piece_cid>
     sps:
        - <sp_id1>
        - <sp_id2>
        - ...

so when trying to fetch a CID, we would just look up the CID in the cid-to-subset index. then using the piece config above, we would find the correct pcid and the SPs that have it stored. then we could simply

we can have a tool for now that reads deals.csv and metadata.yaml and produces this config file.

this config file is the permanent version of it.
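The lookup described above (CID to subset, then subset to pcid and SPs) could be sketched roughly like this. It is only an illustration: the cid-to-subset index from #122 is stood in for by a plain dict, and `PieceEntry`, `resolve` and all CIDs and SP ids are hypothetical names and placeholder values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PieceEntry:
    # Mirrors one item of the `pieces:` list in the suggested config.
    subset: str
    pcid: str
    sps: tuple


def resolve(cid, cid_to_subset, pieces):
    """Return (pcid, [sp ids]) for a CID, or raise KeyError if it is unknown."""
    subset = cid_to_subset.get(cid)
    if subset is None:
        raise KeyError(f"CID {cid} is not in the cid-to-subset index")
    by_subset = {entry.subset: entry for entry in pieces}
    entry = by_subset.get(subset)
    if entry is None:
        raise KeyError(f"no piece configured for subset {subset}")
    return entry.pcid, list(entry.sps)


# Placeholder data; real values would come from the config file and the #122 index.
PIECES = [
    PieceEntry(subset="bafy-subset-a", pcid="baga-piece-a", sps=("sp1", "sp2")),
    PieceEntry(subset="bafy-subset-b", pcid="baga-piece-b", sps=("sp3",)),
]
CID_TO_SUBSET = {
    "bafy-block-1": "bafy-subset-a",
    "bafy-block-2": "bafy-subset-b",
}
```

For example, `resolve("bafy-block-2", CID_TO_SUBSET, PIECES)` yields the piece CID and the single SP configured for the second subset.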

future options

we could also infer the subset from a specific piece. if we just configured faithful with a list of pieces and sps, faithful could read the root CID from the piece and then know which subset this is.

impacts

  1. we should create a new config file format version
  2. this config file format version would support the index in #122
  3. this config file format version would not support deals.csv and metadata.yaml

benefits

in theory we can restore deals.csv and metadata.yaml using this approach by crawling our address and then looking at the deals made.

for each deal we can look up the piece CID from chain (as far as I understand) and from piece CID + sp ID we could then read the root block in the piece to get the subset. From this we would fully be able to use existing indexes and offsets etc.

since the offsets are unique to the piece, this makes it quite easy to reuse indexes.

linuskendall commented 2 weeks ago

So current suggestion is:

data:
  car:
    from_pieces:
      - ipfs://<subset cid>
      - sp123213:<piece cid>
      - file://tank/faithful/baga123123122.car
      - http://abc.com/faithful/baga123213.car

We should be able to turn sp123123: into https:///ipfs/ and then use it like a normal url. For ipfs we should be able to use ipfs retrievals to just fetch the subset block. For file we can just use file system operations, already in PR #166. For http we can use the method outlined in #169.
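As a rough sketch of how the `from_pieces` entries above could be told apart before dispatching to the right retrieval method: the `Source` type and `parse_source` function are hypothetical, the `sp<digits>:<piece cid>` pattern is inferred from the single example in the config, and the sp-to-URL rewrite is left out because the target URL is not fully specified here.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Source:
    kind: str        # one of "ipfs", "sp", "file", "http"
    target: str      # subset CID, piece CID, file path or full URL
    sp_id: str = ""  # only set for "sp" entries

# Assumed shape of an SP entry: "sp" followed by digits, a colon, then the piece CID.
_SP_RE = re.compile(r"^(sp\d+):(.+)$")


def parse_source(entry):
    """Classify one from_pieces entry by its prefix."""
    if entry.startswith("ipfs://"):
        return Source("ipfs", entry[len("ipfs://"):])
    if entry.startswith("file://"):
        # How the remainder maps to a local path is an assumption left to the caller.
        return Source("file", entry[len("file://"):])
    if entry.startswith(("http://", "https://")):
        return Source("http", entry)
    match = _SP_RE.match(entry)
    if match:
        return Source("sp", match.group(2), sp_id=match.group(1))
    raise ValueError(f"unrecognised from_pieces entry: {entry!r}")
```

Keeping the classification separate from the fetching means each kind can be wired to its own method (ipfs retrieval, file system operations, http) without the config parser knowing about any of them.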

linuskendall commented 2 weeks ago

Related to #120

linuskendall commented 1 week ago

What's left on this issue: