web3-storage / dealer

Spade Polling name format and timeframe #7

Open vasco-santos opened 1 year ago

vasco-santos commented 1 year ago

We previously aligned that, for Spade to poll spade-proxy for aggregates to pass to SPs, Spade would poll an S3 bucket where we write the details for each aggregate. This polling will happen on an agreed timeframe (TBD - @ribasushi is there any recommendation from Spade, or based on our previous usage of dagcargo?).

During the given polling timeframe we can have more than one aggregate ready to go, so we need to make sure that all of them are ready to be polled. The S3 client sadly does not support appending to an already existing file, which would make our lives much easier. We are currently writing temporary files keyed as YYYY-MM-DD HH:MM:00 commPcid, and have a PR with a cron job that kicks in and merges all these files into a single file keyed as YYYY-MM-DD HH:MM:00 that can be polled. This solution has some potential issues:

Alternatively, we could allow Spade to perform a bucket List with prefix YYYY-MM-DD HH:MM:00 once the time is ready, and then perform a Get on each key. This makes Spade's job a tiny bit more complex on read, but it is less susceptible to weirdness with times. This could also happen via an API call by Spade, where spade-proxy exposes an endpoint and does the merge of available aggregates.
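For illustration only, a minimal sketch of what that prefix-based read path could look like on the Spade side, using the AWS SDK v3 S3 client. The bucket name, function name and prefix granularity are assumptions for the example, not agreed values.

// Sketch: list every offer written under a polling-window prefix, then Get each key.
import { S3Client, ListObjectsV2Command, GetObjectCommand } from '@aws-sdk/client-s3'

const s3 = new S3Client({ region: 'us-east-1' })
const Bucket = 'spade-proxy-offers' // hypothetical bucket name

export async function readOffers (prefix: string): Promise<string[]> {
  const offers: string[] = []
  let ContinuationToken: string | undefined
  do {
    // List keys for the window, paging through results if needed
    const page = await s3.send(new ListObjectsV2Command({ Bucket, Prefix: prefix, ContinuationToken }))
    for (const object of page.Contents ?? []) {
      if (!object.Key) continue
      // Get each aggregate offer file individually
      const res = await s3.send(new GetObjectCommand({ Bucket, Key: object.Key }))
      offers.push(await res.Body!.transformToString())
    }
    ContinuationToken = page.NextContinuationToken
  } while (ContinuationToken)
  return offers
}

// e.g. await readOffers('2023-07-27 10:00:00') for one agreed timeframe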

For the content format, we are looking at a JSON file; I will create a separate issue to discuss its exact format.

Thoughts, @ribasushi @alanshaw @xinaxu?

vasco-santos commented 1 year ago

We agreed with @anjor and @ribasushi that we will be writing files in the format YYYY-MM-DD HH:MM:00 commPcid into the bucket, and we will give Spade's Deal Engine access to the bucket for List and Get. Therefore, Spade can perform List requests on the bucket by prefix and Get the individual offers pending a contract.
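As a rough illustration of the write side in spade-proxy under the agreed key format (the bucket name and helper names below are hypothetical, not agreed):

// Sketch: build the YYYY-MM-DD HH:MM:00 commPcid key and write the offer JSON to it.
import { S3Client, PutObjectCommand } from '@aws-sdk/client-s3'

const s3 = new S3Client({ region: 'us-east-1' })
const Bucket = 'spade-proxy-offers' // hypothetical bucket name

// Timestamp truncated to the minute (UTC), followed by the aggregate commP CID
export function offerKey (date: Date, commPcid: string): string {
  const ts = date.toISOString().slice(0, 16).replace('T', ' ') + ':00'
  return `${ts} ${commPcid}`
}

export async function writeOffer (date: Date, commPcid: string, offer: unknown): Promise<void> {
  await s3.send(new PutObjectCommand({
    Bucket,
    Key: offerKey(date, commPcid),
    Body: JSON.stringify(offer),
    ContentType: 'application/json'
  }))
}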

For the actual content of the offer to get a contract, the aggregate CID is required (together with its size, if it is not encoded in the CID), as well as a label we want on chain (@anjor to give more details). It is also recommended, though not required, that we give the segment count. Therefore, we can just give all the pieces that are part of the aggregate so they are also recorded.

vasco-santos commented 1 year ago

Based on async discussion https://www.notion.so/2023-07-26-Spade-integration-sync-00e382fee9b24586a4e35e5184ddc9a8

Spade's deal engine will assemble the payload below (where Label is optional and Client can be obtained by Spade through out-of-band information):

"Proposal": {
    πŸ‘‰  "PieceCID": "baga6ea4seaqkbd3kf77fmrpznqewwwojsi3oevv7f3kxs5m5hu7w5rwuubrrsiq",
    πŸ‘‰  "PieceSize": 34359738368,
          "VerifiedDeal": true,
    β“πŸ‘‰  "Client": "f1klrehuwhzfw6boiiywdi7pikztsob4kkieivcwq",
         "Provider": "f02101252",
    β“πŸ‘‰  "Label": "baga6ea4seaqkbd3kf77fmrpznqewwwojsi3oevv7f3kxs5m5hu7w5rwuubrrsiq",
          "StartEpoch": 3082943,
          "endEpoch": 4606463,
          "StoragePricePerEpoch": "0",
          "ProviderCollateral": "10112964864721888",
          "ClientCollateral": "0"
},

Given that the PieceCID is static for all replicas, it would be good to get all its constituents together with the initial β€œposting”. Otherwise, you will have to duplicate all the part CIDs when you submit a list of token-guarded URIs.

Taking the above into account, we should provide offers to Spade with:

Format proposed for the offer:

{
  "client": "f3...",
  "label": "encoded-string",
  "pieceCid": "baga6ea4seaqcq4xx7rqx2lsrm6iky7qqk5jh7pbaj5bgdu22afhp4fodvccb6bq",
  "segments": [
    "baga6ea4seaqhisghxrl4yntcsxtoc6rany2fjtbxgcnhhztkh5myy2mbs4nk2ki",
    "baga6ea4seaqjeaetn3qrpxpnjaiz4iownifrz5grodxh2nzns4udumql7yikqba",
    "baga6ea4seaqo3fv7v6azpzeufymjgpkku4zy2y6ymwc2nesfqgb23cnk4kdtgcq",
    "baga6ea4seaqhufymnk6yhtxu4b7hu6zzlu6k3mi256zmetxdnubt42vcetoywja",
    "baga6ea4seaqbk3wfm645ua42tjrtjbk6zguldd5gbqeirbwp4sgipgxr3gypgmi",
    "baga6ea4seaqeyibfspj2ablhcypfilpg2lnnen7uvuqukfyfpkj6yjnmz45o4ay",
    "baga6ea4seaqoibpmnayfhb76nutn3fjtyhkorkgmvygwt22pizh73htj7vxc6fi"
  ]
}

vasco-santos commented 1 year ago

An iteration based on the above, after talking with the Spade team.

Agreed JSON file format:

{
  "orderId": 1690443634246,
  "tenantId": "...to...be...received...by...spade",
  "label": "encoded-string",
  "aggregate": "baga6ea4seaqcq4xx7rqx2lsrm6iky7qqk5jh7pbaj5bgdu22afhp4fodvccb6bq",
  "pieces": [
    "baga6ea4seaqhisghxrl4yntcsxtoc6rany2fjtbxgcnhhztkh5myy2mbs4nk2ki",
    "baga6ea4seaqjeaetn3qrpxpnjaiz4iownifrz5grodxh2nzns4udumql7yikqba",
    "baga6ea4seaqo3fv7v6azpzeufymjgpkku4zy2y6ymwc2nesfqgb23cnk4kdtgcq",
    "baga6ea4seaqhufymnk6yhtxu4b7hu6zzlu6k3mi256zmetxdnubt42vcetoywja",
    "baga6ea4seaqbk3wfm645ua42tjrtjbk6zguldd5gbqeirbwp4sgipgxr3gypgmi",
    "baga6ea4seaqeyibfspj2ablhcypfilpg2lnnen7uvuqukfyfpkj6yjnmz45o4ay",
    "baga6ea4seaqoibpmnayfhb76nutn3fjtyhkorkgmvygwt22pizh73htj7vxc6fi"
  ]
}
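For consumers of this file, a minimal TypeScript shape and guard, as an illustrative sketch only: field names follow the JSON above, and label is marked optional only because the earlier Proposal discussion called the label optional, so treat that as an assumption.

// Sketch of the agreed offer file shape; nothing beyond the field names above is agreed.
export interface SpadeOffer {
  orderId: number      // e.g. a millisecond timestamp used as an order identifier
  tenantId: string     // to be received from Spade
  label?: string       // encoded string to put on chain (assumed optional)
  aggregate: string    // aggregate piece CID (commP)
  pieces: string[]     // piece CIDs that constitute the aggregate
}

export function isSpadeOffer (value: any): value is SpadeOffer {
  return typeof value?.orderId === 'number' &&
    typeof value?.tenantId === 'string' &&
    (value.label === undefined || typeof value.label === 'string') &&
    typeof value?.aggregate === 'string' &&
    Array.isArray(value?.pieces) &&
    value.pieces.every((p: unknown) => typeof p === 'string')
}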

Note that:


We also agreed that we will write files named with the real timestamp, rather than aligning file names with the frequency of read times. A prefix will be used to read through them.
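A small illustration of that: with real timestamps in the keys, a reader can still cover a window by listing with a coarser prefix (hour-level granularity here is only an assumption for the example).

// Keys such as '2023-07-27 10:03:27 baga...' and '2023-07-27 10:45:02 baga...'
// are both returned by a List with Prefix '2023-07-27 10'.
export const prefixForHour = (date: Date): string =>
  date.toISOString().slice(0, 13).replace('T', ' ')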