web3-storage / dealer

Spade Polling name format and timeframe #7

Open vasco-santos opened 1 year ago

vasco-santos commented 1 year ago

We previously aligned that, for Spade to poll spade-proxy for aggregates to pass to SPs, Spade would poll an S3 bucket where we write the details for each aggregate. This polling will happen on an agreed timeframe (TBD - @ribasushi is there any recommendation from Spade, or based on our previous usage of dagcargo?).

During the given polling timeframe we can have more than one aggregate ready to go, so we need to make sure that all of them are ready to be polled. The S3 client sadly does not support appending to an already existing file, which would make our lives much easier. We are currently writing temporary files keyed as YYYY-MM-DD HH:MM:00 commPcid, and have a PR with a cron job that kicks in and merges all these files into a single file keyed as YYYY-MM-DD HH:MM:00 that can be polled. This solution has some potential issues:

Alternatively, we could allow Spade to perform a bucket List with prefix YYYY-MM-DD HH:MM:00 once the time is ready, and then perform a Get on each key. This makes Spade's job a tiny bit more complex on read, but it is less susceptible to weirdness with times. This could also happen via an API call by Spade, where spade-proxy exposes an endpoint and does the merge of available aggregates.
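For illustration only, a minimal sketch of what that prefix-based read path could look like on the Spade side, using the AWS SDK v3 S3 client. The bucket name, function name and prefix granularity are assumptions for the example, not agreed values.

// Sketch: list every offer written under a polling-window prefix, then Get each key.
import { S3Client, ListObjectsV2Command, GetObjectCommand } from '@aws-sdk/client-s3'

const s3 = new S3Client({ region: 'us-east-1' })
const Bucket = 'spade-proxy-offers' // hypothetical bucket name

export async function readOffers (prefix: string): Promise<string[]> {
  const offers: string[] = []
  let ContinuationToken: string | undefined
  do {
    // List keys for the window, paging through results if needed
    const page = await s3.send(new ListObjectsV2Command({ Bucket, Prefix: prefix, ContinuationToken }))
    for (const object of page.Contents ?? []) {
      if (!object.Key) continue
      // Get each aggregate offer file individually
      const res = await s3.send(new GetObjectCommand({ Bucket, Key: object.Key }))
      offers.push(await res.Body!.transformToString())
    }
    ContinuationToken = page.NextContinuationToken
  } while (ContinuationToken)
  return offers
}

// e.g. await readOffers('2023-07-27 10:00:00') for one agreed timeframe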

For the content format, we are looking at a JSON file; I will create a separate issue to discuss its exact format.

Thoughts, @ribasushi @alanshaw @xinaxu?

vasco-santos commented 1 year ago

We agreed with @anjor and @ribasushi that we will be writing files in the format YYYY-MM-DD HH:MM:00 commPcid into the bucket, and we will give Spade's Deal Engine access to the bucket for List and Get. Therefore, Spade can perform List requests on the bucket by prefix and Get the individual offers pending a contract.
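As a rough illustration of the write side in spade-proxy under the agreed key format (the bucket name and helper names below are hypothetical, not agreed):

// Sketch: build the YYYY-MM-DD HH:MM:00 commPcid key and write the offer JSON to it.
import { S3Client, PutObjectCommand } from '@aws-sdk/client-s3'

const s3 = new S3Client({ region: 'us-east-1' })
const Bucket = 'spade-proxy-offers' // hypothetical bucket name

// Timestamp truncated to the minute (UTC), followed by the aggregate commP CID
export function offerKey (date: Date, commPcid: string): string {
  const ts = date.toISOString().slice(0, 16).replace('T', ' ') + ':00'
  return `${ts} ${commPcid}`
}

export async function writeOffer (date: Date, commPcid: string, offer: unknown): Promise<void> {
  await s3.send(new PutObjectCommand({
    Bucket,
    Key: offerKey(date, commPcid),
    Body: JSON.stringify(offer),
    ContentType: 'application/json'
  }))
}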

For the actual content of the offer to get a contract, the aggregate CID is required (together with its size, if it is not encoded in the CID), as well as a label we want on chain (@anjor to give more details). It is also recommended, though not required, that we give the segment count. Therefore, we can just give all the pieces that are part of the aggregate so they are also recorded.

vasco-santos commented 1 year ago

Based on async discussion https://www.notion.so/2023-07-26-Spade-integration-sync-00e382fee9b24586a4e35e5184ddc9a8

Spade's deal engine will assemble the payload below (where Label is optional and Client can be obtained by Spade through out-of-band information):

"Proposal": {
    πŸ‘‰  "PieceCID": "baga6ea4seaqkbd3kf77fmrpznqewwwojsi3oevv7f3kxs5m5hu7w5rwuubrrsiq",
    πŸ‘‰  "PieceSize": 34359738368,
          "VerifiedDeal": true,
    β“πŸ‘‰  "Client": "f1klrehuwhzfw6boiiywdi7pikztsob4kkieivcwq",
         "Provider": "f02101252",
    β“πŸ‘‰  "Label": "baga6ea4seaqkbd3kf77fmrpznqewwwojsi3oevv7f3kxs5m5hu7w5rwuubrrsiq",
          "StartEpoch": 3082943,
          "endEpoch": 4606463,
          "StoragePricePerEpoch": "0",
          "ProviderCollateral": "10112964864721888",
          "ClientCollateral": "0"
},

Given that the PieceCID is static for all replicas, it would be good to get all its constituents together with the initial β€œposting”. Otherwise, you will have to duplicate all the part CIDs when you submit a list of token-guarded URIs.

Taking the above into account, we should provide offers to Spade with:

Format proposed for the offer:

{
  "client": "f3...",
  "label": "encoded-string",
  "pieceCid": "baga6ea4seaqcq4xx7rqx2lsrm6iky7qqk5jh7pbaj5bgdu22afhp4fodvccb6bq",
  "segments": [
    "baga6ea4seaqhisghxrl4yntcsxtoc6rany2fjtbxgcnhhztkh5myy2mbs4nk2ki",
    "baga6ea4seaqjeaetn3qrpxpnjaiz4iownifrz5grodxh2nzns4udumql7yikqba",
    "baga6ea4seaqo3fv7v6azpzeufymjgpkku4zy2y6ymwc2nesfqgb23cnk4kdtgcq",
    "baga6ea4seaqhufymnk6yhtxu4b7hu6zzlu6k3mi256zmetxdnubt42vcetoywja",
    "baga6ea4seaqbk3wfm645ua42tjrtjbk6zguldd5gbqeirbwp4sgipgxr3gypgmi",
    "baga6ea4seaqeyibfspj2ablhcypfilpg2lnnen7uvuqukfyfpkj6yjnmz45o4ay",
    "baga6ea4seaqoibpmnayfhb76nutn3fjtyhkorkgmvygwt22pizh73htj7vxc6fi"
  ]
}

vasco-santos commented 1 year ago

An iteration based on the above, after talking with the Spade team.

Agreed JSON file format:

{
  "orderId": 1690443634246,
  "tenantId": "...to...be...received...by...spade",
  "label": "encoded-string",
  "aggregate": "baga6ea4seaqcq4xx7rqx2lsrm6iky7qqk5jh7pbaj5bgdu22afhp4fodvccb6bq",
  "pieces": [
    "baga6ea4seaqhisghxrl4yntcsxtoc6rany2fjtbxgcnhhztkh5myy2mbs4nk2ki",
    "baga6ea4seaqjeaetn3qrpxpnjaiz4iownifrz5grodxh2nzns4udumql7yikqba",
    "baga6ea4seaqo3fv7v6azpzeufymjgpkku4zy2y6ymwc2nesfqgb23cnk4kdtgcq",
    "baga6ea4seaqhufymnk6yhtxu4b7hu6zzlu6k3mi256zmetxdnubt42vcetoywja",
    "baga6ea4seaqbk3wfm645ua42tjrtjbk6zguldd5gbqeirbwp4sgipgxr3gypgmi",
    "baga6ea4seaqeyibfspj2ablhcypfilpg2lnnen7uvuqukfyfpkj6yjnmz45o4ay",
    "baga6ea4seaqoibpmnayfhb76nutn3fjtyhkorkgmvygwt22pizh73htj7vxc6fi"
  ]
}
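For consumers of this file, a minimal TypeScript shape and guard, as an illustrative sketch only: field names follow the JSON above, and label is marked optional only because the earlier Proposal discussion called the label optional, so treat that as an assumption.

// Sketch of the agreed offer file shape; nothing beyond the field names above is agreed.
export interface SpadeOffer {
  orderId: number      // e.g. a millisecond timestamp used as an order identifier
  tenantId: string     // to be received from Spade
  label?: string       // encoded string to put on chain (assumed optional)
  aggregate: string    // aggregate piece CID (commP)
  pieces: string[]     // piece CIDs that constitute the aggregate
}

export function isSpadeOffer (value: any): value is SpadeOffer {
  return typeof value?.orderId === 'number' &&
    typeof value?.tenantId === 'string' &&
    (value.label === undefined || typeof value.label === 'string') &&
    typeof value?.aggregate === 'string' &&
    Array.isArray(value?.pieces) &&
    value.pieces.every((p: unknown) => typeof p === 'string')
}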

Note that:


We also agreed that we will write files named with the real timestamp, rather than aligning file names with the frequency of read times. A prefix will be used to read through them.
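A small illustration of that: with real timestamps in the keys, a reader can still cover a window by listing with a coarser prefix (hour-level granularity here is only an assumption for the example).

// Keys such as '2023-07-27 10:03:27 baga...' and '2023-07-27 10:45:02 baga...'
// are both returned by a List with Prefix '2023-07-27 10'.
export const prefixForHour = (date: Date): string =>
  date.toISOString().slice(0, 13).replace('T', ' ')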