I'm kinda down with this but I'm not convinced yet that the effort is justified and I have implementation questions!
I can see what this solves. My questions and devil's advocate points are quoted below with replies.
> Yes, it's a pain, but like I said before, if you're encoding data with a non-default encoder then you likely have the decoder to hand, so it's probably not blocking.
That is true only if all of your application stack is in JS (and specifically uses the new IPLD stack).
> Right now, we cannot even store data in IPFS that is not pb, cbor or raw. IPFS will error importing it because it decodes the blocks. We should lobby for this to be fixed.
We should absolutely fix that. Doesn't that affect carbites just the same, though?
> Could we instead have the client set an `X-Root-Cid` header on each request to allow use of the "simple" carbites strategy which requires no decoders?
We could. I do not think that would be better though, not for the users. The beauty of the proposed API is that it allows chunked uploading of bytes regardless of what they represent. Having to provide CIDs would imply a number of things; avoiding CIDs here avoids all that.
> One thing I like about the current setup is that it encourages people to move to content addressing. You can use simple fetch for small data initially and, as your data requirements and understanding of IPFS grow, you upgrade to uploading DAGs.
Using sha256 is content addressing; sure, it's not all the way to IPFS and DAGs, but on the flip side it is a lot more accessible. Better yet, it composes with DAG encoding: our client API can just take a CAR writer and do the rest (hash it and chunk-upload it), along the lines of the sketch below.
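To make the "take a CAR writer and do the rest" point concrete, here is a minimal sketch. It assumes a hypothetical `uploadChunked` helper (one possible version is sketched at the end of this thread) and uses the `@ipld/car` package for CAR encoding; it is an illustration, not an existing client API.

```ts
import { CarWriter } from '@ipld/car'
import type { CID } from 'multiformats/cid'

// Hypothetical chunk-upload helper; see the sketch at the end of the thread.
declare function uploadChunked(content: Uint8Array): Promise<Response>

export async function uploadDag(
  root: CID,
  blocks: AsyncIterable<{ cid: CID; bytes: Uint8Array }>
): Promise<Response> {
  // Encode the blocks as a CAR file in memory.
  const { writer, out } = CarWriter.create([root])
  const parts: Uint8Array[] = []
  const collecting = (async () => {
    for await (const part of out) parts.push(part)
  })()
  for await (const block of blocks) await writer.put(block)
  await writer.close()
  await collecting

  // Concatenate the CAR bytes; the helper hashes them (sha256) and
  // chunk-uploads, so the caller never deals with ranges or hashing.
  const car = new Uint8Array(parts.reduce((size, p) => size + p.length, 0))
  let offset = 0
  for (const part of parts) {
    car.set(part, offset)
    offset += part.length
  }
  return uploadChunked(car)
}
```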
> We should be doing all we can to encourage transferring DAGs and not plain files. This is the opposite of that.
I am not disagreeing, yet I would like to ask: why should we? This is not the opposite of encouraging transferring DAGs (it supports DAGs just as well); it is rather meeting users where they are. If they can give us a DAG, that is great, but if they don't have one we can still take their file and turn it into a DAG on the server, without making our Cloudflare limits their problem.
> Where do the chunks get stored before the final flush?

I don't know; it's an implementation detail. Could be S3, or maybe a Postgres binary file store.
> I'm not clear on how we re-assemble the chunks and import into IPFS on flush given the Cloudflare workers environment and a 30s execution time.

This needs exploring as well, but I imagine we could more or less pipe chunks into /ipfs/add (for CARs it would be /dag/import). If 30s is a problem, we can consider an alternative strategy of doing the chunking on write (which would have to vary between CARs and other files), so that on flush we only have to assemble; a rough sketch follows.
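Purely to illustrate the "assemble on flush" idea, here is a speculative worker-side sketch. `chunkStore` and `ipfsApi` are assumed stand-ins (no such APIs exist in the codebase), chunks are assumed to have been persisted keyed by upload hash and byte offset as they arrived, and the import response shape is assumed.

```ts
// Assumed stand-ins, not real APIs: a store that persisted each chunk keyed
// by (uploadHash, offset) as it was PUT, and the address of an IPFS node.
declare const chunkStore: {
  list(uploadHash: string): Promise<{ offset: number; length: number }[]>
  read(uploadHash: string, offset: number): Promise<Uint8Array>
}
declare const ipfsApi: string

async function flush(uploadHash: string, contentType: string): Promise<string> {
  // Order the written chunks by byte offset before re-assembly.
  const chunks = await chunkStore.list(uploadHash)
  chunks.sort((a, b) => a.offset - b.offset)

  // Stream the re-assembled body instead of buffering it, to stay within
  // worker memory limits.
  const body = new ReadableStream<Uint8Array>({
    async start(controller) {
      for (const chunk of chunks) {
        controller.enqueue(await chunkStore.read(uploadHash, chunk.offset))
      }
      controller.close()
    },
  })

  // CARs go through a dag import; everything else through a plain add.
  const endpoint =
    contentType === 'application/vnd.ipld.car' ? '/dag/import' : '/ipfs/add'
  const response = await fetch(ipfsApi + endpoint, { method: 'POST', body })
  const { cid } = await response.json() // response shape assumed for illustration
  return cid
}
```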
> User has to wait for transfer from intermediate store to IPFS on flush - their upload time is doubled, no?

Depends 🤷‍♂️? If we have an S3 backup, maybe they don't need to wait for anything: we can just give back the CID and deal with the actual pinning asynchronously. This might also be a good reason to prefer uploading DAGs when possible, but when that is not an option it's better than not being able to upload at all.
> How does this work with a directory of files? Can you `sha256` a directory? Do we get people to tar stuff and then extract it?

I have to admit I have not considered directories; maybe they should just be out of scope. If we really want to support directories here as well, we could ask for the sha256 of a form-data encoding of the directory, but I find that less compelling because it doesn't really meet users where they are in this case.
it’s definitely better/easier for us when they just upload CAR files, assuming they are using typical codecs. we wouldn’t want to change anything we’ve already done, but i could see us adding this feature in 2022.
This API endpoint is supposed to be agnostic of the MIME type; in other words, you could upload CAR-formatted DAGs or arbitrary files. Behavior will be the same as with `POST /upload`.
i would want to have a separate endpoint, or very explicit querystring params, for regular files and CAR files.
@Gozala do we want to keep this open, or merge the discussion into #980 and #837?
We have been discussing the .storage API v2, and specifically a need for some sort of session identifier to do multiple uploads, which is probably what will supersede this. This is probably a good place to discuss what that may look like, so I'll do that in the following comment.
So the revived idea to allow chunked uploads is the following: a `PATCH /` endpoint. The payload for that endpoint would be some (yet to be determined) encoding of the following structure:
```ts
interface Transaction {
  data: CarFile // CAR file containing a set of blocks
  instructions: Instruction[]
}

type Instruction = {
  name: PublicKey
  value: CID
  seqno?: number
  ttl?: number
}
```
That way the client could upload blocks and update the corresponding IPNS records in a single request.
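To make that concrete, here is one way a client might assemble and send such a transaction. The wire encoding is explicitly "yet to be determined" above, so the dag-cbor encoding, the `/` path, and the `commit` helper here are assumptions for illustration only.

```ts
import * as cbor from '@ipld/dag-cbor'
import type { CID } from 'multiformats/cid'

type PublicKey = string // the IPNS name being updated; representation TBD

// Mirrors the Instruction shape proposed above.
interface Instruction {
  name: PublicKey
  value: CID
  seqno?: number
  ttl?: number
}

// carBytes: a CAR file containing the blocks, e.g. produced with @ipld/car.
async function commit(
  carBytes: Uint8Array,
  root: CID,
  name: PublicKey,
  seqno: number
): Promise<Response> {
  const transaction = {
    data: carBytes, // CAR file containing the uploaded blocks
    instructions: [{ name, value: root, seqno }] as Instruction[], // point name at root
  }
  return fetch('/', {
    method: 'PATCH',
    headers: { 'Content-Type': 'application/cbor' },
    body: cbor.encode(transaction),
  })
}
```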
love it.
so all the state is managed by the client updating an IPNS key, but we have a single endpoint for users to transactionally upload the data and update the state of the IPNS key
would we be able to just put the entire signed IPNS record in the UCAN they send for the upload?
We're closing this issue. If you still need help please open a new issue.
I would like to propose a chunked upload feature to work around the upload size limitation. The idea is to provide a new API endpoint `PUT /${cid}`, supplied with `Content-Range` headers to provide `pwrite`-like functionality. Widely available `sha256(content)` could be used to identify the upload. After all chunks are written, the user sends a request with a `0-0` range to flush, which would basically perform the equivalent of `POST /upload` but use the written bytes for the body. This API endpoint is supposed to be agnostic of the MIME type; in other words, you could upload CAR-formatted DAGs or arbitrary files. Behavior will be the same as with `POST /upload`.

Sentiments?
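A minimal sketch of that client flow, assuming the upload is addressed by the hex sha256 of the full content and that a `0-0` range signals the flush, as described above. The chunk size and API base URL are arbitrary illustration values, not part of the proposal.

```ts
const API = 'https://api.example.storage' // hypothetical base URL
const CHUNK_SIZE = 5 * 1024 * 1024        // arbitrary illustration value

const toHex = (buf: ArrayBuffer) =>
  [...new Uint8Array(buf)].map((b) => b.toString(16).padStart(2, '0')).join('')

export async function uploadChunked(content: Uint8Array): Promise<Response> {
  // The sha256 of the whole content identifies the upload session.
  const hash = toHex(await crypto.subtle.digest('SHA-256', content))

  // pwrite-like writes: each chunk lands at an explicit byte range, so chunks
  // can be retried (or even sent in parallel) independently.
  for (let offset = 0; offset < content.length; offset += CHUNK_SIZE) {
    const chunk = content.subarray(offset, offset + CHUNK_SIZE)
    await fetch(`${API}/${hash}`, {
      method: 'PUT',
      headers: {
        'Content-Range': `bytes ${offset}-${offset + chunk.length - 1}/${content.length}`,
      },
      body: chunk,
    })
  }

  // Flush: a 0-0 range asks the service to assemble the written bytes and
  // perform the equivalent of POST /upload on them.
  return fetch(`${API}/${hash}`, {
    method: 'PUT',
    headers: { 'Content-Range': `bytes 0-0/${content.length}` },
  })
}
```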