smallhive opened this issue 2 weeks ago
The problem is not the hash, really. The problem is the split chain itself, and v1/v2 are the same here; pre-#957 code didn't support this either. Chaining is the "previous" reference we have. It's designed for streams and it's really good for that use case, but S3 multipart is not about streaming one part after another: it treats a multipart upload as a number of independent slots and, most importantly, real applications do use this property.
Potential solutions:
CompleteMultipart
Temporary objects require additional logic on the S3 side and can leave garbage that is harder to trace (additional attributes?). They can be optional: we can try pushing the next split chunk if possible and resort to additional objects only when a part arrives out of sequence. And they will seriously affect multipart completion, which will need quite some time to reslice everything (hashing alone would be much easier, but that's not the problem we have).
Supporting "slots" could mean an additional "part number" attribute used instead of "previous". It completely breaks the backward-walking assembly logic and makes link objects more important, but it's still a possibility, and we can still find all related objects this way. It could also simplify part re-uploading. At the same time, it's a protocol change. Can this be useful for standalone NeoFS? Not sure.
@carpawell?
From the NeoFS side, I see some questions, and the main one is: if we can solve them successfully, why have we needed this backward-chained logic from the beginning, and for so long, if a simpler scheme is acceptable (one based on agreements that have to be taken on trust)?
I don't mind considering protocol changes, but for now this feels to me like playing against NeoFS and inventing kludges around it.
Chained objects are more robust, and they're very good for streams of data. The typical NeoFS slicing pattern is exactly that: you know the previous object's hash, you know all the hashes, you can build links and indexes efficiently, and you can always follow the chain exactly.
A slot-like structure is more fragile, and it's not simpler: without an index object it requires searches to find the other parts. Also, regarding its use for S3, one thing to keep in mind is that we probably can't ensure a 1:1 slot mapping between NeoFS and S3, since parts there are 5 MB to 5 GB, and 5 GB is a big (split) object in NeoFS. Split hierarchies are something we've long tried to avoid, and I'd still try to.
Unfortunately, it looks like this limits us to some S3-specific scheme with regular objects that are then reassembled upon upload completion, which totally destroys the optimization we have now (almost free multipart upload completion). I'm all ears for other ideas.
Multipart uploads don't work in the general case
Current Behavior
The AWS SDK uploads multipart parts in 5 parallel threads, while the gate expects parts to arrive sequentially, one by one.
Expected Behavior
It should be possible to upload parts in any order.
Possible Solution
Collect the final object hash in a different way
Steps to Reproduce
OperationAborted: 409
errorContext
Related to #1016
Your Environment