Closed paulsnar closed 2 years ago
Splitting out GCS infra into its own item.
Aside from the GCS infra, there should be a well-defined scheme mapping upload intents to GCS URLs. I propose the following: take the file's hash in hex form and use it directly as the object name in the bucket. So a file with hash (hex) 123456789abcdef0 (probably invalid multihash, bite me) would map onto ${GCS_TARGET_PREFIX}/123456789abcdef0 (e.g., gs://pifs-bucket/subfolder/123456789abcdef0).
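A minimal sketch of the proposed mapping, in Python purely for illustration. The function name gcs_object_url is hypothetical, and the hex validation and lowercasing are assumptions added here; the proposal itself only says that the hex hash is appended to the target prefix.

```python
def gcs_object_url(file_hash_hex: str, target_prefix: str) -> str:
    """Map a file's hex hash to its object URL under the given prefix."""
    # Assumption: hashes are normalised to lowercase before use.
    digest = file_hash_hex.strip().lower()
    # Assumption: reject anything that is not plain hex, so a "hash" can
    # never smuggle path separators into the object name.
    if not digest or any(c not in "0123456789abcdef" for c in digest):
        raise ValueError(f"not a hex hash: {file_hash_hex!r}")
    # Tolerate a trailing slash on the prefix to avoid a double separator.
    return f"{target_prefix.rstrip('/')}/{digest}"


if __name__ == "__main__":
    print(gcs_object_url("123456789abcdef0", "gs://pifs-bucket/subfolder"))
```

With the example values from above, the prefix gs://pifs-bucket/subfolder and hash 123456789abcdef0 yield gs://pifs-bucket/subfolder/123456789abcdef0.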
Okay, the scaffolding is all set up as of 08e6db7; picking up the initial part of implementing the uploads.begin method should be pretty easy now.
Done as per f933183.
Uploading, as per API spec, is split into multiple operations.
Beginning an upload registers the upload intent in the database and allocates a corresponding URL on the GCS side; the URL is returned with a signature that allows the client to upload to it without having Gcloud credentials. The begin request fails with an error if the given file size exceeds a configurable limit (so configuration for the limit should be added, not in the database but probably in .env?).
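The begin step could be sketched roughly as follows, in Python purely for illustration. Everything named here is an assumption: the environment variable UPLOAD_MAX_SIZE_BYTES, its default, the in-memory dict standing in for the database, and the sign_url callable standing in for whatever GCS signing mechanism ends up being used.

```python
import os
from dataclasses import dataclass
from typing import Callable, Dict

# Hypothetical setting name and default; the limit is only said to be
# configurable, probably via .env.
MAX_SIZE_ENV = "UPLOAD_MAX_SIZE_BYTES"
DEFAULT_MAX_SIZE = 100 * 1024 * 1024


class UploadTooLarge(Exception):
    """Raised when the declared file size exceeds the configured limit."""


@dataclass
class UploadIntent:
    file_hash: str
    size: int
    gcs_url: str
    progress: int = 0  # last progress report, starts at 0


def begin_upload(
    file_hash: str,
    size: int,
    prefix: str,
    intents: Dict[str, UploadIntent],  # stand-in for the database
    sign_url: Callable[[str], str],    # stand-in for GCS URL signing
) -> dict:
    limit = int(os.environ.get(MAX_SIZE_ENV, DEFAULT_MAX_SIZE))
    if size > limit:
        # Fail before anything is registered or allocated.
        raise UploadTooLarge(f"{size} bytes exceeds limit of {limit}")
    gcs_url = f"{prefix.rstrip('/')}/{file_hash}"
    intents[file_hash] = UploadIntent(file_hash, size, gcs_url)
    # The client only ever sees the signed URL, never credentials.
    return {"url": sign_url(gcs_url)}
```

Checking the size before registering the intent means a rejected request leaves no record behind, which keeps the failure path free of cleanup.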
The intent stores the last progress report (starting at 0). During upload, requests to register upload progress should update this value; requests to return it should, well, return the last value stored. (This is small enough to be lumped in here until further notice.)
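As a rough illustration of the progress bookkeeping, in Python and with an in-memory dict in place of the database: the class and method names are hypothetical, and the unit of the progress value (bytes, percent) is left open, as it is above.

```python
from typing import Dict


class ProgressStore:
    """Keeps only the most recent progress report per upload."""

    def __init__(self) -> None:
        self._last: Dict[str, int] = {}

    def begin(self, upload_id: str) -> None:
        # A new intent starts with a progress of 0.
        self._last[upload_id] = 0

    def report(self, upload_id: str, value: int) -> None:
        if upload_id not in self._last:
            raise KeyError(f"unknown upload: {upload_id}")
        # Overwrite rather than accumulate: only the last report matters.
        self._last[upload_id] = value

    def get(self, upload_id: str) -> int:
        # Returns whatever was last reported (or 0 if nothing yet).
        return self._last[upload_id]


if __name__ == "__main__":
    store = ProgressStore()
    store.begin("123456789abcdef0")
    store.report("123456789abcdef0", 4096)
    print(store.get("123456789abcdef0"))
```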
Finishing and cancelling an upload warrant separate items, given that the actions to be undertaken in those cases are numerous (like Libya's exports).
Blocks the rest of the indexing workflow.