taskcluster / taskcluster-rfcs

Taskcluster team planning
Mozilla Public License 2.0

Object service #155

Closed owlishDeveloper closed 4 years ago

owlishDeveloper commented 4 years ago

Note: I didn't entirely understand the mechanics/intentions of the API in the original RFC, so I made my own (maybe re-invented the wheel, really).

Feel free to ask questions, critique, etc.

owlishDeveloper commented 4 years ago

> I can't see the swagger: (image)

My bad - it was private. Can you check again? It should be public now

> Overall this looks good! I like that it's high-level, but provides a framework for developing the details. I'd like to know a little more about what "efficient" means: saving bandwidth costs? download times? costs for additional storage?

"efficient" means cost-efficient and also time-efficient. It would be nice to find the cheapest ways of doing things (in terms of money - any money, storage or bandwidth) and in terms of how long things take, or find some reasonable compromise.

djmitche commented 4 years ago

I can see it now -- thanks!

owlishDeveloper commented 4 years ago
> * the swagger definition doesn't have the `delete` operation listed

The swagger definition is no longer relevant, so I removed it from the document. I added DELETE while copying the API definition over from swagger: since saving cost is now one of the top priorities, implementing this operation makes more sense than it did when the original discussions took place.

> how does the object service handle object signatures when objects are uploaded directly to cloud providers?

I imagine object signatures would be created once the object has been uploaded (this could be step 3 in the creation process). The three cloud providers I looked at (AWS, GCP, Azure) all offer server-side encryption with customer-managed keys (as well as cloud-provider-managed keys). At the moment, it makes sense to me to use that functionality. Does this answer your question?

djmitche commented 4 years ago

Good point about DELETE. Will it be the queue's responsibility to remember about artifacts and call the DELETE method on the corresponding object, or will the Object Service have a way to support automatically expiring objects? At the moment, the queue uses the first approach (and it is careful to write to its artifacts table before creating the object, so that if something crashes, it doesn't leave an orphan object).
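To make the ordering concrete, here is a minimal sketch of the "write the artifacts row first, then create the object" approach. Everything in it is hypothetical (`FakeObjectService`, `Queue`, `create_artifact`, `expire_artifacts` are in-memory stand-ins, not the real queue or object service code); it only illustrates why a crash between the two steps cannot leave an orphan object.

```python
# Hypothetical in-memory stand-ins; not the real queue or object service.
class ObjectStoreDown(Exception):
    pass


class FakeObjectService:
    def __init__(self, fail=False):
        self.objects = {}
        self.fail = fail  # simulate a crash/outage during object creation

    def create(self, name, data):
        if self.fail:
            raise ObjectStoreDown(name)
        self.objects[name] = data

    def delete(self, name):
        # Deleting a missing object is a no-op, so cleanup is safe to retry.
        self.objects.pop(name, None)


class Queue:
    def __init__(self, object_service):
        self.artifacts = {}  # stand-in for the queue's artifacts table
        self.object_service = object_service

    def create_artifact(self, name, data, expires):
        # 1. Record the artifact first, so every object has a row pointing at it.
        self.artifacts[name] = expires
        # 2. Only then create the object. A failure here leaves a row with no
        #    object (harmless), never an object with no row (an orphan).
        self.object_service.create(name, data)

    def expire_artifacts(self, now):
        # The queue remembers its artifacts and calls delete on each expired one.
        for name, expires in list(self.artifacts.items()):
            if expires <= now:
                self.object_service.delete(name)
                del self.artifacts[name]
```

If object creation fails, the leftover row is still picked up by `expire_artifacts`, whose delete call is a no-op for the missing object.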

sciurus commented 4 years ago

I don't see any mention of CDNs in the RFC, but I assume the object service will continue giving out CDN urls instead of direct S3 (or equivalent) URLs in the situations where this is done today.

owlishDeveloper commented 4 years ago

> Will it be the queue's responsibility to remember about artifacts and call the DELETE method on the corresponding object, or will the Object Service have a way to support automatically expiring objects?

I wonder if we can have both? Support expiring objects through the bucket life-cycle settings (I'm not sure about this functionality in GCP and Azure, would need to look that up), and have an endpoint in the object service that can either be called by a person (curl etc.) or be put into a script like a cron job. The state of this will be saved in the database, of course.
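For the S3 side only, a rough sketch of what such a bucket life-cycle rule could look like. The dict follows the shape of S3's lifecycle configuration as accepted by boto3's `put_bucket_lifecycle_configuration`; the helper name `expiry_lifecycle`, the `artifacts/` prefix, the 30-day value and the bucket name are all made up for illustration, and the GCP and Azure equivalents are not covered here.

```python
def expiry_lifecycle(prefix, days):
    """Build an S3 lifecycle configuration expiring objects under `prefix`.

    Hypothetical helper: the prefix and day count are illustrative values,
    not anything defined by the RFC.
    """
    if days < 1:
        raise ValueError("days must be a positive integer")
    return {
        "Rules": [
            {
                "ID": f"expire-{prefix.strip('/') or 'all'}-after-{days}d",
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Expiration": {"Days": days},
            }
        ]
    }


config = expiry_lifecycle("artifacts/", 30)

# With boto3 and credentials available, this would be applied roughly as:
#   boto3.client("s3").put_bucket_lifecycle_configuration(
#       Bucket="hypothetical-bucket", LifecycleConfiguration=config)
```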

owlishDeveloper commented 4 years ago

> I don't see any mention of CDNs in the RFC, but I assume the object service will continue giving out CDN urls instead of direct S3 (or equivalent) URLs in the situations where this is done today.

My bad! I'll add a mention, that definitely should be in there, thank you. And to answer your question - yes, I think so.

> Will it be the queue's responsibility to remember about artifacts and call the DELETE method on the corresponding object, or will the Object Service have a way to support automatically expiring objects?

> an endpoint in object service that can either be called by a person (curl etc) or be put into a script like a cron job

...or be automatic based on the expiry setting just as queue does. I think if we have an API endpoint, we will have all of these options available to us to use

owlishDeveloper commented 4 years ago

@djmitche but if we don't want to have an endpoint, just some unexposed functionality within the object service to periodically delete objects, that's perfectly fine with me as well. I can add this to the RFC and remove the DELETE endpoint then.

djmitche commented 4 years ago

I think what you suggested makes sense: be flexible in how artifacts are deleted, allowing both an automatic expiry and an explicit delete. That mirrors what we do with other things like secrets. If that expiry can happen "automatically" by some configuration of the cloud provider, all the better!
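As a rough illustration of supporting both paths side by side, here is a small in-memory sketch. The class and method names (`ObjectService`, `upload`, `delete`, `expire`) are hypothetical and not taken from the RFC; a real implementation would keep this state in the database and might delegate expiry to the cloud provider's own configuration.

```python
from datetime import datetime, timedelta, timezone


class ObjectService:
    """Hypothetical in-memory model: explicit delete plus periodic expiry."""

    def __init__(self):
        self._objects = {}  # name -> (data, expires)

    def upload(self, name, data, expires):
        self._objects[name] = (data, expires)

    def delete(self, name):
        """Explicit delete, e.g. behind a DELETE endpoint. True if removed."""
        return self._objects.pop(name, None) is not None

    def expire(self, now=None):
        """Automatic path: remove everything whose expiry time has passed."""
        now = now or datetime.now(timezone.utc)
        expired = [n for n, (_, exp) in self._objects.items() if exp <= now]
        for name in expired:
            del self._objects[name]
        return sorted(expired)

    def names(self):
        return sorted(self._objects)


now = datetime(2020, 1, 1, tzinfo=timezone.utc)
svc = ObjectService()
svc.upload("old", b"1", now - timedelta(days=1))
svc.upload("new", b"2", now + timedelta(days=1))
svc.upload("manual", b"3", now + timedelta(days=1))

svc.delete("manual")          # explicit delete, before expiry
removed = svc.expire(now)     # periodic job removes only expired objects
```

Both paths converge on the same removal, so an object deleted explicitly is simply absent by the time the periodic job runs.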