Closed: robxu9 closed this issue 2 years ago
While I think this is a great idea, I wonder whether data retrieval charges would make it expensive enough to not be worth it.
We use gocloud + gcs for remote execution and I don't think it's that expensive. @peterebden would know more. The cache is multiplexed so it should still be hitting the directory cache first.
We could probably re-write the http and dircache to follow its interfaces and then just have a single cache url config option and as long as that has a registered implementation, it should just work (tm).
@peterebden Thoughts on this? Could consolidate a lot of config.
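As a sketch of what that consolidation might look like in `.plzconfig` (the `[cache]` section with `dir` and `httpurl` is existing Please configuration; the single `url` option below is purely hypothetical):

```ini
; Today: the directory and HTTP tiers are configured separately.
[cache]
dir = .plz-cache
httpurl = https://cache.example.com

; Hypothetical consolidated form: one URL whose scheme selects a
; registered implementation, gocloud-style.
; [cache]
; url = s3://my-please-cache?region=us-east-1
```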
@robxu9 We have some concerns around adding gocloud to the main please binary. As you said, it's quite a large dependency and opens the door to adding many providers for different back-ends. Instead we'd prefer to add gocloud to //tools/http-cache. Please can be configured to talk to a normal HTTP cache, and this can act as a "gateway" to do OAuth, and s3/gcp stuff.
I'm much happier adding specific stuff to this as it's not distributed as part of the main please distribution. How does this approach suit you?
I think that sounds like a reasonable approach, though in that case I would also question whether we need this to be part of Please at all - I currently use rclone to do exactly that and it makes sense to offload that functionality to a project that is more dedicated to it. What do you think?
I was looking at using bazel-remote with Please, since it maintains both a local dir cache and uploads/downloads objects from an S3 compatible object store, so if it has the object in the bazel-remote local dir, it doesn't have to ask for it from upstream object store.
The kicker is that the CAS store in bazel-remote expects the key to be a sha256 hash of the content, whereas I've recently learned (thanks @Tatskaari for being awesome and answering my questions) that Please cache keys are actually a hash of the inputs to the rule that generated the file, not of the file itself.
Having something like gocloud support in the //tools/http-cache project would be really awesome, since a smaller user like myself could take advantage of it really easily.
@TyBrown Nice! I think this could work; however, I don't plan on productionising the http cache. It will never have things like health checking or status reports.
Saying that, we could make it behave like a proxy in front of a production ready cache (s3 or nginx or whatever). The basic idea is that it's a background process that is spun up on your CI worker just before please is invoked and forwards cache requests to your cache(s).
This isn't top of my priority list right now but I will endeavour to get around to it soon (tm)
I think we'd need to use the action cache for bazel-remote rather than the CAS. It seems to be very similar to the remote execution API, whereas our HTTP cache is a lot simpler: we store a single tarball of the outputs, keyed by the input hash. I presume we could make use of their action cache for that, but I haven't looked at it much.
This issue has been automatically marked as stale because it has not had any recent activity in the past 90 days. It will be closed if no further activity occurs. If you require additional support, please reply to this message. Thank you for your contributions.
I created a little something for using an object store as cache storage in CI (like GitHub Actions): https://github.com/sagikazarmark/blob-proxy
@robxu9 please check out the new command-driven cache from #2234. It's already released and documented, and it allows a straightforward, simple integration of various blob and non-blob stores.
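For anyone landing here later, the command-driven cache is configured with store/retrieve commands in `.plzconfig`. The option names below reflect my reading of the feature, and the scripts are placeholders; check the documentation from #2234 for the exact interface:

```ini
; Hedged sketch: storecommand/retrievecommand hand artifacts off to an
; arbitrary command, so any blob store with a CLI can back the cache.
[cache]
storecommand = ./ci/cache-store.sh
retrievecommand = ./ci/cache-retrieve.sh
```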
@towe75 This looks awesome! We can close this issue, and I'll go ahead and try that out! Thank you for contributing that!
Yeah agreed. A flexible solution that should prove very useful :D
A follow-up from #1140: It might be beneficial to add gocloud into please, in order to allow blob storage to be used as a cache backend (so backends like AWS S3, Google Cloud Storage, or Azure Blob Storage could be used).
This would allow users who run please builds in different CI providers to connect to a remote blob storage for their cache (or run their own! there's always the S3-compatible flavour-of-the-week version).
Gocloud provides a common interface, so configuration is universal; however, it calls each service's specific SDK under the hood, so it might be a heavy dependency.