Closed 1nv8rzim closed 1 month ago
This works; I am drafting a more generalized approach in https://github.com/1nv8rzim/yeti/pull/2 so that this same thing can be extended past S3 in the future.
This is a great feature, thanks a lot!
That was going to be my main comment on this - it would be great if the core of this was provider agnostic. Please let us know once the generalized approach is ready for review!
Another thing that came to mind is documentation for this (especially the env variable part). Would you mind adding a snippet .md file to https://github.com/yeti-platform/yeti-platform.github.io/tree/main/content/docs describing how to set this up? I'll make sure it ends up in the right directory structure so you don't need to mind any of the hugo stuff.
Thanks again!
Would something like this work? https://github.com/yeti-platform/yeti-platform.github.io/pull/8
Will do! I have tested the PR I linked locally; I still need to test it in our full deploy, though.
Yes, the doc PR looks great! I might swoop in and change some typos / the layout of the page but it looks good.
Let me know once both are ready to review! Thanks again :)
@tomchop Merged in the code for the Generic Storage Clients. Tested and confirmed everything works. Should be ready for a full review.
Based on the order in which ^^ and the current PR are merged, we should think about moving the new template directory over to use the persistent storage clients.
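For readers following the "provider agnostic" discussion, here is a minimal sketch of what such a storage client layer could look like. It is not the code from the linked PR: the names `StorageClient`, `LocalStorageClient`, `get_client`, `put_file` and `get_file` are all hypothetical, chosen only to illustrate dispatching on a path prefix with a local-filesystem fallback.

```python
import abc
import pathlib


class StorageClient(abc.ABC):
    """Hypothetical provider-agnostic interface (not the PR's actual class)."""

    # URI prefix handled by a concrete client, e.g. "s3://" for an S3 client.
    PREFIX = ""

    def __init__(self, path: str):
        self.path = path

    @abc.abstractmethod
    def put_file(self, name: str, contents: bytes) -> None:
        """Store `contents` under `name`."""

    @abc.abstractmethod
    def get_file(self, name: str) -> bytes:
        """Return the bytes previously stored under `name`."""


class LocalStorageClient(StorageClient):
    """Fallback client that keeps files in a local directory."""

    def put_file(self, name: str, contents: bytes) -> None:
        root = pathlib.Path(self.path)
        root.mkdir(parents=True, exist_ok=True)
        (root / name).write_bytes(contents)

    def get_file(self, name: str) -> bytes:
        return (pathlib.Path(self.path) / name).read_bytes()


def get_client(path: str, clients=()) -> StorageClient:
    """Return the first client whose PREFIX matches `path`, else the local one."""
    for client_cls in clients:
        if client_cls.PREFIX and path.startswith(client_cls.PREFIX):
            return client_cls(path)
    return LocalStorageClient(path)
```

With this shape, supporting another provider would mean adding one subclass with its own `PREFIX` and passing it in `clients`; callers only ever see `put_file` and `get_file`.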
There's a systemic typo: Persist**i**ent (should be Persistent). But maybe we could just rename this to "FileStorage"?
Fixed in 5a3ea15
Based on the order in which ^^ and the current PR are merged, we should think about moving the new template directory over to use the persistent storage clients.
Thinking back on this again - I think I'm gonna merge https://github.com/yeti-platform/yeti/pull/1141/files as it is. We need to think a bit more about whether we want to be able to load templates from outside the infrastructure where Yeti lives, given that templates can lead to RCE (the whole reason we're moving them out of the db into the filesystem).
Right now we are going through the process internally of making the `api` and `tasks` containers stateless so that we can run them in replicasets. This is actually the core reason behind this contribution.
The move of templates from the database to filesystem breaks this statelessness. Having an option to make this change stateless is something we would be very interested in.
Likewise, what is the attack vector that specifically makes storage of templates in ArangoDB vulnerable to RCE?
As I understood it, allowing user-editable export templates at all is what enables RCE through the templates themselves, and that is agnostic to wherever they are stored. This would just be an accepted risk, because we would assume authenticated users should not perform RCE.
Along the same lines, if we can trust the file system with storing these templates, shouldn't using a remote storage solution like an S3 bucket be just as secure?
Didn't notice it before but the PR removes the ability to edit these templates from the UI. This would work for us since it pretty much makes the templates static and we can just add them directly in at build time and they can be handled as part of the deploy.
Looks like some tests are failing (bad imports), can you please fix this and the ruff errors?
Purpose
One of the things I have been working towards is a stateless deploy of `yeti` in order to deploy several replicas of the celery runner and api containers. The biggest blocker for this is that a shared volume between several containers is an anti-pattern and not supported on the cluster I am deploying to. This PR introduces the ability to replace this shared volume with an S3-compliant bucket.
Changes
- If `system.export_path` is prefixed with `s3://`, it will attempt to use S3 as the storage medium for exports
- Setting `system.export_path` to `s3://bucket_name` would use `bucket_name` for storage of export task results
- [...] `s3` compliant bucket
- [...] `s3` compliant bucket
- `s3` bucket access should be injected by environment variables
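As a rough illustration of the behaviour described in the list above, the following sketch shows how a `system.export_path` value could be split into a storage kind and a bucket name, with credentials taken from the environment. This is an assumption-laden example rather than the PR's implementation: `ExportTarget` and `resolve_export_target` are hypothetical names, and the variable names `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` are the conventional AWS ones, which the PR may or may not use.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

S3_PREFIX = "s3://"


@dataclass
class ExportTarget:
    """Hypothetical description of where export results should be written."""

    kind: str  # "s3" or "local"
    location: str  # bucket name for "s3", filesystem path for "local"
    access_key: Optional[str] = None
    secret_key: Optional[str] = None


def resolve_export_target(
    export_path: str, env: Mapping[str, str] = os.environ
) -> ExportTarget:
    """Choose S3 when export_path starts with s3://, otherwise the filesystem."""
    if export_path.startswith(S3_PREFIX):
        bucket = export_path[len(S3_PREFIX):].strip("/")
        if not bucket:
            raise ValueError("export_path has an s3:// prefix but no bucket name")
        # Assumed variable names: credentials come from the environment,
        # never from the config file itself.
        return ExportTarget(
            kind="s3",
            location=bucket,
            access_key=env.get("AWS_ACCESS_KEY_ID"),
            secret_key=env.get("AWS_SECRET_ACCESS_KEY"),
        )
    return ExportTarget(kind="local", location=export_path)
```

Under these assumptions, `resolve_export_target("s3://bucket_name")` yields a target of kind `"s3"` with location `"bucket_name"`, while any path without the prefix keeps the existing local behaviour.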