ottokruse / s3-spa-upload

Upload a single-page application to S3 with the right content-type and cache-control metadata

wait some time before cleaning up old files #8

Open helperFunction opened 1 year ago

helperFunction commented 1 year ago

Hi :)

First: I love this package! It's great how easy it makes deploying a SPA to S3/CloudFront while leveraging tiered TTLs along the way.

I just stumbled upon an edge case with the --delete option:

  1. A user downloads index.html (v1)
  2. The SPA gets deployed (v2) and the (v1) files get deleted
  3. The user's browser then tries to download resources linked from (v1)

So this only affects users with a slow connection, and only if the (v1) files are no longer cached in CloudFront.

Possible solution: add a CLI parameter to configure how long to wait before deleting old files. A default of 30s/60s should probably be enough to give slower clients time to download everything.
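For illustration, a hypothetical invocation could look like this (the --delete-delay flag doesn't exist yet; the name and the seconds unit are just my suggestion):

s3-spa-upload dist-dir my-s3-bucket-name --delete --delete-delay 60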

Thanks for your work!

And again: I love this one :)

ottokruse commented 1 year ago

Thanks for the feedback and great idea! Will look into it when I have some bandwidth

krcourville commented 5 months ago

Interesting topic. Glad I read this before using the delete feature.

An initial thought was to set a TTL on old objects in S3 instead of deleting them. So far, I don't see that you can directly apply a TTL to objects. But it does look like you can apply a tag such as "delete-me=true" and then define a lifecycle policy on the bucket that removes those tagged objects after some time.
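For illustration, here's a rough sketch of what that lifecycle setup could look like with the AWS SDK for JavaScript v3 (bucket name, rule ID, and the delete-me tag are placeholders; note that lifecycle expiration has a minimum granularity of one whole day):

import {
  S3Client,
  PutBucketLifecycleConfigurationCommand,
} from "@aws-sdk/client-s3";

const s3 = new S3Client({});

// Expire objects tagged delete-me=true one day after creation.
// (Run in an async context; lifecycle rules cannot expire objects
// faster than in whole days.)
await s3.send(
  new PutBucketLifecycleConfigurationCommand({
    Bucket: "my-s3-bucket-name",
    LifecycleConfiguration: {
      Rules: [
        {
          ID: "expire-tagged-spa-files",
          Status: "Enabled",
          Filter: { Tag: { Key: "delete-me", Value: "true" } },
          Expiration: { Days: 1 },
        },
      ],
    },
  })
);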

From the perspective of this utility, maybe the delete option could be extended to use multiple strategies:

  1. Just delete (current strategy)
  2. Apply a given tag
ottokruse commented 5 months ago

TTL is a great idea for sure

krcourville commented 5 months ago

I may be tempted to contribute a PR if you want to hammer out the desired arguments.

Maybe:

s3-spa-upload dist-dir my-s3-bucket-name --prefix mobile --delete-with-tag 'archive=true'

or...

--apply-life-cycle-tag 'archive=true'

ottokruse commented 5 months ago

That would be awesome 👏

How about:

--tag-old-files 'key=value'

Let's make actually providing 'key=value' optional and default to 's3-spa-upload:archive=true'

And add a line in the docs that you're supposed to create an accompanying lifecycle rule to delete the files?
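For illustration, the tagging step could look something like this sketch with the AWS SDK for JavaScript v3 (the function name is hypothetical; note that PutObjectTagging replaces an object's entire tag set):

import { S3Client, PutObjectTaggingCommand } from "@aws-sdk/client-s3";

const s3 = new S3Client({});

// Tag one stale object instead of deleting it; the accompanying
// lifecycle rule (matching this tag) then does the actual removal.
// Caveat: PutObjectTagging replaces the object's entire tag set.
async function tagOldFile(bucket: string, key: string) {
  await s3.send(
    new PutObjectTaggingCommand({
      Bucket: bucket,
      Key: key,
      Tagging: { TagSet: [{ Key: "s3-spa-upload:archive", Value: "true" }] },
    })
  );
}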

krcourville commented 5 months ago

I like it.

helperFunction commented 5 months ago

hey guys :)

I like the idea with the tag.

But for ease of use of this tool, I think it would be better if we can find a solution that doesn't require setting up a bucket lifecycle rule.

What about tagging files with a TTL=timestamp, so that at the next run/deployment all files with an expired TTL get removed?
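Roughly like this sketch, perhaps (the TTL tag name and epoch-seconds format are just my assumptions):

import {
  S3Client,
  GetObjectTaggingCommand,
  DeleteObjectCommand,
} from "@aws-sdk/client-s3";

const s3 = new S3Client({});

// Delete an object if its TTL tag (epoch seconds) lies in the past.
async function deleteIfExpired(bucket: string, key: string) {
  const { TagSet } = await s3.send(
    new GetObjectTaggingCommand({ Bucket: bucket, Key: key })
  );
  const ttl = TagSet?.find((t) => t.Key === "TTL")?.Value;
  if (ttl && Number(ttl) < Date.now() / 1000) {
    await s3.send(new DeleteObjectCommand({ Bucket: bucket, Key: key }));
  }
}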

ottokruse commented 5 months ago

Both solutions make sense. Your option requires another deploy to clean up the previous one (which works and is pragmatic), whereas the lifecycle rule doesn't need that, which is nice as well.

it would be better if we can find a solution that doesn't require setting up a bucket lifecycle rule

What's your take there, why do you want to skip creating the bucket lifecycle rule? Just simpler if you didn't have to do that?

krcourville commented 5 months ago

I suppose, if you don't control your own AWS infrastructure, I could see where it might be a pain to deal with a lifecycle policy, depending on your relationship with the S3 bucket administrator.

ottokruse commented 5 months ago

To prevent the problem described by @helperFunction, we need to tag each uploaded file with a version number or timestamp, and then, when deleting old files, only delete files whose version is at least two behind the current one. This way you will always have two generations of files on S3, but not more:

When V3 is uploaded, V1 (the upload before V2) would be deleted.

That way, users with slow connections who already started downloading V2 during the upload of V3 can proceed without error. (We are then assuming there aren't any users still downloading V1, which is likely but not guaranteed.)

Maybe we flag this as such:

s3-spa-upload dist bucketname --delete --keep-old-generations 1

And maybe we should make the default of --keep-old-generations be 1, so that just typing s3-spa-upload dist bucketname --delete would be enough to tap into this functionality.

The current functionality can still be achieved then by doing:

s3-spa-upload dist bucketname --delete --keep-old-generations 0

Would that work for both your use cases @krcourville @helperFunction ?

krcourville commented 5 months ago

As far as I can tell, a lifecycle policy filter can only work against constant tag values. The filtering only supports "equals", not "greater than", "starts with", or otherwise.

Would we need separate arguments for each strategy?

That said, I'm not attached to the s3 lifecycle-managed option if the versioned option works fine.

With the versioned option, would we need to bootstrap an existing deployment somehow? Otherwise, to start, would there be objects with no generation/version tag?

Maybe I'm overthinking. If the tag does not exist, could we assume it's the previous version?

Also, how is "current version" determined?

ottokruse commented 5 months ago

Regard objects without the version tag as old and eligible for deletion? Easiest, I guess.

We can use a timestamp with second precision as the version. We'd have to list the bucket to find out which previous versions exist, but we have to list the bucket anyway to do the deletes.

And agreed: if we also want to support the lifecycle method, that would need a static tag value, plus a separate CLI parameter to trigger it.

krcourville commented 5 months ago

A timestamp does add more meaning and potential usefulness to the tag.

OK. Since the bucket has to be listed anyway, and assuming that result is cached, a second listing can be avoided. Makes sense.

To keep the most recent "n" generations, do we first iterate the bucket, accumulate a list of distinct versions, determine which will be kept based on sort order, and purge the rest?
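Something like this rough sketch, perhaps (AWS SDK for JavaScript v3; the tag key and the lexicographically sortable timestamp are my assumptions):

import {
  S3Client,
  ListObjectsV2Command,
  GetObjectTaggingCommand,
  DeleteObjectsCommand,
} from "@aws-sdk/client-s3";

const s3 = new S3Client({});
const VERSION_TAG = "s3-spa-upload:version"; // assumed tag key

// Keep the current generation plus keepOldGenerations older ones;
// everything else (including untagged objects) is considered old.
async function pruneGenerations(bucket: string, keepOldGenerations: number) {
  const objects: { key: string; version?: string }[] = [];
  let ContinuationToken: string | undefined;
  do {
    const page = await s3.send(
      new ListObjectsV2Command({ Bucket: bucket, ContinuationToken })
    );
    for (const obj of page.Contents ?? []) {
      const { TagSet } = await s3.send(
        new GetObjectTaggingCommand({ Bucket: bucket, Key: obj.Key! })
      );
      objects.push({
        key: obj.Key!,
        version: TagSet?.find((t) => t.Key === VERSION_TAG)?.Value,
      });
    }
    ContinuationToken = page.NextContinuationToken;
  } while (ContinuationToken);

  // Distinct versions, newest first (ISO timestamps sort lexicographically).
  const versions = [
    ...new Set(objects.flatMap((o) => (o.version ? [o.version] : []))),
  ].sort().reverse();
  const keep = new Set(versions.slice(0, keepOldGenerations + 1));

  const doomed = objects.filter((o) => !o.version || !keep.has(o.version));
  if (doomed.length > 0) {
    // Note: DeleteObjects takes at most 1000 keys per call; chunk in real code.
    await s3.send(
      new DeleteObjectsCommand({
        Bucket: bucket,
        Delete: { Objects: doomed.map((o) => ({ Key: o.key })) },
      })
    );
  }
}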

I suspect the next request will be: "can we use the version number generated by my code pipeline?" In which case, we could default to timestamp and allow override. If maintaining more than one old version is required, it's up to the lib consumer to ensure their version is chronologically sortable.

Maybe that doesn't matter, if the scope here is just syncing the built web app to the deployment bucket while minimizing the chance of someone getting a 404 response. Support for rollback/revert to previous versions is the only reason I can think of to get that complicated.

Personally, for my use case, using a timestamp as the current version, pushing new files to the bucket with that value, and deleting anything else that doesn't match that value would be fine. I wouldn't be looking to keep multiple versions.

At this point, I can commit to adding the little bit of code that would be required for "add this specified tag instead of delete". Based on your specification above:

--tag-old-files 'key=value' Let's make actually providing 'key=value' optional and default to 's3-spa-upload:archive=true'

And also providing an example lifecycle policy.

ottokruse commented 5 months ago

Adding that flag to enable lifecycle policies would be awesome, please go for it.