scylladb / scylla-ccm

Cassandra Cluster Manager, modified for Scylla
Apache License 2.0

Improve caching of local tarballs #521

Closed avikivity closed 5 months ago

avikivity commented 8 months ago

scylla-dtest says:

NOTE: ~/.ccm/scylla-repository/local_tarball should be deleted, otherwise it would use it as is. ccm currently isn't very smart on the way it's caching the versions.

If an occasional user forgets this note, then all their runs will be invalid, and they will have no way of knowing. It's a huge waste of developer time.

fruch commented 8 months ago

Based on what exactly would you expect it to be cached? It can be sourced from multiple places, e.g. local files or remote files.

We don't want it downloading again and again from S3; that would also be a huge waste of developer time...

nyh commented 8 months ago

You have some code that creates "local_tarball" when it doesn't exist. This code decides what to download/copy/build based on some environment variables or options. It can save these environment variables to a file (e.g., local_tarball.source), and on the next run, if the variables differ from the saved ones, local_tarball and local_tarball.source are deleted and everything is recreated.
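The idea above could be sketched roughly as follows. This is only an illustration under assumptions: `prepare_cache`, `record_source`, the JSON sidecar format, and the shape of the `options` dict are all hypothetical and are not taken from ccm's code.

```python
import json
import os
import shutil


def _sidecar(cache_dir):
    # Hypothetical layout: ".../local_tarball" -> ".../local_tarball.source"
    return cache_dir + ".source"


def prepare_cache(cache_dir, options):
    """Return True if cache_dir was built from the same options and can be reused.

    Otherwise remove the cache and its sidecar file and return False, so the
    caller knows it has to download/copy/build again.
    """
    try:
        with open(_sidecar(cache_dir)) as f:
            saved = json.load(f)
    except (OSError, ValueError):
        saved = None  # no record, or an unreadable one: treat as stale

    if saved == options and os.path.isdir(cache_dir):
        return True

    shutil.rmtree(cache_dir, ignore_errors=True)
    try:
        os.remove(_sidecar(cache_dir))
    except FileNotFoundError:
        pass
    return False


def record_source(cache_dir, options):
    """Save the options; call this only after the cache was recreated successfully."""
    with open(_sidecar(cache_dir), "w") as f:
        json.dump(options, f, sort_keys=True)
```

Writing the sidecar only after a successful rebuild means an interrupted run leaves no record behind, so the next run starts from scratch instead of trusting a half-built directory.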

fruch commented 8 months ago

And if you recompile it and the file stays in the same location, that won't help much. You would end up in the same situation.

avikivity commented 8 months ago

You can store the timestamp in local_tarball/.timestamp. Get the timestamp via stat(2) for a local file, or via an HTTP HEAD request (e.g. curl -I, reading the Last-Modified header) for a remote one.

If the URL has an embedded timestamp, you can assume that the file won't change and use the embedded timestamp.
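A minimal sketch of how the two timestamp sources could be read, using only the Python standard library. The function names are made up for illustration, and it assumes the remote server actually sends a `Last-Modified` header, which not every server does.

```python
import os
import urllib.request
from email.utils import parsedate_to_datetime


def local_timestamp(path):
    """Modification time of a local file (via stat), in seconds since the epoch."""
    return os.stat(path).st_mtime


def parse_last_modified(header_value):
    """Convert an HTTP Last-Modified header value to seconds since the epoch."""
    return parsedate_to_datetime(header_value).timestamp()


def remote_timestamp(url, timeout=30):
    """Send a HEAD request and return the Last-Modified time.

    Returns None if the server does not report one.
    """
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        value = response.headers.get("Last-Modified")
    return parse_last_modified(value) if value else None
```

A HEAD request transfers only the response headers, so checking a large remote tarball this way does not download it again.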

nyh commented 8 months ago

> And if you recompile it and the file stays in the same location, that won't help much. You would end up in the same situation.

To fix that possibility, the "local_tarball.source" file can also include the last modification date of each file or HTTP download involved. I'm not sure I know all the details, but it doesn't sound impossible.

fruch commented 8 months ago

So, to recap:

The timestamp of the source file should be used. There are 3 places a file can come from: 1) the S3 API, 2) HTTP, 3) a local file.

If the timestamp of the source file is newer, ccm should clear the directory and download/extract it again.

We are going to implement this only for the unified package (not the old method where multiple packages are used; I would like to deprecate that at some point).
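For the local-file case, the recap might translate into something like the sketch below. It is not the code from the eventual pull request: `refresh_if_stale` is a hypothetical helper, it assumes the package is a plain tar archive, and it keeps the recorded time in a `.timestamp` file inside the cache directory, as suggested earlier in the thread.

```python
import os
import shutil
import tarfile


def refresh_if_stale(cache_dir, package_path):
    """Re-extract package_path into cache_dir if the package is newer than the cache.

    Returns True if the cache was rebuilt, False if it was reused as is.
    """
    stamp_file = os.path.join(cache_dir, ".timestamp")
    source_mtime = os.stat(package_path).st_mtime
    try:
        with open(stamp_file) as f:
            cached_mtime = float(f.read())
    except (OSError, ValueError):
        cached_mtime = None  # no cache yet, or no usable timestamp

    if cached_mtime is not None and source_mtime <= cached_mtime:
        return False

    # Source is newer (or the cache is unknown): clear and extract again.
    shutil.rmtree(cache_dir, ignore_errors=True)
    os.makedirs(cache_dir)
    with tarfile.open(package_path) as tar:
        tar.extractall(cache_dir)
    with open(stamp_file, "w") as f:
        f.write(repr(source_mtime))
    return True
```

The S3 and HTTP cases would differ only in where `source_mtime` comes from and in adding a download step before extraction.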

juliayakovlev commented 5 months ago

https://github.com/scylladb/scylla-ccm/pull/557/files

fruch commented 5 months ago

closed in https://github.com/scylladb/scylla-ccm/pull/557