Closed stschiff closed 1 year ago
Thanks for writing this down. What about the following idea to be on the safe side: `fetch` could refuse to edit a base directory if it detects a `POSEIDONSnapshot.yml` (?) file. Of course, the user should be able to `--force` overwriting.
Overwriting one snapshot with another one will fail if a package was deleted or renamed. So it should be discouraged.
An additional idea might be to store package names and versions in the `snapshot.yml` file. That would make it easier to keep an overview, or even to warn about invalid snapshots.
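The "warn about invalid snapshots" idea could look roughly like the sketch below. It assumes the snapshot file has already been parsed into a plain mapping of package name to version, and that the packages found in the base directory are available in the same shape. The function name `find_snapshot_problems` and the message wording are hypothetical and not part of trident.

```python
from typing import Dict, List


def find_snapshot_problems(recorded: Dict[str, str], present: Dict[str, str]) -> List[str]:
    """Compare the versions recorded in a snapshot with the packages on disk.

    Both arguments map package name -> version string (an assumed layout).
    Returns one human-readable warning per mismatch; an empty list means
    the snapshot and the base directory agree.
    """
    problems = []
    for name, version in sorted(recorded.items()):
        if name not in present:
            problems.append(f"{name}: listed in snapshot but missing from base directory")
        elif present[name] != version:
            problems.append(f"{name}: snapshot lists {version}, found {present[name]}")
    # Packages on disk that the snapshot does not know about
    for name in sorted(set(present) - set(recorded)):
        problems.append(f"{name}: present in base directory but not listed in snapshot")
    return problems
```

A deleted or renamed package would show up here as a "missing" entry, which is the case that makes overwriting one snapshot with another fragile.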
Hmm. Yes, I suppose `fetch` should refuse to override in some cases.
But suppose I haven't downloaded all packages from a snapshot, and then I want to load another one. In that case, it should just work. So `fetch` should only refuse to download if the requested server snapshot differs from the local target snapshot, right?
This would also solve the case with the nightly snapshot, where I see that stuff will get overridden frequently.
So for every `fetch` call there is i) a targeted snapshot name on the server (which could implicitly point to the latest LTS), and ii) an optional local snapshot name in the target basedir. Before doing anything, `fetch` should check whether the server and local snapshots are the same (if there is a local snapshot; if not, there is no problem). If the two snapshot names agree, then the behaviour should be the same as now, so override only if a version is higher. If the two snapshot names do not agree, stop (or override with `--force`).
Is that about right?
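To make the proposed rule concrete, here is a minimal sketch of the two checks. It is not trident code: the function names `may_fetch` and `should_overwrite` are hypothetical, and it assumes package versions are dotted integers such as `1.2.0`.

```python
from typing import Optional, Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn a dotted version string such as '1.2.0' into a comparable tuple."""
    return tuple(int(part) for part in text.split("."))


def may_fetch(server_snapshot: str, local_snapshot: Optional[str], force: bool = False) -> bool:
    """Decide whether fetch may touch the target base directory at all."""
    if local_snapshot is None:
        # No local snapshot recorded: nothing can conflict
        return True
    if server_snapshot == local_snapshot:
        # Same snapshot on both sides: behave as fetch does today
        return True
    # Different snapshots: stop unless the user explicitly forces it
    return force


def should_overwrite(local_version: Optional[str], server_version: str) -> bool:
    """Within one snapshot, replace a package only if the server version is higher."""
    if local_version is None:
        return True
    return parse_version(server_version) > parse_version(local_version)
```

Comparing parsed tuples rather than raw strings matters for cases like `1.9.0` versus `1.10.0`, where a plain string comparison would pick the wrong one.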
That seems to cover every scenario I can think of :thinking:
After careful deliberation @stschiff and I decided to redesign the planned reproducibility solution for Poseidon once more. The cornerstones of the new solution are as follows:
- `trident fetch` (and maybe `list`?) can request specific package versions (using the new API)
- […] ("`snapshot.yml`") that lists packages and their respective versions
- […] `snapshot.yml` files based on their baseDirs with a new subcommand `trident snapshot`
- `fetch` (and maybe `list`) can read `snapshot.yml` files to thus recreate old datasets

This should ensure a high level of reproducibility while still being easy to use. What we need for that:
- […] `snapshot.yml` files [Stephan]
- […] `trident` subcommand `snapshot` [Clemens]

What would be a good name for `snapshot.yml`? `POSEIDONSNAP.yml`?
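As a rough illustration of the redesigned workflow, the sketch below records the packages of a base directory as a name-to-version mapping and then works out which exact versions would have to be requested to recreate that state elsewhere. The data layout and the names `make_snapshot` and `plan_fetch` are assumptions made for this example; the actual file format and API were still being designed at this point.

```python
from typing import Dict, List, Tuple


def make_snapshot(packages: List[Tuple[str, str]]) -> Dict[str, str]:
    """Record (name, version) pairs found in a base directory as a snapshot mapping."""
    return {name: version for name, version in sorted(packages)}


def plan_fetch(snapshot: Dict[str, str], local: Dict[str, str]) -> List[Tuple[str, str]]:
    """List the exact (name, version) pairs still needed to recreate a snapshot.

    A package is skipped only if it is already present locally in exactly
    the version the snapshot asks for.
    """
    return [
        (name, version)
        for name, version in sorted(snapshot.items())
        if local.get(name) != version
    ]
```

Note that this differs from the current "only overwrite if the version is higher" rule: recreating an old dataset may require requesting a lower version than the one present locally.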
Great, thanks for the summary. I think I would like `poseidon_snapshot.yml`? Or capitalised, even though it's uglier?
Update from today's discussion
Many of these ideas are already implemented or on the way towards implementation - in one way or the other. This issue is therefore superseded.
An idea came up (credit to @nevrome) to organise package collections in "Snapshots". A Snapshot is a […] `poseidon-snapshot.yml` YAML-formatted file in the root of that base directory.
The YAML file could be like:
Multiple snapshots can be organised (on the server, but also on the client side) in a single directory such as `snapshots/`.
The server would present multiple snapshots to the user, and one could select them using APIs and a `--snapshot` option in the CLI.
On the client side, the user is responsible for selecting the right base directory for fetching packages. The usual behaviour still applies: if the user wants to download a package that is already present in the target directory in the given version, it is not downloaded.
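One way to picture the `snapshots/` layout is a small helper that lists every subdirectory carrying a snapshot file. This is only a sketch under the assumption that each snapshot is a subdirectory with a `poseidon-snapshot.yml` in its root; the function name `list_snapshots` is made up for the example and the file contents are not inspected.

```python
from pathlib import Path
from typing import List

# File name taken from the proposal above; its contents are not read here
SNAPSHOT_FILE = "poseidon-snapshot.yml"


def list_snapshots(snapshots_dir: Path) -> List[str]:
    """Return the names of all subdirectories that contain a snapshot file."""
    return sorted(
        entry.name
        for entry in snapshots_dir.iterdir()
        if entry.is_dir() and (entry / SNAPSHOT_FILE).is_file()
    )
```

A server could expose such a list through its API, and a `--snapshot` option would then pick one of the returned names.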
It is not clear to me how the client would update/use the `Poseidon-snapshot.yml` file. It's clear that such a file would be informative and improve reproducibility, but it's not clear to me yet how to update it, or what to do if the user overrides a given snapshot with another one. Would `fetch` also update the Snapshot file?