poseidon-framework / poseidon-hs

A toolset to work with modular genotype databases in the Poseidon format
https://poseidon-framework.github.io/#/trident
MIT License

Introducing reproducibility #218

Closed: stschiff closed this issue 1 year ago

stschiff commented 1 year ago

An idea came up (credit to @nevrome) to organise package collections in "Snapshots". A Snapshot is a …

The YAML file could look like this:

```yaml
name: poseidon-package-collection-v1.5.0            # mandatory
commit: 7601a7eff8af5a79b1ef0aaa22ac970a729bd7e9    # the last commit hash of the git repo (optional)
GitHub-repo:                                        # a URL to the GitHub repository (optional)
last_updated:                                       # the date of the last commit
download_date:                                      # the timestamp of when it was downloaded from the server (optional)
```
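To make the proposed fields concrete, here is a minimal Python sketch of a reader for such a file. It assumes the flat `key: value` layout shown above and only enforces that `name`, the one mandatory field, is present. The function name `parse_snapshot` is hypothetical, and a real implementation would rather use a proper YAML library.

```python
from __future__ import annotations

import re

# Hypothetical sketch: a reader for the flat snapshot file proposed above.
# Assumes one "key: value" entry per line; a real tool would use a YAML library.
MANDATORY_FIELDS = {"name"}


def parse_snapshot(text: str) -> dict[str, str | None]:
    """Parse flat 'key: value' lines, ignoring blank lines and '#' comments."""
    fields: dict[str, str | None] = {}
    for raw in text.splitlines():
        # A YAML comment starts with '#' at line start or after whitespace.
        line = re.split(r"(?:^|\s)#", raw, maxsplit=1)[0].strip()
        if not line:
            continue
        # Split on the first colon only, so URLs and timestamps stay intact.
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"not a 'key: value' line: {raw!r}")
        fields[key.strip()] = value.strip() or None  # empty value -> unset
    missing = MANDATORY_FIELDS - set(fields)
    if missing:
        raise ValueError(f"missing mandatory field(s): {sorted(missing)}")
    return fields
```

Optional fields left empty come back as `None`, so a client can tell "not recorded" apart from a recorded value.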

Multiple snapshots can be organised (on the server, but also on the client side) in a single directory such as snapshots/.

The server would present multiple snapshots to the user, who could select one using the API or a --snapshot option in the CLI.
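A rough Python sketch of how a client might map a `--snapshot` name onto a sub-directory of `snapshots/`. Only the option name `--snapshot` comes from the proposal; the one-directory-per-snapshot layout, the `--snapshots-root` option and the function names are assumptions for illustration, not trident's actual interface.

```python
from __future__ import annotations

import argparse
from pathlib import Path


def resolve_snapshot(snapshots_root: Path, name: str) -> Path:
    """Return snapshots_root/name, or exit listing the available snapshots."""
    if not snapshots_root.is_dir():
        raise SystemExit(f"snapshot directory {snapshots_root} does not exist")
    candidate = snapshots_root / name
    if not candidate.is_dir():
        available = sorted(p.name for p in snapshots_root.iterdir() if p.is_dir())
        raise SystemExit(f"unknown snapshot {name!r}; available: {available}")
    return candidate


def main(argv: list[str] | None = None) -> Path:
    # Hypothetical CLI: both options are illustrative, not a real trident interface.
    parser = argparse.ArgumentParser(prog="fetch-sketch")
    parser.add_argument("--snapshot", required=True,
                        help="name of the snapshot to use")
    parser.add_argument("--snapshots-root", type=Path, default=Path("snapshots"),
                        help="directory holding one sub-directory per snapshot")
    args = parser.parse_args(argv)
    return resolve_snapshot(args.snapshots_root, args.snapshot)
```

Listing the available names on failure mirrors the idea that the server presents all snapshots to the user.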

On the client side, the user is responsible for selecting the right base directory for fetching packages. The usual behaviour still applies: if the user wants to download a package that is already present in the target directory in the given version, it is not downloaded again.
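The existing rule can be written as a small Python sketch: given the package versions requested from the server and those already in the target directory, only packages that are missing or present in a different version are downloaded. Representing packages as name-to-version dictionaries and the name `packages_to_download` are assumptions made for illustration.

```python
from __future__ import annotations


def packages_to_download(requested: dict[str, str], local: dict[str, str]) -> list[str]:
    """Names of requested packages not already present locally in the requested version."""
    # A package already present in exactly the requested version is skipped.
    return sorted(
        name for name, version in requested.items() if local.get(name) != version
    )
```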

It is not clear to me how the client would update or use the Poseidon-snapshot.yml file. Such a file would clearly be informative and improve reproducibility, but I don't yet see how to update it, or what to do if the user overrides a given snapshot with another one. Would fetch also update the snapshot file?

nevrome commented 1 year ago

Thanks for writing this down. What about the following idea, to be on the safe side: fetch could refuse to edit a base directory if it detects a POSEIDONSnapshot.yml (?) file. Of course, the user should be able to override this with --force.
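This safeguard could look roughly like the following Python sketch. The file name is the tentative one from the comment above, and `ensure_writable` and `SnapshotProtectedError` are hypothetical names; `force` stands in for the proposed --force flag.

```python
from __future__ import annotations

from pathlib import Path

# Tentative file name taken from the discussion; not a settled convention.
SNAPSHOT_FILE = "POSEIDONSnapshot.yml"


class SnapshotProtectedError(RuntimeError):
    """Raised when fetch would modify a base directory that holds a snapshot."""


def ensure_writable(base_dir: Path, force: bool = False) -> None:
    """Refuse to touch base_dir if it contains a snapshot file, unless force is set."""
    if (base_dir / SNAPSHOT_FILE).exists() and not force:
        raise SnapshotProtectedError(
            f"{base_dir} contains {SNAPSHOT_FILE}; use --force to overwrite it"
        )
```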

Overwriting one snapshot with another will fail if a package was deleted or renamed, so it should be discouraged.

An additional idea might be to store package names and versions in the snapshot.yml file. This would make it easier to keep an overview, or even to warn about invalid snapshots.
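If the snapshot file listed package names and versions, such warnings could be derived by comparing that list with the packages actually found in the base directory. A minimal Python sketch, assuming both sides are available as name-to-version dictionaries; `snapshot_warnings` is a hypothetical name.

```python
from __future__ import annotations


def snapshot_warnings(listed: dict[str, str], found: dict[str, str]) -> list[str]:
    """Compare packages listed in a snapshot file with those present on disk."""
    warnings: list[str] = []
    for name, version in sorted(listed.items()):
        if name not in found:
            # e.g. the package was deleted or renamed by another snapshot
            warnings.append(f"{name}: listed in snapshot but missing")
        elif found[name] != version:
            warnings.append(f"{name}: snapshot lists {version}, found {found[name]}")
    # Packages on disk that the snapshot does not know about.
    for name in sorted(set(found) - set(listed)):
        warnings.append(f"{name}: present but not listed in snapshot")
    return warnings
```

An empty result would mean the base directory matches its snapshot exactly.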

stschiff commented 1 year ago

Hmm. Yes, I suppose fetch should refuse to override in some cases.

But suppose I haven't downloaded all packages from a snapshot, and later I want to download another package from it. In that case, it should just work. So fetch should only refuse to download if the requested server snapshot differs from the local target snapshot, right?

This would also cover the nightly snapshot, where I expect packages to be overridden frequently.

So for every fetch call there is i) a targeted snapshot name on the server (which could implicitly point to the latest LTS), and ii) an optional local snapshot name in the target base directory. Before doing anything, fetch should check whether the server and local snapshots are the same (if there is a local snapshot; if not, there is no problem). If the two snapshot names agree, the behaviour should be the same as now: override a package only if the server version is higher. If the two snapshot names do not agree, stop (or override with --force).
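The check described above can be sketched in Python as follows. Snapshot names are plain strings, an absent local snapshot is passed as `None`, and package versions are assumed to be dotted integers such as `1.0.0`. What `--force` does exactly (here: take every package from the server snapshot) is an assumption, since the discussion only says "override"; all names are hypothetical.

```python
from __future__ import annotations


class SnapshotMismatchError(RuntimeError):
    """Raised when local and server snapshots differ and --force is not set."""


def parse_version(version: str) -> tuple[int, ...]:
    """Turn a dotted version like '2.1.0' into (2, 1, 0) for numeric comparison."""
    return tuple(int(part) for part in version.split("."))


def plan_fetch(server_snapshot: str, local_snapshot: str | None,
               server: dict[str, str], local: dict[str, str],
               force: bool = False) -> list[str]:
    """Return the names of packages fetch would download under the proposed rule."""
    # No local snapshot means there is nothing to protect.
    if local_snapshot is not None and local_snapshot != server_snapshot:
        if not force:
            raise SnapshotMismatchError(
                f"target holds snapshot {local_snapshot!r}, "
                f"server snapshot is {server_snapshot!r}; use --force to override"
            )
        # Assumption: --force takes every package from the server snapshot.
        return sorted(server)
    # Same snapshot (or none locally): download only missing or newer packages.
    return sorted(
        name for name, version in server.items()
        if name not in local or parse_version(version) > parse_version(local[name])
    )
```

Comparing versions as integer tuples avoids the string-ordering trap where "1.10.0" would sort before "1.9.0".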

Is that about right?

nevrome commented 1 year ago

That seems to cover every scenario I can think of :thinking:

nevrome commented 1 year ago

After careful deliberation, @stschiff and I decided to redesign the planned reproducibility solution for Poseidon once more. The cornerstones of the new solution are as follows:

This should ensure a high level of reproducibility while still being easy to use. What we need for that:

What would be a good name for snapshot.yml? POSEIDONSNAP.yml?

stschiff commented 1 year ago

Great, thanks for the summary. I think I would prefer poseidon_snapshot.yml. Or capitalised, even though it's uglier?

stschiff commented 1 year ago

Update from today's discussion

nevrome commented 1 year ago

Many of these ideas are already implemented, or on their way to being implemented, in one way or another. This issue is therefore superseded.