Closed cpavloud closed 2 years ago
We could build a .sh script that runs at the end of the Dockerfile, executes commands such as swarm --version, and saves the output in a file that would be part of the image.
This way, whenever a tool is updated, the recorded version is updated automatically.
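A minimal sketch of such a script, to make the idea concrete. The output file name (tool_versions.tsv), the helper name record_version and the tool list are assumptions for illustration only, not what PEMA actually ships:

```shell
#!/usr/bin/env bash
# Sketch: record tool versions at image build time.
# OUT is a hypothetical path; the tool list below is illustrative.
set -u

OUT="${OUT:-tool_versions.tsv}"

# record_version NAME CMD [ARGS...]
# Appends "NAME<TAB>first line of the command's output" to $OUT,
# or "NAME<TAB>not found" when the executable is not on PATH.
record_version() {
    local name="$1"; shift
    local version
    if command -v "$1" >/dev/null 2>&1; then
        # Some tools print their version on stderr, so capture both streams.
        version="$("$@" 2>&1 | head -n 1)"
    else
        version="not found"
    fi
    printf '%s\t%s\n' "$name" "$version" >> "$OUT"
}

# Start the file with a header row, then add one row per tool.
printf 'tool\tversion\n' > "$OUT"
record_version swarm swarm --version
record_version cutadapt cutadapt --version
```

In the Dockerfile, the script could be copied in with COPY and executed with a RUN instruction after all the tools are installed, so the file is regenerated on every image build.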
Since pema:v.2.1.4, the user can find two files when running a pema container, pema_environment.tsv and pema_R_packages.tsv, which cover the points raised in this issue.
In addition, there is the encapsulated_software.md file under the help_files directory, with a snapshot of the software at pema:v.2.1.4.
Reopening the issue!
Could we have something like a "run id"? So that the user can separate different runs using the same data but with different parameters?
Or something like a date, time and information on where PEMA is running, i.e. on which cluster?
I am thinking that it cannot be done automatically. But could it be added by the user perhaps? As an extra, non-meaningful, non-software related, parameter?
The easiest thing to do would be to add a UUID.
This way, the copy of the parameters.tsv file that pema produces and saves in the output directory could include a line like this:
run_id c87d8885-95e8-4cc2-8f33-48dda0cc4467
To do so, you just use the following bash command:
uuid=$(uuidgen)
In addition, you could also have something based on the date and time, for example:
10:22--01-12-2022
Again, to do this would be a single bash command:
date +'%I:%M--%m-%d-%Y'
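Put together, the two commands could be used roughly as follows. This is only a sketch: the PARAMS_COPY path and the run_started and run_host keys are made up for illustration, and the fallback for a missing uuidgen is Linux-specific. It also swaps in a 24-hour UTC timestamp, since %I is a 12-hour clock and cannot distinguish morning from evening without an AM/PM marker.

```shell
#!/usr/bin/env bash
# Sketch: stamp the output copy of the parameters file with run metadata.
set -u

# Hypothetical path to the copy of the parameters file in the output directory.
PARAMS_COPY="${PARAMS_COPY:-parameters.tsv}"

# uuidgen is not installed everywhere; the fallback reads a kernel-generated
# UUID and only works on Linux.
run_id="$(uuidgen 2>/dev/null || cat /proc/sys/kernel/random/uuid)"

# 24-hour UTC timestamp, e.g. in the form YYYY-MM-DDTHH:MM:SSZ.
run_started="$(date -u +'%Y-%m-%dT%H:%M:%SZ')"

# Node name of the machine the script runs on.
run_host="$(uname -n)"

# Append the three key/value rows to the parameters copy.
{
    printf 'run_id\t%s\n' "$run_id"
    printf 'run_started\t%s\n' "$run_started"
    printf 'run_host\t%s\n' "$run_host"
} >> "$PARAMS_COPY"
```

Inside a container, uname -n typically reports the container's hostname rather than the name of the cluster, so a label supplied by the user may still be needed for that part.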
Tell me if you like something like that and I'll try to build an image as soon as possible.
That's great!
This has been moved to issue #35.
We should add to the parameters file the version of the SWARM algorithm that is implemented in PEMA, as well as the versions of CROP, RAxML-ng, PaPaRa and EPA-ng, and the version of cutadapt that is used for primer removal in the case of ITS. For the MIDORI database, we need to specify the GenBank release that it is based on. I think that for all the other tools, the versioning information is already there.
Also, we should mention somewhere in the parameters file that RDPClassifier is being used for the COI gene, along with the version of RDPClassifier. Similarly, we should mention that CREST is being used for the 16S, 18S and ITS markers.
Also, we should add the thresholds/default values used by the classifiers for the taxonomic identification of the sequences. Then, we could add this information in the otu_seq_comp_appr term when submitting data to GBIF/OBIS using the DwC-A format.
Then, after every analysis, the user will have full provenance (regarding tools and parameters implemented) stored in the copy of the parameters file inside the output folder.
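As a rough sketch of how the recorded versions could feed the otu_seq_comp_appr term: the snippet below looks up a tool's version in a two-column TSV and joins tool, version and parameters with semicolons. The software;version;parameters layout is an assumption that should be checked against the term's definition, the helper names are hypothetical, and every value in the demo is a placeholder rather than a real PEMA version or threshold.

```shell
#!/usr/bin/env bash
# Sketch: build a "software;version;parameters" string from recorded versions.
set -u

# lookup_version FILE TOOL
# Prints the second column of the first row whose first column equals TOOL.
# Prints nothing when the tool is not listed.
lookup_version() {
    awk -F '\t' -v tool="$2" '$1 == tool { print $2; exit }' "$1"
}

# compose_term TOOL VERSION PARAMETERS
# Joins the three fields with semicolons.
compose_term() {
    printf '%s;%s;%s\n' "$1" "$2" "$3"
}

# Demo with placeholder values only.
versions_file="$(mktemp)"
printf 'tool\tversion\n'               >  "$versions_file"
printf 'CREST\tVERSION_PLACEHOLDER\n'  >> "$versions_file"

crest_version="$(lookup_version "$versions_file" CREST)"
term="$(compose_term CREST "$crest_version" 'threshold=PLACEHOLDER')"

# Emit a row that could be appended to the copy of the parameters file.
printf 'otu_seq_comp_appr\t%s\n' "$term"
rm -f "$versions_file"
```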