vu-rdm-tech / yoda-pilot

A place to track issues we run into in the Yoda pilots

Improve versioning #3

Open peer35 opened 4 years ago

peer35 commented 4 years ago

By default Yoda creates a "version" every time a new file is created. This means double storage cost.

Expected behaviour:

Workaround:

bgoli commented 4 years ago

Excellent topic. Additional alternatives (in addition to disabling per project)

Question: does Yoda restore deleted files from the iRODS trash or from the "restore versions" (related to #4)?

peer35 commented 4 years ago

Note: rules are defined in https://github.com/UtrechtUniversity/irods-ruleset-research

https://github.com/UtrechtUniversity/irods-ruleset-research/blob/master/iiRevisions.r

https://github.com/UtrechtUniversity/irods-ruleset-research/blob/master/tools/revision-clean-up.r

peer35 commented 4 years ago

Some more discussion between me & @bgoli

Regarding the revisioning: at least one copy is always made of every "research data" file, so this effectively halves your capacity (and doubles your cost).

This can be surprising for researchers whose workflows use a lot of working data that is not moved directly to the vault.
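To put the halving in numbers, here is a minimal sketch. It assumes each revision is a full-size copy of the original file; the function names, the 10 TB quota and the price per TB are made up for illustration and are not taken from Yoda.

```python
def effective_capacity_tb(quota_tb: float, copies_per_file: int = 1) -> float:
    """Usable space for working data when every file also has
    `copies_per_file` full-size revision copies (assumption)."""
    if copies_per_file < 0:
        raise ValueError("copies_per_file must be >= 0")
    return quota_tb / (1 + copies_per_file)


def storage_cost(data_tb: float, price_per_tb: float, copies_per_file: int = 1) -> float:
    """Cost of storing `data_tb` of working data plus its revision copies."""
    if copies_per_file < 0:
        raise ValueError("copies_per_file must be >= 0")
    return data_tb * (1 + copies_per_file) * price_per_tb


if __name__ == "__main__":
    # Hypothetical numbers: a 10 TB quota and a price of 100 per TB.
    print(effective_capacity_tb(10.0))   # one revision copy per file
    print(storage_cost(4.0, 100.0))      # 4 TB of data plus its copies
    print(storage_cost(4.0, 100.0, 0))   # same data with revisions disabled
```

With one copy per file, a 10 TB quota holds 5 TB of working data, which is the effect described above.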

Some technical aspects which I have considered (and probably just highlight my ignorance):

The use case is as follows: we are using Yoda as a working archive to collect Illumina sequencing data from various projects. These projects are long-running, so the data collections (folders) keep being extended as more data is generated and added (and are locked in between).

Sequence data is typically hundreds of gzipped text files (FASTQ). It is unlikely that a file is updated, but if one is changed then the normal revision mechanism can kick in.

One could (legitimately) argue that the vault should be used. In this case, however, whether this data will be published, or even used, will only be known on a timescale of a year or more. --> Yes, putting stuff in the vault is seen as irreversible in Yoda, so it would be bad practice to put in stuff you are not sure about.

peer35 commented 4 years ago

closed by accident