Open peer35 opened 4 years ago
Excellent topic. Additional alternatives (in addition to disabling per project)
Question: does Yoda restore deleted files from the iRODS trash or from the "restore versions" (related to #4)?
Some more discussion between me & @bgoli:
Regarding the revisioning: at least one copy is always made of every "research data" file, so this effectively halves your capacity (and doubles your cost).
This can be surprising for researchers whose workflows use a lot of working data that is not moved directly to the vault.
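To put rough numbers on the capacity point, here is a minimal sketch of the arithmetic. The function names and the figures in the usage note are illustrative only and are not taken from Yoda; the sketch simply assumes every live file is stored a fixed number of times (live copy plus revisions).

```python
def usable_capacity_tb(raw_tb: float, copies_per_file: int = 2) -> float:
    """Space left for live data when each file is stored `copies_per_file` times.

    copies_per_file=2 models "live file plus one revision copy" (an assumption
    based on the description above, not a value read from Yoda).
    """
    if copies_per_file < 1:
        raise ValueError("copies_per_file must be at least 1")
    return raw_tb / copies_per_file


def storage_cost(live_tb: float, price_per_tb: float, copies_per_file: int = 2) -> float:
    """Cost of storing `live_tb` of live data, including its revision copies."""
    if copies_per_file < 1:
        raise ValueError("copies_per_file must be at least 1")
    return live_tb * copies_per_file * price_per_tb
```

For example, `usable_capacity_tb(100)` gives 50.0: a 100 TB allocation holds only 50 TB of working data when every file has one revision copy.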
Some technical aspects which I have considered (and which probably just highlight my ignorance):
Could Yoda revisioning be linked to the locking mechanism so that only unlocked folders are versioned? --> This would mean all revisions are deleted when you lock a folder?
Could revision zero be create-on-modify rather than create-on-copy, with all revisions cleaned up over time (not all but one, as is now the case)? --> That could work. A revision strategy where the last copy is also deleted after a week or so if it is the same as the live version. You could still make a revision copy on delete to guard against accidental deletes, which can also be removed after a week.
Is it possible to have the Yoda revision data stored on colder storage?
--> That could also be possible. The projects and vault are in /
Could an option be added to define projects (research collections) where revisioning is turned off?
--> I think the revisioning strategy is set per category, not per project, which is annoying. It would probably mean creating a new “group type”, e.g. all “sourcedata-
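The cleanup idea discussed above (drop a revision after about a week if it is identical to the live file, and also drop the copy made on delete after a week) could be sketched roughly as follows. This is only an illustration of the proposed policy, not how Yoda implements revision cleanup; the `Revision` class, the `revisions_to_delete` function and the seven-day default are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class Revision:
    """Hypothetical record of one stored revision of a file."""
    checksum: str
    created: datetime


def revisions_to_delete(
    revisions: List[Revision],
    live_checksum: Optional[str],
    now: datetime,
    max_age: timedelta = timedelta(days=7),
) -> List[Revision]:
    """Return the revisions the proposed policy would remove.

    live_checksum is None when the live file has been deleted; in that case
    the revision acts as a safety copy and expires once it reaches max_age.
    Otherwise a revision expires only if it has reached max_age AND is
    identical to the live file. Older revisions that differ from the live
    file are left alone here (their cleanup is not modelled in this sketch).
    """
    expired = []
    for rev in revisions:
        old_enough = now - rev.created >= max_age
        redundant = live_checksum is None or rev.checksum == live_checksum
        if old_enough and redundant:
            expired.append(rev)
    return expired
```

With this policy, write-once data such as the sequencing files described below would carry a duplicate only for the first week after upload.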
The use case is as follows: we are using Yoda as a working archive to collect Illumina sequencing data from various projects. These long-running projects are ongoing, so the data collections (folders) are all being extended as more data is generated and added (and locked in between).
Sequence data is typically hundreds of gzipped text files (FASTQ). It is unlikely that a file is ever updated, but if one is changed, the normal revision mechanism can kick in.
One could (legitimately) argue that the vault should be used. However, in this case one can assume that whether this data will be published, or even used, will only be known on a timescale of a year or more. --> Yes, putting stuff in the vault is seen as irreversible in Yoda, so it would be bad practice to put in anything you are not sure about.
closed by accident
By default Yoda creates a "version" every time a new file is created. This means double storage cost.
Expected behaviour:
Workaround: