Open guiguem opened 7 years ago
Currently DIRAC datasets contain:
*It seems as though only one annotation is allowed per dataset, despite there being a separate table for this information. This is because new annotations are added with REPLACE, which deletes any existing annotation that shares the same primary key, in this case the dataset number.
The database which stores this information is defined here: https://github.com/DIRACGrid/DIRAC/blob/ce62643461be63dea717c8a3b96087e77f38137f/DataManagementSystem/DB/FileCatalogWithFkAndPsDB.sql#L286
DIRAC's dataset functions which use this data: https://github.com/DIRACGrid/DIRAC/blob/9f408b12ca57a92207e9c6a0defdf6efcdf81bc5/DataManagementSystem/DB/FileCatalogComponents/DatasetManager.py
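To illustrate why REPLACE limits each dataset to a single annotation, here is a minimal sketch. It is not DIRAC code: it uses Python's built-in sqlite3 module rather than MySQL, and the table name, column names, and the add_annotation helper are hypothetical. The only thing it takes from the description above is that the dataset number is the primary key and that new annotations are written with REPLACE.

```python
import sqlite3

# Hypothetical stand-in for the annotation table; the real DIRAC schema
# lives in the SQL file linked above and is not reproduced here.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE annotations ("
    " dataset_id INTEGER PRIMARY KEY,"  # dataset number as primary key
    " annotation TEXT)"
)


def add_annotation(conn, dataset_id, text):
    """Hypothetical helper: store an annotation with REPLACE semantics."""
    # REPLACE removes any existing row with the same primary key
    # before inserting the new one.
    conn.execute(
        "REPLACE INTO annotations (dataset_id, annotation) VALUES (?, ?)",
        (dataset_id, text),
    )


add_annotation(conn, 1, "first note")
add_annotation(conn, 1, "second note")  # silently overwrites "first note"

rows = conn.execute(
    "SELECT dataset_id, annotation FROM annotations"
).fetchall()
print(rows)  # [(1, 'second note')]
```

Supporting several annotations per dataset would presumably need either a different key (for example an auto-incremented annotation ID) or a plain INSERT, but that is a design question for this issue rather than something the sketch settles.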
The dirac-dms-filecatalog-cli dataset command is (to my knowledge) the only interface to the dataset system. Its functionality, as described by its help:
A set of dataset manipulation commands
Usage:
dataset add
dataset rm
dataset update
dataset release
We would like DIRAC to contain this information:
Currently DIRAC does not support modifying existing datasets, except for replacing the annotation. This is a critical feature for us.
edit: Apparently this is supported by the back end (at least there is code which appears to have accomplished this at one point), but the interface to it through the filecatalog-cli appears to be broken. The command is 'dataset update' but it doesn't seem to do anything.
This issue is to track progress on extending DIRAC's dataset management and metadata features.
Some motivations and goals for the dataset management project: