project8 / DIRAC

DIRAC Grid
http://diracgrid.org
GNU General Public License v3.0

Dataset manager #1

Open guiguem opened 7 years ago

guiguem commented 7 years ago

This issue is to track progress on extending DIRAC's dataset management and metadata features.

Some motivations and goals for the dataset management project:

ManyAngledOne commented 7 years ago

Currently DIRAC datasets contain:

*It seems as though only one annotation is allowed per dataset, despite there being a separate table for this information. This is because new annotations are added with REPLACE, which deletes any existing annotation that shares the same primary key, in this case the dataset number.
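
To illustrate the REPLACE behaviour described above, here is a minimal sketch. It uses Python's built-in sqlite3 as a stand-in for DIRAC's MySQL backend (an assumption; both delete the conflicting row before inserting). The table and column names are hypothetical and are not taken from the DIRAC schema.

```python
import sqlite3


def annotate(conn, dataset_id, text):
    # REPLACE first deletes any row with the same primary key,
    # then inserts the new row, so the old annotation is lost.
    conn.execute(
        "REPLACE INTO annotations (dataset_id, annotation) VALUES (?, ?)",
        (dataset_id, text),
    )


def demo():
    conn = sqlite3.connect(":memory:")
    # Hypothetical table: the dataset number is the only primary key column.
    conn.execute(
        "CREATE TABLE annotations ("
        "dataset_id INTEGER PRIMARY KEY, annotation TEXT)"
    )
    annotate(conn, 1, "first note")
    annotate(conn, 1, "second note")  # silently overwrites "first note"
    rows = conn.execute(
        "SELECT dataset_id, annotation FROM annotations"
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(demo())
```

Only the second annotation survives, which matches what we see when annotating the same dataset twice.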

The database which stores this information is defined here: https://github.com/DIRACGrid/DIRAC/blob/ce62643461be63dea717c8a3b96087e77f38137f/DataManagementSystem/DB/FileCatalogWithFkAndPsDB.sql#L286

DIRAC's dataset functions which use this data: https://github.com/DIRACGrid/DIRAC/blob/9f408b12ca57a92207e9c6a0defdf6efcdf81bc5/DataManagementSystem/DB/FileCatalogComponents/DatasetManager.py

ManyAngledOne commented 7 years ago

The dirac-dms-filecatalog-cli dataset command is (to my knowledge) the only interface to the dataset system. Its functionality, as described by its help:

A set of dataset manipulation commands

Usage:

  dataset add - add a new dataset definition
  dataset annotate - add annotation to a dataset
  dataset show [-l] [] - show existing datasets
  dataset status - display the dataset status
  dataset files - show dataset files
  dataset rm - remove dataset
  dataset check - check if the dataset parameters are still valid
  dataset update - update the dataset parameters
  dataset freeze - fix the current contents of the dataset
  dataset release - release the dynamic dataset

https://github.com/DIRACGrid/DIRAC/blob/9f408b12ca57a92207e9c6a0defdf6efcdf81bc5/DataManagementSystem/Client/FileCatalogClientCLI.py#L1979

ManyAngledOne commented 7 years ago

We would like DIRAC to contain this information:

ManyAngledOne commented 7 years ago

Currently DIRAC does not support modifying existing datasets, except for replacing the annotation. This is a critical feature for us.

edit: Apparently this is supported by the back end (at least there is code which appears to have accomplished this at one point), but the interface to it through the filecatalog-cli appears to be broken. The command is 'dataset update', but it doesn't seem to do anything.