icatproject / icat.server

The ICAT server offering both SOAP and "RESTlike" interfaces to a metadata catalog.

Enable DataPublications to be used for Investigations #261

Closed kevinphippsstfc closed 2 years ago

kevinphippsstfc commented 2 years ago

This is an addition to the changes already proposed and implemented for issue #200.

It is purely additional. There are no breaking changes, mandatory fields or mandatory relationships.

The idea was already introduced to the discussion in #200 by Sylvie but was not pursued far enough to make it into the proposal that was accepted for implementation. Having discussed Diamond's plans for DOIs in more detail recently, we are now in a position to make a more concrete proposal.

At RAL, Diamond would like to use DataPublications for minting DOIs for Investigations. ISIS are already minting DOIs for Investigations, but moving to DataPublications for this would probably be preferable to the current method.

So to allow DataPublications to cater for this, I am proposing that a new ICAT entity DataCollectionInvestigation is added to the schema to match the existing DataCollectionDataset and DataCollectionDatafile. DataCollectionInvestigations could then be added to DataCollections in the same way that DataCollectionDatasets and DataCollectionDatafiles can be currently.
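As a rough illustration of the proposed relationship, here is a minimal Python sketch. It is not the actual icat.server code: the attribute names and the `Investigation`, `Dataset` and `Datafile` stand-ins are assumptions made for the example, and only the entity names come from the proposal above.

```python
from dataclasses import dataclass, field
from typing import List


# Simplified stand-ins for the real ICAT entities (fields are assumptions).
@dataclass
class Investigation:
    id: int
    name: str


@dataclass
class Dataset:
    id: int
    name: str


@dataclass
class Datafile:
    id: int
    name: str


# Existing link entities named in the proposal.
@dataclass
class DataCollectionDataset:
    dataset: Dataset


@dataclass
class DataCollectionDatafile:
    datafile: Datafile


# Proposed link entity; its shape is assumed to mirror the two above.
@dataclass
class DataCollectionInvestigation:
    investigation: Investigation


@dataclass
class DataCollection:
    data_collection_datasets: List[DataCollectionDataset] = field(default_factory=list)
    data_collection_datafiles: List[DataCollectionDatafile] = field(default_factory=list)
    # New and optional: empty by default, so existing collections are unaffected.
    data_collection_investigations: List[DataCollectionInvestigation] = field(default_factory=list)


# Add a whole investigation to a collection (hypothetical example data).
inv = Investigation(id=1, name="hypothetical-visit-1")
collection = DataCollection()
collection.data_collection_investigations.append(DataCollectionInvestigation(inv))
```

Because the new list defaults to empty, a collection that only holds datasets and datafiles behaves exactly as before, which is the sense in which the change is purely additional.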

In addition, to enable search queries to differentiate between different types of DataPublications, I propose either a "type" string field (non-mandatory) or an optional relationship to a new entity DataPublicationType. This entity would have at least the field "name", probably also "description", and possibly "PID". To provide some supporting information, Diamond foresee having "investigation" DataPublication types created automatically, as well as "user-defined" types where users select a subset of data to have a DOI minted for, and the different types are likely to be displayed separately.
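To make the second option more concrete, here is a small Python sketch of an optional DataPublicationType relationship and of how a search could separate the two kinds of publication. The `title` field, the `publications_of_type` helper and the sample data are assumptions for illustration only; the type names "investigation" and "user-defined" are the ones mentioned above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class DataPublicationType:
    name: str
    description: Optional[str] = None


@dataclass
class DataPublication:
    title: str  # illustrative field only
    # Optional relationship: publications without a type remain valid.
    type: Optional[DataPublicationType] = None


def publications_of_type(pubs: List[DataPublication], type_name: str) -> List[DataPublication]:
    """Return the publications whose type has the given name (hypothetical helper)."""
    return [p for p in pubs if p.type is not None and p.type.name == type_name]


investigation_type = DataPublicationType("investigation", "Created automatically")
user_defined_type = DataPublicationType("user-defined", "Subset selected by a user")

pubs = [
    DataPublication("Whole visit", investigation_type),
    DataPublication("Selected runs", user_defined_type),
    DataPublication("Untyped legacy publication"),
]

investigation_pubs = publications_of_type(pubs, "investigation")
```

With a plain string field instead, the filter would compare the string directly; the separate entity mainly adds a place for a description and a controlled list of names.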

RKrahl commented 2 years ago

To summarize as I understand it, your proposal is to make two additions, building on top of what is implemented in #256:

  1. add a many-to-many relation from DataCollection to Investigation, so that investigations can be added as a whole to a data collection in the same way as datasets and datafiles can now.
  2. add type information to DataPublication, either as a string attribute or as a new table DataPublicationType.

Ad 1: that change is not limited to the particular use case of data publications. It concerns the scope of what we consider as a data collection in ICAT. In fact, considering TopCAT: the user can add either individual datafiles or datasets or whole investigations to a cart for download. In the same way for IDS: the calls acting on selections of data (getData, prepareData, archive, restore, …) all take lists of investigationIds, datasetIds, and datafileIds as parameters. It seems only consistent to use the same notion of a selection of data in the definition of DataCollection in the ICAT schema.

Ad 2: that seems to be a relatively small change that doesn't hurt to add in either variant of implementation. I'm leaning towards the more lightweight attribute variant over DataPublicationType, but I don't have a strong opinion on that. I'm not sure a pid attribute in DataPublicationType makes sense though, as the purpose of that type seems to be more related to internal management, rather than to be visible from the outside. But this is a minor comment.

In short: I support the proposal.

kevinphippsstfc commented 2 years ago

Thanks @RKrahl - your summary is indeed totally correct and explains my proposal better than I did 😄

It's a good point that the concept already exists in TopCAT and the IDS, and that the use of Investigations in DataCollections would not just be limited to DataPublications. Thanks for pointing that out.

I also agree about pid. I wasn't sure about that myself but they seem to be very fashionable at the moment. I'm happy to leave that field out for now. It is easier to add later if we do need it.

kevinphippsstfc commented 2 years ago

Proposal was accepted by the ICAT collaboration at a meeting this afternoon.

There was a preference for implementation using DataPublicationType.

agbeltran commented 2 years ago

Included in 5.0.0