Relationship to ProvDM

msdemlei commented 1 year ago

ProvDM and this also have a lot of overlap. In particular, I think it would really be a shame if there were independent classes for actors ("Party") in the two DMs (I mention in passing that there's another "Party" in SimDM).

In the interest of keeping the number of models and standards down: How much would we have to add to ProvDM to make it cover what this is intended to cover? Isn't a dataset perhaps just a special sort of Entity in the ProvDM sense?

mcdittmar commented 1 year ago

Good point.. there is definitely a significant overlap in content.

My goal for this version of the model was to consolidate the existing content from Spectrum/ObsCore/etc to a single model, in a way that the migration is fairly obvious.

I've always wondered how Provenance would fold in, but I think that is a project on its own.

msdemlei commented 1 year ago

On Tue, Mar 21, 2023 at 08:52:35AM -0700, Mark Cresitello-Dittmar wrote:

> Good point.. there is definitely a significant overlap in content.
>
> My goal for this version of the model was to consolidate the existing content from Spectrum/ObsCore/etc to a single model, in a way that the migration is fairly obvious.

Hm -- what would that consolidation buy us, in technical terms?

> I've always wondered how Provenance would fold in, but I think that is a project on its own.

Couldn't perhaps the existing parts be consolidated into ProvDM? You see, that already exists and is a REC -- and if we could sunset the DatasetDM effort before it becomes REC and is then hard to withdraw, that'd be one REC less for people to read, understand, align with the remaining RECs and possibly implement. Which in my book would be a big win.

@mservillat, @olebole, what do you think?

mcdittmar commented 1 year ago

> On Tue, Mar 21, 2023 at 08:52:35AM -0700, Mark Cresitello-Dittmar wrote:
>
> > Good point.. there is definitely a significant overlap in content.
> >
> > My goal for this version of the model was to consolidate the existing content from Spectrum/ObsCore/etc to a single model, in a way that the migration is fairly obvious.
>
> Hm -- what would that consolidation buy us, in technical terms?

In developing the Cube model, the choices were:

1. include yet another copy of the Dataset metadata content
2. isolate that content into a separate model which could then be reused by all interested parties.

Consolidating that information makes it reusable and consistent, which saves implementation effort.

msdemlei commented 1 year ago

On Wed, Mar 22, 2023 at 06:47:15AM -0700, Mark Cresitello-Dittmar wrote:

> In developing the Cube model, the choices were:
>
> 1. include yet another copy of the Dataset metadata content
> 2. isolate that content into a separate model which could then be reused by all interested parties.
>
> Consolidating that information makes it reusable and consistent, which saves implementation effort.

Sure -- but perhaps Cube could just use ProvDM then? If there's just a few attributes or classes missing there, perhaps we can add them there rather than start a new DM? I suppose having the built-in place for proper provenance you'll get for free by doing that would be a very nice side benefit, no?

mcdittmar commented 1 year ago

> On Wed, Mar 22, 2023 at 06:47:15AM -0700, Mark Cresitello-Dittmar wrote:
>
> > In developing the Cube model, the choices were:
> >
> > 1. include yet another copy of the Dataset metadata content
> > 2. isolate that content into a separate model which could then be reused by all interested parties.
> >
> > Consolidating that information makes it reusable and consistent, which saves implementation effort.
>
> Sure -- but perhaps Cube could just use ProvDM then? If there's just a few attributes or classes missing there, perhaps we can add them there rather than start a new DM? I suppose having the built-in place for proper provenance you'll get for free by doing that would be a very nice side benefit, no?

There is enough content here that would be reusable in contexts outside of Cube that I think we'd still want to encapsulate it in its own document. We'd also want to formalize how to use Provenance to represent this content (which roles, which structure).

If we consider Dataset a "prov:Entity", then we have to decide: is it extending Entity? Or are we simply reusing the Provenance pattern? Or is it actually an Entity with name="Dataset"?

Then we agree ds:Party == prov:Agent, so that's a win. Plugging that into the Dataset Entity means:
- it has a list of WasAttributedTo instances with role="contact|publisher|contributor|creator|etc", each referencing an Agent instance.
- this will obliterate the current groupings (DataID, Curation), unless we define them as Activities (Curated, Published, etc). That would set up a hierarchy of Entities for the Dataset at the different stages: Created by A, Published by B, Curated by C, each adding its own segment of metadata.

For the best match to the current content, we'd probably like Dataset.wasGeneratedBy to be the 'Observation' instance, making Observation an Activity with some ActivityConfiguration. That could work. However, the workflow diagram (Figure 1) really goes step-wise, so the Dataset is created by the software process, which eventually leads back to the Observation.
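
To make the mapping above a bit more concrete, here is a rough Python sketch. It is only an illustration: the classes are hypothetical stand-ins named after the ProvDM concepts mentioned in this comment, not an actual ProvDM or DatasetDM serialization, and the agent names are placeholders.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Agent:
    """Stand-in for prov:Agent (and ds:Party, if the two are equated)."""
    name: str


@dataclass
class WasAttributedTo:
    """Links an Entity to an Agent, qualified by a role."""
    agent: Agent
    role: str  # e.g. "contact", "publisher", "contributor", "creator"


@dataclass
class Activity:
    """Stand-in for prov:Activity; here the Observation plays this part."""
    name: str


@dataclass
class Entity:
    """Stand-in for prov:Entity; a Dataset is modelled as one of these."""
    name: str
    attributions: List[WasAttributedTo] = field(default_factory=list)
    was_generated_by: Optional[Activity] = None

    def agents_with_role(self, role: str) -> List[Agent]:
        """Return every Agent attributed to this Entity under the given role."""
        return [a.agent for a in self.attributions if a.role == role]


# Placeholder agents; the names are made up for the example.
dataset = Entity(
    name="Dataset",
    attributions=[
        WasAttributedTo(Agent("Observatory A"), role="creator"),
        WasAttributedTo(Agent("Archive B"), role="publisher"),
    ],
    was_generated_by=Activity("Observation"),
)

publishers = [agent.name for agent in dataset.agents_with_role("publisher")]
```

Note that even in this toy form, the flat DataID and Curation groupings are gone: finding the publisher means walking the attribution list and filtering by role.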

In the end, I think this would create a much larger hierarchy of objects than we want in this usage, and it would be very difficult to connect that hierarchy with the current Dataset content in ObsCore, Spectrum and Char. Our data product headers select very specific pieces of information from the Provenance tree to carry around with them in a compact structure.

I think there's something to be gained in discussing the relation between the two models, and how we see them evolving together, but I'd be very surprised if we can make the jump in one step.

ivoa-std / DatasetDM

Relationship to ProvDM #8