Open MathildeMousset opened 5 years ago
Hi Mathilde,
It is definitely possible to have Occurrences (http://rs.tdwg.org/dwc/terms/#occurrence) in an Event (http://rs.tdwg.org/dwc/terms/#event) data set, with Measurements or Facts (http://rs.tdwg.org/dwc/terms/#measurementorfact) for those Occurrences and Relationships (http://rs.tdwg.org/dwc/terms/#resourcerelationship) between them.
The star schema allows records in an extension file to be related to records in a core file (see https://dwc.tdwg.org/text/#3-implementation-guide). If the core file is an Event Core, then every extension file must have an eventID filled in to relate the records in the extension to the appropriate record in the core. It looks like you already understand this part, but I am including it for others who might be reading about how extensions work here for the first time. The rest is a little tricky.
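To make the star-schema rule concrete, here is a minimal Python sketch that checks whether every extension row points at an existing core row. The rows, the identifiers (E1, O1, ...) and the helper name orphaned_rows are all made up for illustration; they are not taken from any real data set or tool.

```python
# Hypothetical Event Core and Occurrence extension rows (illustrative IDs only).
event_core = [
    {"eventID": "E1"},
]
occurrence_ext = [
    {"eventID": "E1", "occurrenceID": "O1"},
    {"eventID": "E1", "occurrenceID": "O2"},
    {"eventID": "E9", "occurrenceID": "O3"},  # E9 is not in the core
]


def orphaned_rows(core_rows, extension_rows, key="eventID"):
    """Return extension rows whose key does not match any core row."""
    core_ids = {row[key] for row in core_rows}
    return [row for row in extension_rows if row.get(key) not in core_ids]


orphans = orphaned_rows(event_core, occurrence_ext)
for row in orphans:
    print("No matching core record for", row)
```

Any row reported here could not be attached to a core record when the archive is read, which is the situation the eventID requirement is meant to prevent.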
The Extended Measurement or Fact extension (https://tools.gbif.org/dwca-validator/extension.do?id=http://rs.iobis.org/obis/terms/ExtendedMeasurementOrFact) was designed to make it possible to include measurements or facts for Events in the Core as well as for Occurrences in an Occurrence extension to that core. The extension includes an occurrenceID field. If the measurement is about an Occurrence, then the occurrenceID has to contain the identifier for the Occurrence record it is about. The Measurement or Fact record also has to have the eventID for that Occurrence, just as it does in the Occurrence extension. If the measurement is about an Event rather than an Occurrence, then the eventID would be filled in, but the occurrenceID would be left empty. This trick provides a way to get beyond the limitations of the star schema, but one has to understand that this is how that particular extension works.
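As a rough illustration of that convention, the Python sketch below decides what each Extended Measurement or Fact row is about, and checks that a row about an Occurrence carries the same eventID as that Occurrence. The rows, the identifiers and the function names (measurement_target, inconsistent_rows) are assumptions for the example, not part of the extension or of any library.

```python
# Hypothetical Occurrence extension and eMoF rows (illustrative values only).
occurrences = [
    {"eventID": "E1", "occurrenceID": "O1"},
]
emof = [
    # About Occurrence O1: occurrenceID filled in, eventID repeated.
    {"eventID": "E1", "occurrenceID": "O1", "measurementType": "weight"},
    # About Event E1 itself: occurrenceID left empty.
    {"eventID": "E1", "occurrenceID": "", "measurementType": "air temperature"},
]


def measurement_target(row):
    """Return ("Occurrence", id) if occurrenceID is filled in, else ("Event", id)."""
    occurrence_id = (row.get("occurrenceID") or "").strip()
    if occurrence_id:
        return ("Occurrence", occurrence_id)
    return ("Event", row["eventID"])


def inconsistent_rows(emof_rows, occurrence_rows):
    """Return eMoF rows whose eventID differs from that of their Occurrence."""
    event_of = {o["occurrenceID"]: o["eventID"] for o in occurrence_rows}
    bad = []
    for row in emof_rows:
        occurrence_id = (row.get("occurrenceID") or "").strip()
        if occurrence_id and event_of.get(occurrence_id) != row["eventID"]:
            bad.append(row)
    return bad


targets = [measurement_target(row) for row in emof]
problems = inconsistent_rows(emof, occurrences)
```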
The Resource Relationship extension can relate anything to anything, but the two things being related in the extension have to be unambiguously identified. If all the records being related are within the same data set, then the identifiers need only be unique within that data set. But the Resource Relationship extension can even relate records in its data set to other records that aren't in the same data set, if the identifiers in the other data sets are unambiguous. This is one of the beauties of persistent, resolvable, globally unique identifiers.
The last tricky bit is that the Resource Relationship file is necessarily an extension to the core file in the data set, so in the case of an Event Core, the records in the Resource Relationship file must have an eventID. That should not normally be a problem though, since the Occurrences in the Occurrence extension already have the related eventIDs. Just use the same eventID/resourceID combination in the Resource Relationship extension as the eventID/occurrenceID combination in the Occurrence extension.
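A minimal Python sketch of that last step follows, assuming relationships between Occurrences in the same data set. The identifiers, the relationship wording "mother of" and the helper build_relationship_rows are hypothetical; only the column names (eventID, resourceID, relationshipOfResource, relatedResourceID) are Darwin Core terms.

```python
# Hypothetical Occurrence extension rows, all belonging to one Event.
occurrences = [
    {"eventID": "E1", "occurrenceID": "O-mother"},
    {"eventID": "E1", "occurrenceID": "O-pup1"},
    {"eventID": "E1", "occurrenceID": "O-pup2"},
]

# (subject occurrenceID, relationship, related occurrenceID), made up for the example.
links = [
    ("O-mother", "mother of", "O-pup1"),
    ("O-mother", "mother of", "O-pup2"),
]


def build_relationship_rows(occurrence_rows, links):
    """Build Resource Relationship rows that reuse the subject's eventID."""
    event_of = {o["occurrenceID"]: o["eventID"] for o in occurrence_rows}
    rows = []
    for subject, relation, related in links:
        rows.append({
            "eventID": event_of[subject],   # same eventID as in the Occurrence extension
            "resourceID": subject,          # same value as the subject's occurrenceID
            "relationshipOfResource": relation,
            "relatedResourceID": related,
        })
    return rows


relationship_rows = build_relationship_rows(occurrences, links)
```

Because each row copies the eventID of the Occurrence named in resourceID, the Resource Relationship file stays a valid extension of the Event Core while the relationship itself is between Occurrences.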
I have created a demonstration data set to try to illustrate how this would work for a super simple case of one event where a mother tuco and her two pups were weighed. You can find the data set at http://ipt.vertnet.org:8080/ipt/resource?r=event_test.
Thank you for your very quick answer John. I cannot seem to access the demonstration dataset, though.
I don't know why you would not be able to see that. Tests show it is public and accessible. To make sure it's available, I put a copy of the Darwin Core archive in https://github.com/tdwg/dwc-qa/blob/master/examples/dwca-event_test-v1.0.zip.
I am confused by the star schema of the Darwin Core Archives.
My datasets are typically organised around an Event Core. In that context, I have the impression that it is not possible to use the resourceRelationship extension to link occurrence records with each other, or eMoF records to each other, without breaking the star organisation. Am I misunderstanding something?