Closed jhpoelen closed 3 years ago
I would strongly support this, at least conceptually. The examples given are URIs, and our community (like MANY others) is in the habit of treating URIs as identifiers. This is fine, but unless it's spelled out explicitly, many people (like me) will provide "pure" identifiers (e.g., UUIDs), disentangled from dereferencing metadata (e.g., the "http://blah.com/blah" prefix often added to HTTP URIs). This is not the place to dive deep into that distinction (related to dwciri); but it might be worth clarifying whether this new term (relationshipOfResourceID) would be framed in the context of DwC "classic" (sensu lato identifiers), or more in the restrictive (sensu LOD / HTTP URIs). I can see arguments both ways, but in either case, the distinction probably should be made clear in the definition of this new term.
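To make the distinction between "pure" identifiers and HTTP URIs concrete, here is a minimal Python sketch that sorts identifier strings into rough categories. The function name `classify_identifier` and the four category labels are assumptions made for illustration; nothing here is defined by Darwin Core.

```python
import uuid
from urllib.parse import urlparse


def classify_identifier(value: str) -> str:
    """Return 'uuid', 'http-uri', 'other-uri', or 'string' (hypothetical labels)."""
    text = value.strip()
    try:
        # uuid.UUID raises ValueError for anything that is not a well-formed UUID.
        uuid.UUID(text)
        return "uuid"
    except ValueError:
        pass
    parsed = urlparse(text)
    if parsed.scheme in ("http", "https") and parsed.netloc:
        return "http-uri"
    if parsed.scheme:
        # Any other scheme-like prefix, e.g. "doi:" or "urn:".
        return "other-uri"
    return "string"


print(classify_identifier("58d31d52-713d-44b4-9fe9-cb2d9249c422"))
print(classify_identifier("http://blah.com/blah"))
```

A check like this only describes the syntax of a value; it says nothing about whether the identifier resolves or is stable.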
Hi @deepreef @jhpoelen
Rich, you wrote:
but in either case, the distinction probably should be made clear in the definition of this new term.
We need clear examples that we can continue to point people to: a) examples of the distinction you make between a "pure" identifier and one that has other features like "http:" stitched on in hopes of resolution, and b) examples of the use of the Resource Relationship extension in general. Looking forward to seeing efforts to clarify all of this for all involved.
Please see https://github.com/tdwg/dwc/issues/186 for an extended discussion on the topic.
@debpaul I've provided examples in recent threads and in the referenced issue, so I suggest you take action on documenting this in the way you'd like to see it.
@deepreef I agree that the idea of mandating that identifiers be clickable/resolvable until the end of days is a bit silly, especially given the well-documented issues with location-based identifiers (see, e.g., Elliott et al. 2020 https://doi.org/10.1016/j.ecoinf.2020.101132), which show that Berners-Lee's "cool" URIs (https://www.w3.org/TR/cooluris/) aren't practical in the long run. However, I don't think this should prevent us from using identifiers that happen to look like URLs. Also, I am OK with treating the IDs as strings rather than imposing some URI syntax requirement.
As a side note, I've been quite impressed by the amount of time and effort it takes to add a single optional property/column to the Resource Relation extension. I am starting to wonder whether analysis paralysis (https://en.wikipedia.org/wiki/Analysis_paralysis) or bike shedding (e.g., https://en.wiktionary.org/wiki/bikeshedding) are at play, or whether I am just being impatient. I expect it's probably the latter.
However, I don't think this should prevent us from using identifiers that happen to look like URLs.
I agree completely! They certainly can serve as identifiers. The problems come in when things like this: http://somedomain.org/someidentifier changes to: https://somedomain.org/someidentifier
or when something like this: https://dx.doi.org/10.1234/5678 changes to: https://doi.org/10.1234/5678
I was mostly making sure that non-HTTP URIs (and non-URIs) would be allowable, in addition to HTTP-URI identifiers.
@deepreef No denying that common URI schemes can show some variation when humans get involved. Nothing that some regular expressions can't fix ;) (e.g., https://github.com/bio-guoda/preston-identifier-registry/blob/main/registry.tsv). Also, subtle differences between printed/URL versions of DOIs and vanilla DOIs can throw sand in integration engines if they are not aware of them (https://github.com/globalbioticinteractions/doi4j). I imagine these variations occur even when non-URIs are used. It would be a fun research project to look at the occurrences of these identifier variations. Let me know if you are interested in collaborating on this.
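As an illustration of the regular-expression approach, here is a minimal Python sketch that maps a few DOI and ORCID variants onto one form each. The patterns, the function name `normalize_identifier`, and the choice of the `https://doi.org/` and `https://orcid.org/` forms as targets are assumptions of this sketch; it is not taken from the registry or the doi4j library linked above.

```python
import re

# Prefix variants seen in this thread: http/https, dx.doi.org/doi.org, and "doi:".
_DOI_PREFIX = re.compile(r"^(?:https?://(?:dx\.)?doi\.org/|doi:)", re.IGNORECASE)
_ORCID_PREFIX = re.compile(r"^https?://orcid\.org/", re.IGNORECASE)


def normalize_identifier(value: str) -> str:
    """Rewrite known DOI/ORCID prefix variants; leave everything else untouched."""
    text = value.strip()
    if _DOI_PREFIX.match(text):
        return "https://doi.org/" + _DOI_PREFIX.sub("", text)
    if _ORCID_PREFIX.match(text):
        return "https://orcid.org/" + _ORCID_PREFIX.sub("", text)
    # No generic http -> https rewrite: that is not safe for arbitrary domains.
    return text


for variant in ("https://dx.doi.org/10.1234/5678", "doi:10.1234/5678"):
    print(normalize_identifier(variant))
```

Identifiers from other domains, such as http://somedomain.org/someidentifier, pass through unchanged, so each new scheme would need its own rule.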
Agreed! It doesn't help that people (including some in our own community) treat these identifiers as if they were meant to be used by humans, rather than machines. DOIs are a rare example that seems to support both needs, but also with compromises on both. I'm a strong advocate for using identifiers optimized for machines when machines are talking to machines, and different identifiers optimized for humans when humans communicate with other humans, or humans and machines communicate with each other. The identifier "Homo sapiens" works really well for humans, and the identifier 58d31d52-713d-44b4-9fe9-cb2d9249c422 works really well for computers. Too often, people try to make the computer identifiers more friendly to humans, and that's where the problems usually happen.
As a side note, I've been quite impressed by the amount of time and effort it takes to add a single optional property/column to the Resource Relation extension. I am starting to wonder whether analysis paralysis (https://en.wikipedia.org/wiki/Analysis_paralysis) or bike shedding (e.g., https://en.wiktionary.org/wiki/bikeshedding) are at play, or whether I am just being impatient. I expect it's probably the latter.
@jhpoelen It is a combination of both, exacerbated terribly by the limited availability of volunteer time. The past year has been particularly challenging in the latter respect, but I have managed to clear the table enough to try a concerted semi-annual effort to move along mature proposals, for which this one qualifies. My next task will be to consolidate and make final touches for the issues that are ready to move forward to public review (and those that do not require it because the changes are non-normative and non-breaking).
Here is an updated term change proposal following Darwin Core definition patterns and with additional information provided. @jhpoelen Please review and suggest any final changes before taking this to public review.
Submitter: Jorrit Poelen @jhpoelen
Justification (why is this term necessary?): The addition of predicate identifiers makes the ResourceRelationship class much more readily translatable into RDF and thence into broader integrative environments than spreadsheet-based data alone. This proposal "gives us a way to complete the task that the RDF guide dodged: figuring out how to recommend that people 'convert' resourceRelationship data from spreadsheets into RDF. (@baskaufs)" This addition facilitates the publication of relationship data as spreadsheets that can be transformed with tools into Linked Open Data, for example. This proposal facilitates the use of existing vocabularies of relationship types.
Proponents (at least two independent parties who need this term): GLoBI, Arctos, iNaturalist, TrIAS
Proposed attributes of the new term:
http://purl.obolibrary.org/obo/RO_0002456 (for the relation "pollinated by"), http://purl.obolibrary.org/obo/RO_0002455 (for the relation "pollinates"), https://www.inaturalist.org/observation_fields/879 (for the relation "eaten by")
Note: It is important to address Issue #194 at the same time to fix the unintended reversed relationship direction of the term relationshipOfResource. These two terms must be in agreement about the directionality of the relationship.
One brief comment on @deepreef's earlier comment about instability of HTTP IRIs: I agree with your frustration about the variation in https://orcid.org/... vs. http://orcid.org..., and dx.doi.org vs. doi.org. However, in both of those cases, the issuers have "gotten their act together" and there seem to be stable, consensus "best" forms for the IRIs: https://orcid.org/blah and https://doi.org/blah. So we will undoubtedly continue to encounter variants, but over time this will get better.
@tucotuco Thank you for taking the effort to review the relationshipOfResourceID proposal and prepare the proposal for public comment. I realize that I might have been a little impatient, especially considering the immense task of maintaining DwC.
I just reviewed your revised proposal mentioned in https://github.com/tdwg/dwc/issues/283#issuecomment-817185410 and it looks good to me.
PS. Perhaps this tweak to the Resource Relation extension will usher in a new non-star schema era . . . make way for more flexibility in integrating data, and might very well provide a starting point for improved and explicit dependency management between collections. (e.g., collection A contains references to collection B).
Thanks, @baskaufs : I wouldn't say that I'm "frustrated", necessarily -- just cognizant that identifiers work best when they are stable, and embedding dereferencing metadata within the identifier itself can pose challenges to maintaining that stability. That's why I decoupled identifiers from dereferencing mechanisms in BioGUID.
As noted in table 3.8 of the RDF guide, we dodged the issue of expressing ResourceRelationship data as RDF because we couldn't figure out the best way to do it. I think we could revisit that issue in the context of these new term additions/changes. But for now, this addition won't have an immediate effect on any dwcIri: analog.
Done.
@tucotuco et al. Thanks for your hard work in adding the new Resource Relationship term relationshipOfResourceID . Maintaining a standard like DwC is no small feat!
I am happy to report that the Resource Relationship extension has been adopted by the Field Museum and, recently, by iNaturalist; see the blog post:
Nov 3, 2023: "Field Museum and iNaturalist Adopt Darwin Core Resource Relationship Standard to Share Species Interaction Records". The Field Museum in Chicago and iNaturalist capture detailed records on how species interact. They both showed their capacity to innovate by using the recently improved Darwin Core Resource Relationship extensions to publish their interaction records. By using this standards-based approach, they facilitate access to the valuable biodiversity knowledge they keep, and provide examples for others to follow.
New term
Proposed attributes of the new term:
Term name (in lowerCamelCase): relationshipOfResourceID
Organized in Class (e.g. Location, Taxon): ResourceRelationship
Definition of the term: An identifier for the relationship type (predicate) that connects the subject identified by resourceID to its object identified by relatedResourceID.
Usage comments (recommendations regarding content, etc.): Recommended best practice is to use the identifiers of the terms in a controlled vocabulary, such as the OBO Relation Ontology.
Examples:
http://purl.obolibrary.org/obo/RO_0002456 (for the relation "pollinated by"), http://purl.obolibrary.org/obo/RO_0002455 (for the relation "pollinates"), https://www.inaturalist.org/observation_fields/879 (for the relation "eaten by")
Refines (identifier of the broader term this term refines, if applicable): None
Replaces (identifier of the existing term that would be deprecated and replaced by this term, if applicable): None
ABCD 2.06 (XPATH of the equivalent term in ABCD, if applicable): not in ABCD
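The definition above reads naturally as a subject-predicate-object statement. As a rough sketch of what that could look like in practice, the Python below turns spreadsheet-style rows into N-Triples lines. It assumes all three values are already IRIs, does no escaping or validation, and uses made-up example.org identifiers for the two resources; it is one possible mapping, not a recommendation from the RDF guide.

```python
import csv
import io

# Hypothetical two-row extract; the example.org identifiers are placeholders.
SAMPLE_CSV = """resourceID,relationshipOfResourceID,relatedResourceID
https://example.org/occurrence/bee-1,http://purl.obolibrary.org/obo/RO_0002455,https://example.org/occurrence/flower-1
https://example.org/occurrence/flower-1,http://purl.obolibrary.org/obo/RO_0002456,https://example.org/occurrence/bee-1
"""


def row_to_ntriple(row: dict) -> str:
    """Build one N-Triples line: resourceID -> predicate -> relatedResourceID."""
    subject = row["resourceID"].strip()
    predicate = row["relationshipOfResourceID"].strip()
    obj = row["relatedResourceID"].strip()
    return f"<{subject}> <{predicate}> <{obj}> ."


def csv_to_ntriples(text: str) -> list:
    """Convert every row of a CSV string into an N-Triples line."""
    reader = csv.DictReader(io.StringIO(text))
    return [row_to_ntriple(row) for row in reader]


triples = csv_to_ntriples(SAMPLE_CSV)
for line in triples:
    print(line)
```

Rows whose identifiers are plain strings or UUIDs rather than IRIs would need a different treatment, which is the open question raised earlier in the thread.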