tdwg / dwc

Darwin Core standard for sharing of information about biological diversity.
https://dwc.tdwg.org
Creative Commons Attribution 4.0 International

New Term - relationshipOfResourceID #283

Closed jhpoelen closed 3 years ago

jhpoelen commented 4 years ago

New term

Proposed attributes of the new term:

deepreef commented 3 years ago

I would strongly support this, at least conceptually. The examples given are URIs, and our community (like MANY others) is in the habit of treating URIs as identifiers. This is fine, but unless it's spelled out explicitly, many people (like me) will provide "pure" identifiers (e.g., UUIDs), disentangled from dereferencing metadata (e.g., the "http://blah.com/blah" prefix often added to HTTP URIs). This is not the place to dive deep into that distinction (related to dwciri), but it might be worth clarifying whether this new term (relationshipOfResourceID) would be framed in the context of DwC "classic" (sensu lato identifiers) or in the more restrictive sense (sensu LOD / HTTP URIs). I can see arguments both ways, but in either case, the distinction should probably be made clear in the definition of this new term.

debpaul commented 3 years ago

Hi @deepreef @jhpoelen

Rich, you wrote:

but in either case, the distinction probably should be made clear in the definition of this new term.

We need clear examples we can continue to point people to for a) the distinction you make between a "pure" identifier and one that has other features like "http:" stitched on in hopes of resolution, and b) the use of the Resource Relationship extension in general. Looking forward to seeing efforts to clarify all of this for all involved.

jhpoelen commented 3 years ago

Please see https://github.com/tdwg/dwc/issues/186 for an extended discussion on the topic.

@debpaul I've provided examples in recent threads and referenced issue, so I suggest you take action on documenting this in the way you'd like to see it.

@deepreef I agree that mandating that identifiers be clickable/resolvable until the end of days is a bit silly, especially given the well-documented issues with location-based identifiers (see e.g., Elliott et al. 2020 https://doi.org/10.1016/j.ecoinf.2020.101132), which show that Berners-Lee's "cool" URIs (https://www.w3.org/TR/cooluris/) aren't practical in the long run. However, I don't think this should prevent us from using identifiers that happen to look like URLs. Also, I am OK with treating the IDs as strings rather than imposing some URI syntax requirement.

As a side note, I've been quite impressed by the amount of time and effort it takes to add a single optional property/column to the Resource Relation extension. I am starting to wonder whether analysis paralysis (https://en.wikipedia.org/wiki/Analysis_paralysis) or bike shedding (e.g., https://en.wiktionary.org/wiki/bikeshedding) are at play, or whether I am just being impatient. I expect it's probably the latter.

deepreef commented 3 years ago

However, I don't think this should prevent us from using identifiers that happen to look like URLs.

I agree completely! They certainly can serve as identifiers. The problems come in when things like this: http://somedomain.org/someidentifier changes to: https://somedomain.org/someidentifier

or when something like this: https://dx.doi.org/10.1234/5678 changes to: https://doi.org/10.1234/5678

I was mostly making sure that identifiers other than HTTP URIs (including non-URIs) would be allowable, in addition to the HTTP-URI identifiers.

jhpoelen commented 3 years ago

@deepreef no denying that common URI schemes can have some variation when humans get involved. Nothing that some regular expression can't fix ; ) (e.g., https://github.com/bio-guoda/preston-identifier-registry/blob/main/registry.tsv). Also, subtle differences between printed/URL versions of DOIs and vanilla DOIs can throw sand in integration engines if they are not aware of them (https://github.com/globalbioticinteractions/doi4j). I imagine these variations occur even when non-URIs are used. It would be a fun research project to have a look at the occurrences of these identifier variations. Let me know if you are interested in collaborating on this.
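A minimal sketch of the kind of regular-expression normalization mentioned above, using the DOI variants quoted earlier in this thread. The pattern and the `normalize_doi` helper are illustrative assumptions, not code taken from the linked registry or from doi4j; the only outside fact relied on is that DOI names are case-insensitive.

```python
import re

# Hypothetical pattern: an optional "doi:" prefix or an http(s) resolver
# prefix (doi.org or dx.doi.org) in front of the DOI itself.
_DOI_PREFIX = re.compile(r"^(?:doi:|https?://(?:dx\.)?doi\.org/)", re.IGNORECASE)


def normalize_doi(value: str) -> str:
    """Return the bare, lowercased DOI for common printed/URL variants."""
    bare = _DOI_PREFIX.sub("", value.strip())
    if not bare.startswith("10."):
        raise ValueError(f"not a recognizable DOI: {value!r}")
    # DOI names are case-insensitive, so lowercase for comparison.
    return bare.lower()


variants = [
    "https://dx.doi.org/10.1234/5678",
    "https://doi.org/10.1234/5678",
    "doi:10.1234/5678",
    "10.1234/5678",
]
# All four spellings collapse to the same comparison key.
normalized = {normalize_doi(v) for v in variants}
```

With this approach the stored value can stay exactly as the provider published it, while comparisons between datasets use the normalized key.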

deepreef commented 3 years ago

Agreed! It doesn't help that people (including some in our own community) treat these identifiers as if they were meant to be used by humans, rather than machines. DOIs are a rare example that seems to support both needs, but also with compromises on both. I'm a strong advocate for using identifiers optimized for machines when machines are talking to machines, and different identifiers optimized for humans when humans communicate with other humans, or humans and machines communicate with each other. The identifier "Homo sapiens" works really well for humans, and the identifier 58d31d52-713d-44b4-9fe9-cb2d9249c422 works really well for computers. Too often, people try to make the computer identifiers more friendly to humans, and that's where the problems usually happen.

tucotuco commented 3 years ago

As a side note, I've been quite impressed by the amount of time and effort it takes to add a single optional property/column to the Resource Relation extension. I am starting to wonder whether analysis paralysis (https://en.wikipedia.org/wiki/Analysis_paralysis) or bike shedding (e.g., https://en.wiktionary.org/wiki/bikeshedding) are at play, or whether I am just being impatient. I expect it's probably the latter.

@jhpoelen It is a combination of both, exacerbated terribly by the limited availability of volunteer time. The past year has been particularly challenging in the latter respect, but I have managed to clear the table enough to try a concerted semi-annual effort to move along mature proposals, and this one qualifies. My next task will be to consolidate and make final touches for the issues that are ready to move forward to public review (and those that do not require it because the changes are non-normative and non-breaking).

tucotuco commented 3 years ago

Here is an updated term change proposal following Darwin Core definition patterns and with additional information provided. @jhpoelen Please review and suggest any final changes before taking this to public review.

Proposed attributes of the new term:

Note: It is important to address Issue #194 at the same time to fix the unintended reversed relationship direction of the term relationshipOfResource. These two terms must be in agreement about the directionality of the relationship.
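To make the directionality point concrete, here is a small sketch of one Resource Relationship record. It assumes the relationship is read from subject (resourceID) to object (relatedResourceID); the identifiers `occ-A` and `occ-B`, the label `eats`, and the `https://example.org/...` IRI are hypothetical placeholders, and the class itself is only an illustration, not part of Darwin Core.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResourceRelationship:
    resourceID: str                # subject of the relationship
    relationshipOfResource: str    # human-readable relationship label
    relationshipOfResourceID: str  # identifier for the same relationship
    relatedResourceID: str         # object of the relationship

    def as_statement(self) -> str:
        """Read the record in subject -> relationship -> object order."""
        return (
            f"{self.resourceID} {self.relationshipOfResource} "
            f"{self.relatedResourceID}"
        )


# Hypothetical record: the label and the ID describe the same relationship,
# pointing in the same direction (occ-A is the subject, occ-B the object).
record = ResourceRelationship(
    resourceID="occ-A",
    relationshipOfResource="eats",
    relationshipOfResourceID="https://example.org/relation/eats",
    relatedResourceID="occ-B",
)
statement = record.as_statement()  # "occ-A eats occ-B"
```

If relationshipOfResource were read in one direction and relationshipOfResourceID in the other, the same row would assert two contradictory statements, which is why the two terms need to agree.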

baskaufs commented 3 years ago

One brief comment on @deepreef's earlier comment about instability of HTTP IRIs: I agree with your frustration about the variation in https://orcid.org/... vs. http://orcid.org..., and dx.doi.org vs. doi.org. However, in both of those cases, the issuers have "gotten their act together" and there seem to be stable, consensus "best" forms for the IRIs: https://orcid.org/blah and https://doi.org/blah. So we will undoubtedly continue to encounter variants, but over time this will get better.

jhpoelen commented 3 years ago

@tucotuco Thank you for taking the effort to review the relationshipOfResourceID proposal and prepare the proposal for public comment. I realize that I might have been a little impatient, especially considering the immense task of maintaining DwC.

I just reviewed your revised proposal mentioned in https://github.com/tdwg/dwc/issues/283#issuecomment-817185410 and it looks good to me.

PS. Perhaps this tweak to the Resource Relation extension will usher in a new non-star schema era . . . make way for more flexibility in integrating data, and might very well provide a starting point for improved and explicit dependency management between collections. (e.g., collection A contains references to collection B).

deepreef commented 3 years ago

Thanks, @baskaufs : I wouldn't say that I'm "frustrated", necessarily -- just cognizant that identifiers work best when they are stable, and embedding dereferencing metadata within the identifier itself can pose challenges to maintaining that stability. That's why I decoupled identifiers from dereferencing mechanisms in BioGUID.
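As a rough illustration of that decoupling (and not a description of how BioGUID is actually implemented), the sketch below stores only bare identifiers and applies the dereferencing prefix at lookup time. The `RESOLVERS` table and the `dereference` function are hypothetical names; the two URL prefixes are the consensus forms mentioned earlier in this thread.

```python
# Hypothetical resolver table: identifier type -> URL template.
# Only this table changes if a resolution service moves.
RESOLVERS = {
    "doi": "https://doi.org/{id}",
    "orcid": "https://orcid.org/{id}",
}


def dereference(id_type: str, identifier: str) -> str:
    """Build a resolvable URL from a bare identifier at lookup time."""
    try:
        template = RESOLVERS[id_type]
    except KeyError:
        raise ValueError(f"no resolver registered for {id_type!r}") from None
    return template.format(id=identifier)


# The stored identifier stays stable; the URL is derived when needed.
stored = ("doi", "10.1234/5678")
url = dereference(*stored)  # "https://doi.org/10.1234/5678"
```

A change such as dx.doi.org becoming doi.org would then touch one table entry rather than every stored record.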

baskaufs commented 3 years ago

As noted in table 3.8 of the RDF guide, we dodged the issue of expressing ResourceRelationship data as RDF because we couldn't figure out the best way to do it. I think we could revisit that issue in the context of these new term additions/changes. But for now, this addition won't have an immediate effect on any dwcIri: analog.

tucotuco commented 3 years ago

Done.

jhpoelen commented 3 years ago

@tucotuco et al. Thanks for your hard work in adding the new Resource Relationship term relationshipOfResourceID . Maintaining a standard like DwC is no small feat!

jhpoelen commented 11 months ago

I am happy to report that the Resource Relationship extension has been adopted by the Field Museum and, recently, by iNaturalist. See the blog post:

Nov 3, 2023
Field Museum and iNaturalist Adopt Darwin Core Resource Relationship Standard to Share Species Interaction Records
The Field Museum in Chicago and iNaturalist capture detailed records on how species interact. They both showed their capacity to innovate by using the recently improved Darwin Core Resource Relationship extensions to publish their interaction records. By using this standards-based approach, they facilitate access to the valuable biodiversity knowledge they keep, and provide examples for others to follow.