Open dshorthouse opened 6 years ago
Interesting idea. We use GitHub in VertNet to accept feedback at the record level, but there is no single record-level URL for issue submission, because some parameters vary with each issue. One key aspect of our approach, and one that the newly proposed term brings to mind, is that the issue repository (the basis of the URL) is set up at the dataset level. This could be a relevant consideration for NCD.
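A minimal Python sketch of that arrangement: the repository is fixed per dataset and only the query string changes per record. It assumes GitHub's `title` and `body` query parameters for pre-filling a new issue; the repository name `example-org/example-dataset`, the function name, and the message wording are hypothetical.

```python
from urllib.parse import parse_qs, urlencode, urlparse


def new_issue_url(repo: str, catalog_number: str) -> str:
    """Build a pre-filled GitHub 'new issue' link for one record.

    `repo` ("owner/name") is fixed per dataset; only the query string
    changes from one record to the next.
    """
    params = {
        "title": f"Issue with record {catalog_number}",
        "body": f"This comment is about {catalog_number}",
    }
    return f"https://github.com/{repo}/issues/new?{urlencode(params)}"


# Hypothetical dataset-level repository and record.
url = new_issue_url("example-org/example-dataset", "CAN12345")
print(url)

# Round-trip check: the record-level values survive URL encoding.
fields = parse_qs(urlparse(url).query)
print(fields["title"])  # ['Issue with record CAN12345']
```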
GUIDE TO GITHUB REFERENCE & SET UP FOR DATA ISSUE TRACKING http://vertnet.org/resources/issuetrackingguide.html
On Thu, Feb 8, 2018 at 5:33 PM, David Shorthouse notifications@github.com wrote:
Proposing a new term. The use case is to have a URL presented on GBIF and other portals that directs end-users to where they can submit issues about an occurrence record. If these URLs pointed to new GitHub issues, they could contain parameters that pre-populate fields with, e.g., catalogNumber.
— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/tdwg/dwc/issues/180, or mute the thread https://github.com/notifications/unsubscribe-auth/AAcP69NOJHpH5T4eufOxPzF6Ee4O3CAJks5tS1oTgaJpZM4R--6b .
Interesting indeed. We currently provide this link in the description of the dataset metadata:
Issues with the dataset can be reported at https://github.com/inbo/data-publication/tree/master/datasets/bird-tracking-gull-occurrences
Yep, many of us do this in one form or another. It occurred to me that it would be wise if aggregators could make use of something offered by the data provider, such that issues raised about the occurrence record (1) reside closer to the source and (2) are identical for all aggregators of the record.
Exactly. That's why we set up the orgs and repos for the data publishers, so they are "theirs" to receive feedback from wherever.
On Feb 10, 2018 12:58 PM, "David Shorthouse" notifications@github.com wrote:
Yep, many of us do this in one form or another. It occurred to me that it would be wise if aggregators could make use of something offered by the data provider, such that issues raised about the occurrence record (1) reside closer to the source and (2) are identical for all aggregators of the record.
It's the "from wherever" I'd like to figure out how to solve. I expect there are providers to VertNet whose data also appear on GBIF. The latter (and other aggregators) has no effective way to direct end-users to a repo that was established to gather feedback, because that repo's location is not immediately knowable.
Yep, I agree, from wherever. Yes, 100% of VertNet participants are in GBIF by design. And functionally this is needed at the level of each record, but is it constant at the level of a data set? I think it should be. So to me the question is where the term should live. DwC or NCD?
Also, we use the same feedback mechanism to supply usage summaries. The data publishers seem to like and appreciate that. The topic is timely, I think, for the BCoN Data integration and attribution workshop this week.
More on how the whole thing works can be found in the webinar at https://www.idigbio.org/content/webinar-data-quality-usage-and-issue-tracking-using-github.
So to me the question is where the term should live. DwC or NCD?
Both? Aggregators handle EML and DwC terms inconsistently: some do it well, some not so well. Often when records are downloaded from aggregations, the link between a record and its EML metadata gets lost, as though the record had been made available by an anonymous and generous provider. My vote is for including a feedbackURL in DwC for this very reason, plus, of course, some guidance on what makes a useful record-level feedbackURL.
@dshorthouse
This request involves the creation of a new term; thus, in accordance with Section 3 of the Vocabulary Maintenance Standard (VMS), it requires a full review, public commentary, ratification if warranted, and then implementation.
We are at the stage in the process where we have to demonstrate that the three main requirements for moving a change request forward are met. From VMS Section 3.1:
Because the primary purpose of TDWG vocabularies is to facilitate data sharing, it is necessary to show that multiple parties will benefit from the change. As such, it is a minimum requirement that two independent entities indicate that they desire the change (the demand requirement). Additionally, it is required that there is a consensus within the community that the proposed change will accomplish the desired outcome (the efficacy requirement), and that making the change will not adversely affect the interoperability of existing implementations that depend on the stability of the vocabulary (the stability requirement).
Before the proposal goes to public review, the demand requirement has to be demonstrated. The burden of making this case lies with those proposing the change, and it has not been made explicit so far. To facilitate assessment, it is also essential that the complete proposal be made explicit. Have a look at the Guidelines for contributing and the following issues as examples of how to do that:
With the two prerequisites described above satisfied, the proposal can move forward. The Darwin Core Maintenance Interest Group has its open annual review meeting on 2020-09-23T14:30+00:00, during which we will attempt to make progress on as many open issues as possible. The more mature a justification is, the further along we can move a proposal.
@tucotuco I appreciate the formality and the structure, but this was an idea on a whim, borne out of frustration at seeing re-use of occurrence data in multiple resources w/o regard to synchrony & provenance as though occurrence data are static. I cannot follow this up & yet I know it deserves attention. I would immediately use it in https://bionomia.net because users here uncover all sorts of errors & inconsistencies but are dismayed by the dead ends in communication.
This reminds me of the Annotation Ontology, and of Paul Morris’ comment that users can share their expectations (e.g., as a result of my comment: I expect nothing; or I expect a reply; or I expect notification when something is changed; etc.). We have yet to implement a robust, scalable #roundtripping scenario (at all). Deb
On Sep 6, 2020, at 8:58 PM, David Shorthouse notifications@github.com wrote:
@tucotuco I appreciate the formality and the structure, but this was an idea on a whim, borne out of frustration at seeing re-use of occurrence data in multiple resources w/o regard to synchrony & provenance as though occurrence data are static. I cannot follow this up & yet I know it deserves attention. I would immediately use it in bionomia.net because users here uncover all sorts of errors & inconsistencies but are dismayed by the dead ends in communication.
Closing for lack of evidence of demand.
A better way to word the justification for closure is the lack of a champion who has the time and energy to carry it through the TDWG process. GBIF have custom-crafted feedbackURLs on their occurrence pages in response to requests from Plazi and iNaturalist. Evidence of demand? Clearly. Evidence of demand in Darwin Core? Not so much.
+1 @dshorthouse. I wonder... do "we" need to send out a poll? To whom? And what would it say? I think the need is clear, but I'm not entirely sure that all those who need it even know what to ask for. At every Bionomia-based event we've been part of, many participants have asked us: how do we get this information (i.e., assertions of data issues) "back" to the people who need to know that we see an issue? This would be one step toward more agency for everyone.
Comments please? @qgroom @ekrimmel @CatChapman @tmcelrath @mjy
Example (like you gave, @tucotuco): at TaxonWorks too, folks can have issues raised on their public-facing TaxonPages go directly to their specific GitHub repository. It would be simple to put that GitHub repo URL in the field being described here. This would mean more issues go back to the source, and less filtering work for GBIF, for example.
Is there a need for this URL at occurrence/taxon level, or can it be indicated at dataset metadata level and be pulled down to a taxon/occurrence page from there?
I guess it depends on whether the feedbackURL is itself a template whose parameter(s) are meant to be substituted with occurrence- or taxon-level values in order to preserve context for the sender and receiver. Substitution rules for those parameter(s) would likewise need to be specified; otherwise a naive rendering with no additional scripting would produce poorly constructed messages. If there is no parameter substitution of this kind (i.e., the feedbackURL is identical for all records in a given dataset), then pulling it down from the dataset level makes it no more useful than doing the same with one or more email addresses, assuming these are present too.
Are there sufficient identifying features in, say, a GBIF TSV download for someone to communicate feedback to the publisher of a particular occurrence/taxon record with the least amount of friction for all parties, including GBIF?
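To make the "substitution rules" point concrete, here is a small Python sketch contrasting a naive rendering with one that percent-encodes each substituted value. The `{termName}` placeholder syntax, the template URL, and the record values are all hypothetical; the encoding step is roughly in the spirit of a simple URI-template expansion, not a rule anyone has agreed on.

```python
import re
from urllib.parse import parse_qs, quote, urlparse

# Hypothetical placeholder syntax: {termName}, filled from the record.
PLACEHOLDER = re.compile(r"\{(\w+)\}")


def expand(template: str, record: dict) -> str:
    """Substitute each {term} with the record's value, percent-encoded."""
    def fill(match):
        term = match.group(1)
        if term not in record:
            raise KeyError(f"record has no value for {term!r}")
        # safe="" encodes '&', '#', '/', spaces, etc., so a value can never
        # break out of its query parameter.
        return quote(str(record[term]), safe="")
    return PLACEHOLDER.sub(fill, template)


def naive_expand(template: str, record: dict) -> str:
    """Plain string substitution with no encoding (the 'naive rendering')."""
    return template.format(**record)


template = "https://example.org/feedback?catalog={catalogNumber}&locality={verbatimLocality}"
record = {"catalogNumber": "CAN 12345", "verbatimLocality": "Ottawa & environs #3"}

good = expand(template, record)
bad = naive_expand(template, record)
print(good)
print(parse_qs(urlparse(good).query)["locality"])  # ['Ottawa & environs #3']
# In the naive URL, '&' splits the value and '#' starts a fragment,
# so the locality that comes back is garbled.
print(parse_qs(urlparse(bad).query).get("locality"))
```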
This would be very useful for the INHS Insect Collection and our TaxonWorks database.
Is there a need for this URL at occurrence/taxon level, or can it be indicated at dataset metadata level and be pulled down to a taxon/occurrence page from there?
Some comments will be specific to an occurrence in a download, and it would make sense to make that as easy as possible. Some data providers may not be able to provide that level of detail but could use a metadata-level comment URL, which in any case should still appear on the individual occurrences. Right?
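The fallback described above could look something like this Python sketch: a record-level value wins, otherwise the dataset-level one is shown on the occurrence. The dataset-level `feedbackURL` key is a hypothetical metadata field, and the URLs are placeholders.

```python
from typing import Optional


def effective_feedback_url(record: dict, dataset_metadata: dict) -> Optional[str]:
    """Return the record's own feedbackURL if it has one, else the dataset's.

    Empty strings count as "not provided", so a blank record-level column
    does not hide a usable dataset-level link.
    """
    return record.get("feedbackURL") or dataset_metadata.get("feedbackURL") or None


dataset = {"feedbackURL": "https://example.org/dataset-feedback"}  # hypothetical
print(effective_feedback_url({"feedbackURL": "https://example.org/records/42/feedback"}, dataset))
print(effective_feedback_url({"feedbackURL": ""}, dataset))  # falls back to the dataset link
```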
At the TWT discussion just now, @dshorthouse hinted at possible extensions. This got me thinking that we could explore a metadata element that lets providers point specific fields/terms to a URL where, in the provider's view, issues with that field could be resolved, perhaps using a very simple DSL for referencing the occurrenceID (in the example below, the aggregator would replace $1 with the provider's occurrence ID):
actions: {
verbatimLocality: 'https://sfg.taxonworks.org/tasks/comprehensive/occurrence_id=$1',
...
}
This means that if a problem regarding verbatimLocality is discovered, TW curators would like to fix it at the provided URL, e.g., https://sfg.taxonworks.org/tasks/comprehensive/occurrence_id=9809d201-8a28-4ed7-8ccc-3b81b39d999
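On the aggregator side, resolving such a per-term action could be as simple as the Python sketch below. It follows the `$1` convention from the example; the function name and the occurrence ID are hypothetical, and percent-encoding the ID is an added assumption rather than part of the proposal.

```python
from typing import Optional
from urllib.parse import quote

# Per-term "actions" as in the example above; "$1" stands for the
# provider's occurrence ID. Only verbatimLocality is declared here.
ACTIONS = {
    "verbatimLocality": "https://sfg.taxonworks.org/tasks/comprehensive/occurrence_id=$1",
}


def action_url(actions: dict, term: str, occurrence_id: str) -> Optional[str]:
    """Return the provider's fix-it URL for `term`, or None if none is declared."""
    template = actions.get(term)
    if template is None:
        return None
    # Percent-encode the ID so characters like '/' or '#' cannot alter the URL.
    return template.replace("$1", quote(occurrence_id, safe=""))


print(action_url(ACTIONS, "verbatimLocality", "hypothetical-id-001"))
print(action_url(ACTIONS, "eventDate", "hypothetical-id-001"))  # None
```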
Is there a need for this URL at occurrence/taxon level, or can it be indicated at dataset metadata level and be pulled down to a taxon/occurrence page from there?
I would think that in the vast majority of cases a dataset-level link is sufficient, and simpler to handle too. It only becomes an issue when you start to merge records from different datasets, e.g. in a GBIF download. But even then you usually have all the dataset metadata at hand, and records are linked to that dataset metadata.
Using URL templates with variables, especially the ID, is also often useful. A single website link template in the metadata that just takes a record ID can replace link columns in the data and is simpler to change over time. From my experience, I'd therefore prefer a metadata element over a data record one. But one person's metadata is another one's data...
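A Python sketch of the metadata-level approach applied to a merged download: each row carries a dataset key, and one template per dataset turns the row's ID into a link, so no link column is needed in the data. The template URLs, the `{id}` placeholder, and the rows are made up for illustration; `datasetKey` and `occurrenceID` are used as the join and ID columns.

```python
from urllib.parse import quote

# Hypothetical dataset-level templates, keyed by datasetKey.
TEMPLATES = {
    "dataset-a": "https://example.org/a/feedback?id={id}",
    "dataset-b": "https://example.net/issues/new?title=Record+{id}",
}

# Made-up rows standing in for records merged from several datasets.
ROWS = [
    {"datasetKey": "dataset-a", "occurrenceID": "A1"},
    {"datasetKey": "dataset-b", "occurrenceID": "B 7"},
    {"datasetKey": "dataset-c", "occurrenceID": "C3"},  # no template declared
]


def feedback_links(rows, templates):
    """Map each occurrenceID to a link built from its dataset's template."""
    links = {}
    for row in rows:
        template = templates.get(row["datasetKey"])
        if template is None:
            continue  # dataset offers no feedback template
        links[row["occurrenceID"]] = template.format(id=quote(row["occurrenceID"], safe=""))
    return links


print(feedback_links(ROWS, TEMPLATES))
```

Changing a dataset's link pattern then means editing one template rather than every record.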
But even then you usually have all dataset metadata at hand and records are linked to that dataset metadata.
While that is somewhat true, the ultimate aim of this term is to facilitate transparent communication between a submitter and someone who can take action on a particular issue, regardless of the origin of that record or its intermediary presentation. It's wonderful that GBIF maintains a registry in which the information in a DwC-A's EML is parsed and held near a record, but not all intermediary aggregators are able to maintain that active link, nor, unless I am mistaken, does EML have a home for such a term.
@tucotuco please, have we now met evidence for demand?
Yes, for sure.
Submitter: David P. Shorthouse
Proponents (at least two independent parties who need this term): Tommy McElrath, Alina Freire-Fierro, Nicky Nicolson, Deborah Paul, Camila Plata, Phillip Hogan, Hester Steyn, Katie Pearson, Sangmi Lee, Samanta Orellana, Jordi Agulló, Geoff Ower
Justification (why is this term necessary?): The case for adding the term dwc:feedbackURL was justified at TaxonWorks Together, https://together.taxonworks.org/#Schedule
Proposed definition of the new term: A uniform resource locator (URL) that points to a webpage on which a form may be submitted to gather feedback about the record.
Term name (in lowerCamelCase): feedbackURL
Class (e.g. Location, Taxon): Record-level
Comment (examples, recommendations regarding content, etc.): Recommended best practice is to optionally include query strings that act to pre-populate web page form elements and communicate the context.
Examples: https://example.com/new?title=New+issue&body=This+comment+is+about+CAN12345
Refines (identifier of the broader term this term refines, if applicable):
Replaces (identifier of the existing term that would be deprecated and replaced by this term, if applicable):
ABCD 2.06 (XPATH of the equivalent term in ABCD, if applicable):
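As an illustration of the recommended practice, the example value can be produced from plain field values with Python's standard `urlencode`, which encodes spaces as `+`. The helper name `feedback_url` is hypothetical; the base URL and field values are taken from the example.

```python
from urllib.parse import urlencode


def feedback_url(base: str, **fields: str) -> str:
    """Append a query string that pre-populates the named form fields."""
    return f"{base}?{urlencode(fields)}"


url = feedback_url("https://example.com/new", title="New issue", body="This comment is about CAN12345")
print(url)  # https://example.com/new?title=New+issue&body=This+comment+is+about+CAN12345
```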
Original submission:
Proposing a new term. The use case is to have a URL presented on GBIF and other portals that directs end-users to where they can submit issues about an occurrence record. If these URLs pointed to new GitHub issues, they could contain parameters that pre-populate fields with, e.g., catalogNumber.