tdwg / dwc

Darwin Core standard for sharing of information about biological diversity.
https://dwc.tdwg.org
Creative Commons Attribution 4.0 International

What changes need to be made to the notes on dwc:occurrenceStatus? #238

Closed baskaufs closed 3 years ago

baskaufs commented 4 years ago

@qgroom I was reading the section on dwc:occurrenceStatus in the https://doi.org/10.3897/biss.3.38084 paper and it noted "We propose adding notes to the documentation of dwc:occurenceStatus, to point users to other status fields that might be appropriate for their needs.". However, the paper didn't suggest what the text should be. Suggestions?

Since the notes aren't normative, we won't necessarily need to go through the full change process. Here's what the Vocabulary Maintenance Specification says about this kind of situation:

Because non-normative content provides only supplemental information, the Interest Group may use its discretion to decide the extent to which the community should be involved in implementing changes to non-normative content. For example, relatively cosmetic changes, such as improving figures, changing formatting, minor improvements to examples, etc. can be made without triggering any change process or notification via the TDWG email list [TDWG-CONTENT]. More significant changes or improvements to non-normative content may warrant notification of the community via the TDWG email list [TDWG-CONTENT]. If the Interest Group determines that proposed changes to non-normative content are significant enough, it may chose to invoke the full change process. Substantive changes to non-normative content will usually trigger a version change for the affected document.

My guess is that since we aren't really telling people how to use the term differently (just how to use it correctly), an appropriate course of action would be to add the clarifying notes, then inform the community via tdwg-content. The full change process (public comment, executive review, etc.) would probably not be necessary, but that would depend somewhat on exactly what the new comments say.

qgroom commented 4 years ago

@baskaufs Below are the clarifications I would like to make about dwc:occurrenceStatus

baskaufs commented 4 years ago

Thanks for this @qgroom. I'm assuming you meant to say:

"Temporal boundaries are perhaps best provided by eventDate. ISO 8601 supports date ranges." rather than "Spatial boundaries..."

qgroom commented 4 years ago

I've fixed it now
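The corrected note points to eventDate for temporal boundaries. As a minimal sketch of why an ISO 8601 range helps, the Python below reads an eventDate that is either a single YYYY-MM-DD date or a start/end pair separated by "/", and tests whether a given day falls inside it. The function names are hypothetical, and other ISO 8601 forms (times of day, reduced precision such as 2019-06, abbreviated ends such as 2019-06-01/30) are deliberately not handled.

```python
from datetime import date


def parse_event_date(event_date):
    """Return (start, end) dates for a simple eventDate value.

    Accepts only "YYYY-MM-DD" or "YYYY-MM-DD/YYYY-MM-DD" (hypothetical helper).
    """
    if "/" in event_date:
        start_text, end_text = event_date.split("/", 1)
    else:
        start_text = end_text = event_date
    start = date.fromisoformat(start_text)
    end = date.fromisoformat(end_text)
    if end < start:
        raise ValueError(f"interval ends before it starts: {event_date}")
    return start, end


def covers(event_date, day):
    """True if `day` falls within the eventDate interval (inclusive)."""
    start, end = parse_event_date(event_date)
    return start <= day <= end


# An absence recorded for a June survey says nothing about July.
survey = "2019-06-01/2019-06-30"
print(covers(survey, date(2019, 6, 15)))  # True
print(covers(survey, date(2019, 7, 1)))   # False
```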

albenson-usgs commented 4 years ago

I disagree with this statement: "Therefore, absence has no meaning for point observations with an coordinateUncertaintyInMeters." Modelers need information about when researchers look for a species and don't find it. This can and does happen at the point observation level. The species may not be absent from an entire waterBody, stateProvince, etc., but it might be "absent" (not detected) at that very specific location, and this is important to know. Take, for instance, a coral reef monitoring program that is looking for staghorn coral (or at least uses a methodology that would detect staghorn coral at the locations where they are looking). They use the point line intercept method for their survey and detect staghorn coral at 5 points along the transect but not at the other 5. Including all ten point observations, and especially the ones where staghorn coral is absent, is critical to document and share.

qgroom commented 4 years ago

So what does an absence at a point observation mean? Could it be present 1 m away? Is it absent within the dwc:coordinateUncertaintyInMeters? Is it absent only at that moment in time, and could it be present the day before or the day after? A point observation is just a moment in time and space. It can't be used to predict absence more generally, which is what would be useful for modelling.

In the example you give, the absences and presences are useful for estimating abundance, but the whole survey has boundaries. If staghorn coral is absent from every point in the transect, that doesn't mean the coral is absent more extensively; you only know that it is less abundant than the method is sensitive enough to detect. Yes, it is critical to document this information, but it only makes sense in the context of a bounded survey, and your transect is such a bounded survey.
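One way to keep a transect's bounds attached to its point records is to nest them: a parent Event for the transect and a child Event per point, each with an Occurrence carrying occurrenceStatus. The Python below is a minimal sketch of that layout using Darwin Core term names (eventID, parentEventID, samplingProtocol, occurrenceID, scientificName, occurrenceStatus); the identifier scheme, the function name, the samplingProtocol text, and the five-and-five detection pattern (taken from the staghorn coral example) are illustrative assumptions.

```python
def transect_records(transect_id, points, taxon):
    """Build a parent Event, one child Event per point, and one Occurrence
    per point for the target taxon (hypothetical helper).

    points: list of (point_label, detected) pairs, detected being a bool.
    """
    # The parent Event carries the bounded survey and its method.
    events = [{"eventID": transect_id, "samplingProtocol": "point line intercept"}]
    occurrences = []
    for label, detected in points:
        point_event_id = f"{transect_id}:{label}"  # hypothetical ID scheme
        events.append({"eventID": point_event_id, "parentEventID": transect_id})
        occurrences.append({
            "occurrenceID": f"{point_event_id}:{taxon}",
            "eventID": point_event_id,
            "scientificName": taxon,
            "occurrenceStatus": "present" if detected else "absent",
        })
    return events, occurrences


# Hypothetical transect: detections at points 1-5, none at points 6-10.
pts = [(f"P{i}", i <= 5) for i in range(1, 11)]
events, occs = transect_records("T1", pts, "Acropora cervicornis")
print(len(events), len(occs))
```

Because every point Event names the transect as its parent, a user can always recover the bounded survey that gives each "absent" its meaning.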

albenson-usgs commented 4 years ago

I have not conducted species distribution modeling myself, but my understanding is that when you use presence-only data, you select pseudo-absences (also points) randomly throughout the area where the species was not seen. It seems to me that points where a species could have been detected but was not seen would be a better predictor of species distribution than pseudo-absences. But modelers can only use them if non-detections are reported. We of course never have perfect detection of species and have to make educated guesses about their distribution. I still posit that having documented point locations for non-detections is better than leaving that information out when we have it. Yes, the methodology also needs to be documented extremely well. But you can still have a point in time and space where you did not see a species, if your methodology could have detected it.
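To make the contrast concrete, this hypothetical Python helper assembles training rows for a presence/absence model: recorded non-detections are kept as absences, and optional pseudo-absences are drawn at random from a bounding box and labelled separately so a modeller can weight or drop them. It is a sketch, not a recommended SDM workflow; in particular, sampling uniformly in longitude and latitude is not uniform by area, and it does not exclude locations near presences as many pseudo-absence schemes do.

```python
import random


def training_points(presences, recorded_absences, bbox, n_pseudo, seed=0):
    """Return (lon, lat, label, source) rows for a presence/absence model.

    presences, recorded_absences: lists of (lon, lat) pairs.
    bbox: (min_lon, min_lat, max_lon, max_lat) for pseudo-absence draws.
    """
    rng = random.Random(seed)  # seeded so the draw is reproducible
    rows = [(lon, lat, 1, "presence") for lon, lat in presences]
    # Non-detections from a method that could have detected the species.
    rows += [(lon, lat, 0, "recorded absence") for lon, lat in recorded_absences]
    min_lon, min_lat, max_lon, max_lat = bbox
    for _ in range(n_pseudo):
        # Uniform in degrees, which is not uniform by area.
        rows.append((rng.uniform(min_lon, max_lon),
                     rng.uniform(min_lat, max_lat), 0, "pseudo-absence"))
    return rows


rows = training_points([(1.0, 2.0)], [(1.5, 2.5), (1.6, 2.6)],
                       (0.0, 0.0, 10.0, 10.0), n_pseudo=5)
print([r[3] for r in rows])
```

Keeping the source label on every row lets the two kinds of absence be compared or weighted differently downstream.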

robgur commented 4 years ago

This gets firmly into some SDM theory and what exactly you wish to model: realized distributions or potential ones, e.g., something more akin to habitat modeling. My view is that absences (better framed as non-detections) are better inferred from species lists, knowledge of the regional species pool, and sampling methods than reported directly. I would like to see enough richness in metadata about sampling events to do this sort of inference, and perhaps the Event core gets us nearly there (and perhaps it doesn't). There is still low-hanging fruit here for building this ecosystem in a strategic way, and I appreciate Abby's efforts here. Best, Rob
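The inference Rob describes might look like this in its simplest form: for an Event whose species list is assumed complete, any species in the regional pool that the sampling method could have detected, but that is not on the list, is treated as a non-detection. The Python below is a hypothetical sketch; the completeness of each list and the mapping from samplingProtocol to detectable species are assumptions the data would have to support.

```python
def infer_non_detections(checklist_events, regional_pool, detectable_by):
    """Infer (eventID, species) non-detections for complete-list Events.

    checklist_events: dict eventID -> (samplingProtocol, set of species recorded)
    regional_pool: set of species expected in the region
    detectable_by: dict samplingProtocol -> set of species that method can detect
    """
    inferred = []
    for event_id, (protocol, recorded) in sorted(checklist_events.items()):
        # Only species both expected regionally and detectable by the method.
        candidates = regional_pool & detectable_by.get(protocol, set())
        for species in sorted(candidates - recorded):
            inferred.append((event_id, species))
    return inferred


events = {
    "E1": ("visual census", {"Species A", "Species B"}),
    "E2": ("visual census", {"Species A"}),
}
pool = {"Species A", "Species B", "Species C"}
detectable = {"visual census": {"Species A", "Species B"}}
print(infer_non_detections(events, pool, detectable))
# [('E2', 'Species B')]
```

Species C is never inferred as absent because the method is assumed unable to detect it, which is the role the sampling-method metadata plays.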

tucotuco commented 4 years ago

Am I missing something, or is the statement actually supposed to be, "Therefore, absence has no meaning for point observations without a coordinateUncertaintyInMeters"?

qgroom commented 4 years ago

No, coordinateUncertaintyInMeters does not circumscribe an area that was searched. It doesn't delimit the boundaries of an observation; it delimits the uncertainty of the coordinates associated with a point observation. The organism was presumably observed somewhere in that circle, but you don't know where, and you don't know where the observer was looking for that organism.

qgroom commented 4 years ago

BTW: It is worth noting that a "point" such as 50°49'41"N 4°34'43"E on the earth's surface actually describes a quadrangle with a width of about 26 m. So because this describes only the southwest corner of a quadrangle, the actual location of the organism can be beyond the coordinateUncertaintyInMeters from this corner. If the coordinate uncertainty is large and the precision is small, this has little consequence, but this is not always the case.
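How large a one-arcsecond "quadrangle" is depends on latitude and on which side is measured: an arcsecond of latitude is roughly 31 m everywhere, while an arcsecond of longitude shrinks with the cosine of the latitude. The Python below is a minimal sketch on a spherical Earth (the mean radius of 6,371,008.8 m is an assumption; the WGS84 ellipsoid gives slightly different figures) for the coordinate quoted above.

```python
import math

# Spherical approximation: IUGG mean Earth radius in metres (assumption).
MEAN_EARTH_RADIUS_M = 6_371_008.8


def arcsecond_cell_size(lat_deg):
    """Approximate (north_south_m, east_west_m) extent of a cell one
    arcsecond on each side at latitude lat_deg."""
    one_arcsec_rad = math.radians(1 / 3600)
    north_south = MEAN_EARTH_RADIUS_M * one_arcsec_rad
    # Meridians converge toward the poles, so E-W shrinks with cos(latitude).
    east_west = north_south * math.cos(math.radians(lat_deg))
    return north_south, east_west


lat = 50 + 49 / 60 + 41 / 3600  # 50°49'41"N as decimal degrees
ns, ew = arcsecond_cell_size(lat)
diagonal = math.hypot(ns, ew)  # corner-to-corner distance of the cell
print(f"N-S {ns:.1f} m, E-W {ew:.1f} m, diagonal {diagonal:.1f} m")
```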

ArthurChapman commented 4 years ago

You are correct in saying "I didn't see this at this point", but that point still has an uncertainty. A point is NEVER just a point. Every point has an uncertainty, and often an extent, associated with it (though it may be very small). So in reality you are not saying that it doesn't occur in the totality of the area covered by the coordinateUncertaintyInMeters; what you are saying is "I didn't detect the species at this point, but that point could be anywhere in the area covered by coordinateUncertaintyInMeters". Your coordinateUncertaintyInMeters may be very small if you are using a Differential GPS, PPP methodology, etc., but it all depends on how accurate or uncertain the point you are recording is.

ArthurChapman commented 4 years ago

I agree that recording "absences" does require an area component (and a time component). But in reality, absences may be recorded using any one of a number of methodologies, and the methodology used should also be recorded: for example, a transect, or a shape around a transect. Others have used methods in which they searched for and recorded presences of a species and noted that there were no other species of that genus in the area where they collected, surmising that, as experts in that genus, they would have noticed and noted any other species of that genus that were present. I remember a paper by Winston Ponder on this subject many years ago.

tucotuco commented 4 years ago

I see Arthur got in a couple of responses before I could finish, but I'll offer this up anyway.

Where to begin. Lots of problems here. This isn't the venue for a georeferencing course, but it is extremely important that these misconceptions not be propagated. Foremost is the misconception of what coordinateUncertaintyInMeters means. It sounds like you are half mixing coordinateUncertaintyInMeters with coordinatePrecision. The statement "it delimits the uncertainty of the coordinates associated with a point observation" is definitively wrong for coordinateUncertaintyInMeters, as is "the actual location of the organism can be beyond the coordinateUncertaintyInMeters from this corner". The definition of coordinateUncertaintyInMeters is:

"The horizontal distance (in meters) from the given decimalLatitude and decimalLongitude describing the smallest circle containing the whole of the Location. Leave the value empty if the uncertainty is unknown, cannot be estimated, or is not applicable (because there are no coordinates). Zero is not a valid value for this term."

Location is the place where an Occurrence, or a negative Occurrence, happened. It is the place where an Occurrence was sought (or found). In that sense, it is the same for a presence or an absence: either "this is where it was found" or "this is where it was not found". The coordinateUncertaintyInMeters is just a scalar to say how big the place is. Ideally the Location would be described by a shape in footprintWKT with footprintSRS, but a point-radius version (where the point is the combination of decimalLatitude, decimalLongitude, and geodeticDatum, and the radius is the coordinateUncertaintyInMeters) might also be included, or included instead if the footprint isn't available.
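When only a point-radius is available, an approximate footprint can be derived from it. The Python below is a minimal sketch that turns decimalLatitude, decimalLongitude, and coordinateUncertaintyInMeters into a WKT POLYGON with a chosen number of vertices, using the great-circle destination formula on a sphere. The spherical model, the vertex count, and the function names are assumptions; it does not handle the antimeridian or the poles, and the footprintSRS recorded alongside it should match the datum of the input coordinates.

```python
import math

# Spherical approximation of the Earth (assumption), radius in metres.
EARTH_RADIUS_M = 6_371_008.8


def destination(lat_deg, lon_deg, bearing_deg, distance_m):
    """Great-circle destination point from a start point, bearing, and distance."""
    lat1 = math.radians(lat_deg)
    lon1 = math.radians(lon_deg)
    brg = math.radians(bearing_deg)
    d = distance_m / EARTH_RADIUS_M  # angular distance in radians
    lat2 = math.asin(math.sin(lat1) * math.cos(d)
                     + math.cos(lat1) * math.sin(d) * math.cos(brg))
    lon2 = lon1 + math.atan2(math.sin(brg) * math.sin(d) * math.cos(lat1),
                             math.cos(d) - math.sin(lat1) * math.sin(lat2))
    return math.degrees(lat2), math.degrees(lon2)


def point_radius_to_wkt(decimal_latitude, decimal_longitude,
                        coordinate_uncertainty_m, n_vertices=32):
    """Approximate a point-radius Location as a closed WKT POLYGON.

    WKT lists each vertex as "longitude latitude".
    """
    if coordinate_uncertainty_m <= 0:
        # Zero is not a valid value for coordinateUncertaintyInMeters.
        raise ValueError("coordinateUncertaintyInMeters must be greater than zero")
    ring = []
    for i in range(n_vertices):
        bearing = 360.0 * i / n_vertices
        lat, lon = destination(decimal_latitude, decimal_longitude,
                               bearing, coordinate_uncertainty_m)
        ring.append(f"{lon:.6f} {lat:.6f}")
    ring.append(ring[0])  # a WKT ring must end where it starts
    return "POLYGON ((" + ", ".join(ring) + "))"


print(point_radius_to_wkt(50.828, 4.579, 100, n_vertices=8))
```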

Tasilee commented 4 years ago

I agree with @tucotuco. One issue that has not been stated explicitly is the discoverability of true absences (with some form of spatial extent for the sites) only after a suite of survey sites has been ‘evaluated’. As in something like “I recorded all tree species in a series of 10m plots positioned randomly across an ecosystem, and after evaluation, species present in some sites were noted as absent in others.”

Also note that some analytical methods (e.g., some SDMs) value TRUE (observed) absences over pseudo-absences. An SDM like MaxEnt will only deliver true probabilities of occurrence with observed absences.
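The plot example above can be sketched as follows: once every plot in a suite has been surveyed, any species recorded in at least one plot but not in another is marked absent in the latter. This hypothetical Python assumes each plot was surveyed with the same method and that every tree species present would have been recorded; the plot IDs and species are illustrative.

```python
def presence_absence_matrix(plots):
    """Map plotID -> {species: "present" | "absent"} across a survey suite.

    plots: dict of plotID -> set of species recorded in that plot.
    The species universe is everything recorded in any plot of the suite.
    """
    universe = sorted(set().union(*plots.values()))
    return {
        plot_id: {sp: "present" if sp in recorded else "absent" for sp in universe}
        for plot_id, recorded in plots.items()
    }


plots = {
    "plot-1": {"Quercus robur", "Fagus sylvatica"},
    "plot-2": {"Fagus sylvatica"},
}
print(presence_absence_matrix(plots)["plot-2"])
# {'Fagus sylvatica': 'present', 'Quercus robur': 'absent'}
```

The absences only become visible after the whole suite is evaluated, which is why each plot's footprint and method need to travel with the records.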

qgroom commented 4 years ago

I think we all agree that to describe an absence you need clearly defined boundaries and preferably an explicit methodology, rather than one inferred from the observation coordinates.
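As an illustration of that consensus, the hypothetical Python check below flags records with occurrenceStatus "absent" that lack a spatial boundary (footprintWKT, or a point-radius from decimalLatitude, decimalLongitude, and coordinateUncertaintyInMeters), a temporal boundary (eventDate), or an explicit methodology (samplingProtocol). These rules are assumptions made for the sketch, not TDWG requirements.

```python
def absence_problems(record):
    """Return reasons an "absent" record may be hard to interpret.

    record: dict keyed by Darwin Core term names. Hypothetical check only.
    """
    if (record.get("occurrenceStatus") or "").strip().lower() != "absent":
        return []  # only absences are checked here
    problems = []
    has_footprint = bool(record.get("footprintWKT"))
    has_point_radius = all(
        record.get(term) not in (None, "")
        for term in ("decimalLatitude", "decimalLongitude",
                     "coordinateUncertaintyInMeters")
    )
    if not (has_footprint or has_point_radius):
        problems.append("no spatial boundary (footprintWKT or point-radius)")
    if not record.get("eventDate"):
        problems.append("no temporal boundary (eventDate)")
    if not record.get("samplingProtocol"):
        problems.append("no explicit methodology (samplingProtocol)")
    return problems


bare = {"occurrenceStatus": "absent", "decimalLatitude": 50.8, "decimalLongitude": 4.6}
for reason in absence_problems(bare):
    print(reason)
```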

tucotuco commented 4 years ago

Excellent. I stand satisfied. :-)

albenson-usgs commented 4 years ago

OK, a question (not sure where else would be better to pose it, so apologies if this isn't the right place). I have a dataset (https://www.gbif.org/dataset/f56fb306-32e4-4b96-a381-6b87c186ad0f). It uses a stationary point count method for assessing reef fish (https://esajournals.onlinelibrary.wiley.com/doi/full/10.1002/fee.2144?campaign=wolearlyview). There are no absence records associated with this dataset as it's currently published. However, there is one event where no fish were seen. As it stands now, this is documented as an event with no occurrences, but I believe that in effect this information will be lost to data users. What would you recommend as the best way to represent this information to an end user in GBIF?

tucotuco commented 4 years ago

Interesting one. Mind if we move that one to the Darwin Core Questions and Answers site? I can either copy it over from here, or you could enter it via the form at http://bit.ly/dwcqaform.

albenson-usgs commented 4 years ago

I'll put it over there. Sorry for the tardy response. Just discovered a bunch of Github notifications going to my spam folder in my old email system O_O

tucotuco commented 3 years ago

Since this question doesn't seem to have surfaced anywhere else, I'll offer the following, particularly in light of the recommended clarifications for individualCount and organismQuantity/organismQuantityType. I would generate one or more Occurrence records for the Event (as many as needed to capture the scope of the taxonomic target of observation) in which the individualCount is 0, the organismQuantity is 0, the organismQuantityType is "individuals", and the occurrenceStatus is "absent".
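Following that recommendation, a minimal Python sketch of the absence rows for an Event in which no target taxa were seen might look like this. The event identifier, the occurrenceID scheme, and the choice of Actinopterygii as the taxonomic target are hypothetical; the values of individualCount, organismQuantity, organismQuantityType, and occurrenceStatus follow the comment above. The rows are written as tab-separated text, as might go into an occurrence file.

```python
import csv
import io


def absence_records(event_id, target_taxa):
    """One Occurrence per target taxon for an Event in which none was seen."""
    return [
        {
            "occurrenceID": f"{event_id}:{taxon}",  # hypothetical ID scheme
            "eventID": event_id,
            "scientificName": taxon,
            "individualCount": 0,
            "organismQuantity": 0,
            "organismQuantityType": "individuals",
            "occurrenceStatus": "absent",
        }
        for taxon in target_taxa
    ]


def to_tsv(rows):
    """Serialize records as tab-separated text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0]), delimiter="\t",
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


rows = absence_records("hypothetical-event-1", ["Actinopterygii"])
print(to_tsv(rows))
```

Adding more entries to the target list yields one absence row per taxon, matching "as many as needed to capture the scope of the taxonomic target".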

albenson-usgs commented 3 years ago

I did add it to the DwC Q&A: https://github.com/tdwg/dwc-qa/issues/151

tucotuco commented 3 years ago

The usage notes recommended in this issue were added to term change proposal Issue #339. Closing this issue.