iobis / obis-issues

Repository for all OBIS related issues and feature requests

describe non Darwin Core fields added by OBIS #180

Open pieterprovoost opened 3 years ago

pieterprovoost commented 3 years ago

The OBIS data pipeline adds a number of non Darwin Core fields which are exposed through the API but not described in the manual.

BobSimons commented 3 years ago

Yes. Adding definitions for all of these fields would be great. For example, the occurrence.csv file has both datasetid and dataset_id fields. What are they? What's the difference?

pieterprovoost commented 3 years ago

There's a brief description here, but we are thinking about how we can include both the verbatim as well as the interpreted / additional fields and make it easier to distinguish between them.

@BobSimons There's the Darwin Core field datasetID which is defined here and there's dataset_id which is the OBIS internal UUID dataset identifier. But the export has all lowercase field names.
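To make the distinction concrete, here is a minimal sketch of separating the two identifiers while reading the lowercase export. It assumes the columns are named `datasetid` and `dataset_id` as mentioned in this thread; the function name, the output keys, and the sample values are hypothetical and only for illustration.

```python
import csv
import io


def split_dataset_identifiers(row):
    """Give the two dataset identifiers unambiguous names.

    Assumption: `datasetid` is the provider-supplied Darwin Core datasetID
    (lowercased in the export) and `dataset_id` is the UUID assigned by OBIS.
    Blank values are returned as None.
    """
    return {
        "provider_datasetID": row.get("datasetid") or None,
        "obis_dataset_uuid": row.get("dataset_id") or None,
    }


# Made-up sample row, not real OBIS data.
sample = io.StringIO(
    "datasetid,dataset_id\n"
    "provider-123,00000000-0000-0000-0000-000000000000\n"
)
rows = [split_dataset_identifiers(r) for r in csv.DictReader(sample)]
```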

BobSimons commented 3 years ago

The brief descriptions that you point to are generally not as useful as they could be. For example, the definition of flags is "Quality flags added by OBIS", but there is no list of possible flag values, so how am I to infer what quality tests a given row has passed? And yes, dataset_id is defined as "Dataset identifier assigned by OBIS", but that leaves me to have to infer that the datasetID is probably from the source (which I admit I could do, but shouldn't have to). A proper definition for each term (like the DWC definitions), in one place, with data type, units, and format requirements (if any), would be really nice.

I found figuring out how to parse the csv file to be challenging. I had to deduce many of the data types, units, formats (or variable format), and some of the definitions. I'm still making changes. Please make your data easier to use by providing a full description, in one place. That's my suggestion/request as a new user of your data.

Thank you.

BobSimons commented 3 years ago

I should give more detailed examples that need to be clarified in the definitions (of DWC and non-DWC terms):

Sorry. I veered into more specific questions at the end. But hopefully you get my point: surprising data values often bring up other questions which could be answered by a more complete definition. More complete definitions help users to not misuse the data. I know you are very concerned about users misusing/misinterpreting the data. As with most datasets, the people who created this dataset or work with it every day have long ago answered all their questions like these. But when you distribute data to the public (notably, researchers in related fields), it goes to people who aren't intimately familiar with the dataset, don't know the answers, and can't find the answers. Good documentation avoids problems, misunderstandings, and misuse, and saves you tech support time.

Best wishes.

bart-v commented 2 years ago

Attempting to reply to all of the questions :) @pieterprovoost will correct me if I'm wrong

Q: What does it mean when occurrenceStatus is ""? Was there an occurrence? A: Yes, occurrenceStatus NULL just means the provider failed to complete the field, so the default (present) is assumed
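A minimal sketch of applying that default when parsing the export, assuming the column arrives as a string (or `None`) and that a blank value means the provider left the field empty. The function name is hypothetical and not part of any OBIS tooling.

```python
def effective_occurrence_status(raw):
    """Return the occurrenceStatus to assume for a row.

    Assumption: a missing or blank value means the field was not filled in,
    so the default "present" applies, as described in the answer above.
    """
    if raw is None or raw.strip() == "":
        return "present"
    return raw.strip()
```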

Q: If minimumDepthInMeters is negative, does that specify an elevation above sea level? A: Probably not. The provider probably misinterpreted the definition of the field. In some databases depth implies a negative value.

Q: Is shoreDistance in meters? A: Yes, based on the first example at https://github.com/iobis/xylookup (51.2,2.90 is 2436m on land)

Q: If shoreDistance values are positive, does that mean the observation was over water / in the ocean? Q: If shoreDistance values are negative, does that mean the observation was inland by -1 times that distance? Q: Are bathymetry values in meters? A(3x): Correct

Q: What is the source of the bathymetry values? A: EMODnet Bathymetry and GEBCO, see https://github.com/iobis/xylookup (Data references)

Q: If bathymetry values are negative, is that the elevation of land above sea level? A: Correct
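The sign conventions confirmed in the answers above can be sketched as follows. This is only an illustration: the function name is hypothetical, both inputs are assumed to be numbers in meters, and treating a value of exactly 0 as "offshore" or "below sea level" is my own assumption, since the thread does not cover that case.

```python
def describe_position(shore_distance_m, bathymetry_m):
    """Interpret the OBIS-added shoreDistance and bathymetry values (meters).

    shoreDistance: positive = over water, negative = inland by -1 * value.
    bathymetry:    positive = depth of the seabed, negative = land elevation.
    """
    if shore_distance_m >= 0:
        shore = f"{shore_distance_m} m offshore"
    else:
        shore = f"{-shore_distance_m} m inland"

    if bathymetry_m >= 0:
        depth = f"seabed {bathymetry_m} m below sea level"
    else:
        depth = f"land {-bathymetry_m} m above sea level"

    return shore, depth
```

For example, `describe_position(-2436, -3)` returns `("2436 m inland", "land 3 m above sea level")`. The shore distance matches the on-land example quoted earlier; the bathymetry value of -3 is made up.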

Q: What is the source of sst values? You have precise sst values from pre-satellite days!? Q: What is the source of sss values? You have precise sss values from pre-satellite days!? A(2x): Bio-Oracle, see https://github.com/iobis/xylookup (Data references)

Q: How can you have sst and sss values for rows where you don't have date_start, date_mid, or date_end data? A: Possibly by using the Darwin Core fields year/month/day or any of the other date fields

Q: Why does startDayOfYear sometimes have values like 2155? Q: Why does year sometimes have values like 19889? Q: Why does month sometimes have values like -10? Q: Why does day sometimes have values like -88 or 56? A(4x): Input error by provider
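Since these verbatim values come through unchanged, a consumer may want to screen them before use. The sketch below is one possible check, not something OBIS provides: the function name is hypothetical, and the accepted ranges (year from 1 to the current year, month 1 to 12, day 1 to 31, startDayOfYear 1 to 366) are my own assumptions about what counts as plausible.

```python
from datetime import date


def implausible_date_fields(year=None, month=None, day=None, start_day_of_year=None):
    """Return the names of date fields whose values fall outside plausible ranges.

    Fields passed as None (missing) are skipped rather than reported.
    """
    checks = {
        "year": (year, 1, date.today().year),
        "month": (month, 1, 12),
        "day": (day, 1, 31),
        "startDayOfYear": (start_day_of_year, 1, 366),
    }
    return [
        name
        for name, (value, low, high) in checks.items()
        if value is not None and not (low <= value <= high)
    ]
```

With the values quoted in the questions, `implausible_date_fields(year=19889, month=-10, day=56, start_day_of_year=2155)` reports all four fields.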

Q: Why is scientificNameID sometimes blank when scientificName and aphiaID are known? A: The field was not populated by the provider. OBIS matched the scientificName to WoRMS and added the AphiaID

Q: Why are there sometimes several values for scientificNameID (on different rows) when scientificName and aphiaID are constant? Stated another way, why do you sometimes point to out-of-date scientificNameID's when scientificName and aphiaID are up-to-date? A: Probably an incorrect scientificNameID and/or OBIS matched the scientificName to WoRMS and added the (correct) AphiaID
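One way to spot such rows is to compare the provider's scientificNameID with the AphiaID that OBIS added. The sketch below assumes the provider used a WoRMS LSID of the form `urn:lsid:marinespecies.org:taxname:<AphiaID>`; providers may use other identifier schemes, in which case this check simply reports a mismatch. The function name is hypothetical.

```python
WORMS_LSID_PREFIX = "urn:lsid:marinespecies.org:taxname:"


def scientific_name_id_matches(scientific_name_id, aphia_id):
    """Compare a provider-supplied scientificNameID with the OBIS-added AphiaID.

    Returns None when scientificNameID is blank (nothing to compare),
    True when it is the WoRMS LSID for aphia_id, and False otherwise.
    """
    if scientific_name_id is None or scientific_name_id.strip() == "":
        return None
    return scientific_name_id.strip() == f"{WORMS_LSID_PREFIX}{aphia_id}"
```

A `False` result does not say which identifier is wrong; per the answer above, the AphiaID added by OBIS is the one matched against WoRMS.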

Q: What does it mean when taxonomic status is unaccepted? Isn't the whole record then in question? A: Not at all. It just means it's a synonym name for a species. The Occurrence record is perfectly valid.

BobSimons commented 2 years ago

Thank you very much.

albenson-usgs commented 1 year ago

Related https://github.com/iobis/obis-issues/issues/148