iobis / gbif-marine

OBIS nodes holding non-marine and mixed datasets #3

Open wardappeltans opened 8 years ago

wardappeltans commented 8 years ago

Currently the iOBIS data harvester is configured in this way:

In order that non-marine datasets do not end up in iOBIS (but e.g. in GBIF), please flag datasets that you DO NOT want OBIS to harvest, by adding the text "Not marine, not harvested by iOBIS" into the EML: Additional Metadata > Additional Information metadata field for each IPT Dataset that's not for iOBIS.

For mixed datasets (marine and non-marine species occurrences), please don't do anything. The non-marine records will be filtered out by iOBIS (on the basis of WoRMS).

NicBailly commented 8 years ago

For the mixed datasets, you mean we serve everything to GBIF and iOBIS, and iOBIS will filter out freshwater (FW) records on the basis of WoRMS. So we do not need to filter on our side, correct?

wardappeltans commented 8 years ago

Hi Nicolas, that is correct. It is important though that the marine records have the WoRMS LSID in DwC term: scientificNameID.

OBISCanada commented 8 years ago

Will the procedure work if the LSID includes an ITIS TSN as opposed to the WoRMS AphiaID?

wardappeltans commented 8 years ago

SG-OBIS-III recommended the WoRMS LSID as the preferred one. In case of ITIS LSID (not TSN), iOBIS will need to map with WoRMS.
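To illustrate the distinction between a WoRMS LSID, an ITIS LSID, and a bare TSN, here is a minimal Python sketch of how a scientificNameID value could be classified. The LSID shapes in the comments and the function name `classify_scientific_name_id` are assumptions for illustration; they are not spelled out in this thread and this is not iOBIS's actual harvesting code.

```python
import re

# Assumed LSID shapes (not given in this thread):
#   WoRMS: urn:lsid:marinespecies.org:taxname:<AphiaID>
#   ITIS:  urn:lsid:itis.gov:itis_tsn:<TSN>
LSID_PATTERN = re.compile(r"^urn:lsid:([^:]+):([^:]+):(\d+)$", re.IGNORECASE)


def classify_scientific_name_id(value):
    """Return (source, numeric_id) for a DwC scientificNameID value.

    source is one of "worms", "itis", "other_lsid" or "not_lsid".
    """
    match = LSID_PATTERN.match((value or "").strip())
    if match is None:
        # A bare number such as a TSN is not an LSID at all.
        return ("not_lsid", None)
    authority, namespace, object_id = match.groups()
    authority = authority.lower()
    namespace = namespace.lower()
    if authority == "marinespecies.org" and namespace == "taxname":
        return ("worms", int(object_id))
    if authority == "itis.gov" and namespace == "itis_tsn":
        # Would still need mapping to WoRMS on the iOBIS side.
        return ("itis", int(object_id))
    return ("other_lsid", int(object_id))
```

Under these assumptions, records classified as "worms" can be matched directly, "itis" records need the extra mapping step, and anything else would need a name-based match.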

davewatts3 commented 8 years ago

Hi ward

Is there any particular reason for the double negative "Not marine, not harvested by iOBIS" versus "marine, harvested by iOBIS"? I can't think of a case off the top of my head that would contain "marine, not harvested by iOBIS" or "Not marine, harvested by iOBIS".

Just that I do prefer 'positive' versus 'negative' logic

Cheers Dave

wardappeltans commented 8 years ago

Dear Dave, we decided to go for "not marine, not (to be) harvested by iOBIS". If it were "marine, (to be) harvested by iOBIS", then all OBIS nodes would have to score the >1900 datasets accordingly. iOBIS can now assume that, if nothing is indicated, the dataset is a marine dataset and should be harvested by iOBIS.

NicBailly commented 8 years ago

Should we think about moving to the positive logic once all datasets have been reviewed from that point of view?

davewatts3 commented 8 years ago

I think we have 2 values that should be used - the explicit "marine, harvested by iOBIS" and then the implicit blank version (and replaced with the explicit value as and when reviewed by data providers).

Searching within EML metadata for 'iOBIS' will return non-harvested records if we use "Not marine, not harvested by iOBIS", and the end users will be confused by the statement.

Using the positive statement means we can also add additional harvesting nodes to the 'iOBIS' value (e.g. ALA, SCAR-marBIN) without breaking any logic. End users then have explicit knowledge where the data went and they may use those nodes instead of iOBIS if these additional nodes provide some different tools/functionality.

Cheers dave

NicBailly commented 8 years ago

I am convinced! Nicolas.

wardappeltans commented 8 years ago

ALRIGHT. so we can accept the positive one. EML: Additional Metadata > Additional Information metadata field > marine, harvested by iOBIS.
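As a rough illustration of the agreed positive logic, a harvester-side check might look like the Python sketch below. It assumes the EML document is available as a string and that the phrase sits somewhere under an `additionalInfo` element; the function names are hypothetical and this is not the actual iOBIS harvester.

```python
import xml.etree.ElementTree as ET

HARVEST_PHRASE = "marine, harvested by iOBIS"


def additional_info_text(eml_xml):
    """Collect the text found under any additionalInfo element."""
    root = ET.fromstring(eml_xml)
    chunks = []
    for element in root.iter():
        # Drop a namespace prefix such as "{...}additionalInfo" if present.
        local_name = element.tag.rsplit("}", 1)[-1]
        if local_name == "additionalInfo":
            chunks.append(" ".join(element.itertext()))
    # Collapse runs of whitespace so line breaks do not hide the phrase.
    return " ".join(" ".join(chunks).split())


def should_harvest(eml_xml):
    """True when the positive phrase appears (case-insensitively)."""
    text = additional_info_text(eml_xml).lower()
    # The earlier negative wording ("Not marine, not harvested by iOBIS")
    # does not contain this phrase as a substring, so it does not match.
    return HARVEST_PHRASE.lower() in text
```

A plain substring test like this also tolerates extra words around the phrase in the same field.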

MikeFlavell commented 8 years ago

Ward, are you going to let all of the existing OBIS Tier II Nodes know about this essential new addition to the metadata of all their existing IPT-based datasets? It's just that, if I implement the harvesting code to filter datasets based on this metadata element, I would like to think the Node Managers will make the changes quickly, so that there will still be something to harvest in late March/early April.

**- As you probably will see - I sent out a mail to all of the Nodes and SG, informing them of this, and they need to action the metadata changes by 25 March 2016.**

OBISCanada commented 8 years ago

is there a way to update all existing metadata on an IPT to add the required phrases for resources that currently pass OBIS harvesting tests? does an R script exist that can revise metadata for a given resource?

MikeFlavell commented 8 years ago

I have checked the online IPT documentation available at [https://github.com/gbif/ipt/wiki/IPT2ManualNotes.wiki]. As far as I can see, there is no mention of any way to perform mass updates of metadata for all datasets on an IPT, nor can I see any way to set a default value for a metadata field. So if you are manually creating and maintaining resources on your IPT, the only option seems to be to add the required phrase to the metadata manually for each resource.

I don't know of any R-script that will do what you ask, but maybe someone else does?
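For anyone who wants to experiment outside the IPT, here is a minimal Python sketch (not the R script asked about, and not an IPT feature) that adds the phrase to an EML document held as a string. It assumes the `dataset/additionalInfo/para` layout; the function name is hypothetical, and how an edited file would get back into an IPT is outside the scope of the sketch.

```python
import xml.etree.ElementTree as ET

HARVEST_PHRASE = "marine, harvested by iOBIS"


def add_harvest_phrase(eml_xml, phrase=HARVEST_PHRASE):
    """Return EML text whose dataset/additionalInfo mentions the phrase."""
    root = ET.fromstring(eml_xml)
    dataset = root.find("dataset")
    if dataset is None:
        raise ValueError("no <dataset> element found")

    info = dataset.find("additionalInfo")
    if info is not None:
        existing = " ".join(" ".join(info.itertext()).split())
        if phrase.lower() in existing.lower():
            return eml_xml  # already flagged, leave the input untouched
    else:
        # Caveat: this appends at the end of <dataset>; a schema-valid
        # EML file may need the element in a specific position.
        info = ET.SubElement(dataset, "additionalInfo")

    # Keep any existing paragraphs and add the phrase as a new one.
    para = ET.SubElement(info, "para")
    para.text = phrase
    # Caveat: ElementTree may rename namespace prefixes when serializing.
    return ET.tostring(root, encoding="unicode")
```

Run something like this only on copies of the metadata, and validate the result before relying on it.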

OBISCanada commented 8 years ago

to confirm that i have the correct method - for my OBIS resources i enter 'marine, harvested by iOBIS'.

recently i started collaborating with data providers who have datasets on GBIF IPTs. if their datasets are 'marine' then i will use the same phrase as above and i will provide OBIS with the location of the resource.

my main question is related to museum collections. Today i visited the Royal Ontario Museum and they have many collections that are based on taxon groups. They have fishes, and invertebrates and birds, etc. What phrase should i enter to indicate that these resources are to be harvested by OBIS but that contain a mixture of marine and freshwater and other environments?

OBISCanada commented 8 years ago

I should clarify that OBIS Canada will not transfer resources from the GBIF IPT to our IPT but rather point iOBIS to the location to harvest the GBIF content directly. It will be the GBIF data provider who will revise their metadata to include the required phrase.

davewatts3 commented 8 years ago

Hi OBIS Canada

I had the same problems many years ago with a dataset covering marine and non-marine. To submit to OBIS, I created two resources – one marine and one non-marine so OBIS could harvest what they wanted. This was before WoRMS was a going concern.

Hopefully matching to WoRMS will filter non-marine records.

Cheers Dave

OBISCanada commented 8 years ago

Asking the GBIF data provider to create two resources just won't happen, but they are willing to work with me and they will add an LSID to the scientificNameID field, so this will help iOBIS filter out the records. My question is: what should be entered in the metadata on a GBIF IPT to indicate 'mixed' and that they do wish to be harvested by iOBIS?

davewatts3 commented 8 years ago

Hi again

Just the 'marine, harvested by iOBIS' will suffice for OBIS.

GBIF will harvest anything, so it won't care about that phrase being there.

Adding the LSID is the best step in any case and as mentioned at the workshop, will help Bart et al. in the data processing tasks.

Cheers dave

OBISCanada commented 8 years ago

Thanks Dave for your comments. I just thought that having the phrase 'marine' might look silly on a museum collection that is probably 90% freshwater. We are trying to get regular users to actually read the metadata, and I wasn't sure what the reaction would be to this entry, as the metadata box doesn't imply that it is a harvesting instruction of interest only to iOBIS. I understand that this is the decision on how to treat marine datasets; I just want to confirm that the same phrase is to be used for mixed datasets.

davewatts3 commented 8 years ago

Hi Mary

Yes, you can put other words there to reflect the content; they will be dutifully ignored by machines.

The spot it goes into is right at the end of the metadata in additional info so it should not distract users.

Cheers dave