MathewBiddle closed this 10 months ago.
I believe the SanctSound data should be ready to go. I've adjusted the lat/lon pairs that were problematic and reran all my tests. Let me know when you're available to review.
This is looking good. I don't know how I missed this before, but eventID is missing from the occurrence file, so there is no way to link the event measurements (which is all of the measurements in the emof) to the corresponding event. You also don't need to repeat these in the emof for each occurrence. For example, for eventID "CI01_1_2018-11-01" there are three measurements (moored instrument depth, acoustic frequency, sound pressure level), but there are nine rows in the emof for this event because those measurements are repeated for the bocaccio, humpback whale, and plainfin midshipman occurrences.
So there should only be three rows in the emof and the occurrenceID should be blank for all three rows because none of these are measurements of the occurrence. Hopefully this makes sense. Happy to meet to walk through it.
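A minimal sketch of the collapse described above, assuming the emof rows are held as Python dicts with the Darwin Core keys eventID, occurrenceID, measurementType and measurementValue. The helper name and the sound pressure level placeholder are hypothetical; this is not the actual SanctSound script. Rows that repeat the same event measurement for several occurrences are reduced to one row with a blank occurrenceID.

```python
def collapse_event_measurements(emof_rows):
    """Keep one row per (eventID, measurementType, measurementValue) with a blank occurrenceID.

    Assumes every row is an event-level measurement that was written out once per occurrence.
    """
    seen = set()
    collapsed = []
    for row in emof_rows:
        key = (row["eventID"], row["measurementType"], row["measurementValue"])
        if key in seen:
            continue  # already kept this event measurement via another occurrence
        seen.add(key)
        collapsed.append({**row, "occurrenceID": ""})  # not a measurement of the occurrence
    return collapsed


# Hypothetical input: 3 occurrences x 3 measurements = 9 rows for one event.
occurrence_ids = [
    "siteCI01_station01_bocaccio_2018-11-01",
    "siteCI01_station01_humpback_whale_2018-11-01",
    "siteCI01_station01_plainfin_midshipman_2018-11-01",
]
measurements = [
    ("moored instrument depth", 17.5),
    ("acoustic frequency", 300),
    ("sound pressure level", None),  # placeholder; the real value is not shown in this thread
]
emof = [
    {"eventID": "CI01_1_2018-11-01", "occurrenceID": occ,
     "measurementType": mtype, "measurementValue": value}
    for occ in occurrence_ids
    for mtype, value in measurements
]
collapsed = collapse_event_measurements(emof)
print(len(emof), len(collapsed))  # 9 3
```

This only holds if the values really are event-level: rows whose values differ between occurrences of the same event are kept as separate rows, but they also lose their occurrenceID.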
I was also thinking we should add samplingProtocol = "passive acoustic monitoring", unless Carrie has a sampling protocol we can reference. That way people at least know what kind of sampling this is if they are pulling all records for humpback whales, for instance. This is not required, so no need to do it if it's a pain. I was going to skip it, but since the emof needs required fixes I figured it would be worth adding this as well.
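A sketch of filling the field, assuming occurrence rows held as dicts; the helper name is hypothetical. It only sets samplingProtocol where the value is still empty, so a citable protocol from Carrie could be dropped in later without being overwritten.

```python
DEFAULT_SAMPLING_PROTOCOL = "passive acoustic monitoring"  # placeholder until a citable protocol exists


def add_sampling_protocol(occurrence_rows, protocol=DEFAULT_SAMPLING_PROTOCOL):
    """Return copies of the rows with samplingProtocol set, keeping any non-empty existing value."""
    updated = []
    for row in occurrence_rows:
        new_row = dict(row)  # copy so the input rows are not mutated
        if not new_row.get("samplingProtocol"):
            new_row["samplingProtocol"] = protocol
        updated.append(new_row)
    return updated


rows = [
    {"occurrenceID": "siteCI01_station01_bocaccio_2018-11-01"},
    {"occurrenceID": "siteCI01_station01_humpback_whale_2018-11-01", "samplingProtocol": ""},
]
result = add_sampling_protocol(rows)
print([r["samplingProtocol"] for r in result])
```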
Thanks for the quick check. I think I was saving the wrong dataframe. Let me take a look.
I was iterating through occurrenceID, not eventID. I'll need to adjust my script to account for this.
Okay, the emof data will vary by occurrence, not event. The example you provided just happens to have the same values for all three occurrences:
| | eventDate | station | vernacularName | scientificNameID | scientificName | taxonRank | kingdom | decimalLatitude | decimalLongitude | freq_Hz | site | occurrenceID | climatology | depth | coordinateUncertaintyInMeters | fname | hydrophone_depth_m | _merge | countryCode | geodeticDatum | occurrenceStatus | basisOfRecord | eventID |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2018-11-01 00:00:00 | 1 | bocaccio | urn:lsid:marinespecies.org:taxname:274833 | Sebastes paucispinis | Species | Animalia | 34.0438 | -120.081 | 300 | CI01 | siteCI01_station01_bocaccio_2018-11-01 | 10 | 20 | 8908 | SanctSound_CI01_propmodeling_SD0020m_SL170dB_FQ00300Hz_Oct_radarformat_highres.nc | 17.5 | both | US | WGS84 | present | MachineObservation | CI01_1_2018-11-01 |
| 1 | 2018-11-01 00:00:00 | 1 | humpback whale | urn:lsid:marinespecies.org:taxname:137092 | Megaptera novaeangliae | Species | Animalia | 34.0438 | -120.081 | 300 | CI01 | siteCI01_station01_humpback_whale_2018-11-01 | 10 | 20 | 8908 | SanctSound_CI01_propmodeling_SD0020m_SL170dB_FQ00300Hz_Oct_radarformat_highres.nc | 17.5 | both | US | WGS84 | present | MachineObservation | CI01_1_2018-11-01 |
| 2 | 2018-11-01 00:00:00 | 1 | plainfin midshipman | urn:lsid:marinespecies.org:taxname:275658 | Porichthys notatus | Species | Animalia | 34.0438 | -120.081 | 300 | CI01 | siteCI01_station01_plainfin_midshipman_2018-11-01 | 10 | 20 | 8908 | SanctSound_CI01_propmodeling_SD0020m_SL170dB_FQ00300Hz_Oct_radarformat_highres.nc | 17.5 | both | US | WGS84 | present | MachineObservation | CI01_1_2018-11-01 |
However, other events will have multiple values per event:

| | eventDate | station | vernacularName | scientificNameID | scientificName | taxonRank | kingdom | decimalLatitude | decimalLongitude | freq_Hz | site | occurrenceID | climatology | depth | coordinateUncertaintyInMeters | fname | hydrophone_depth_m | _merge | countryCode | geodeticDatum | occurrenceStatus | basisOfRecord | eventID |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 3 | 2018-11-01 00:00:00 | 1 | blue whale | urn:lsid:marinespecies.org:taxname:137090 | Balaenoptera musculus | Species | Animalia | 34.0856 | -120.523 | 63 | CI02 | siteCI02_station01_blue_whale_2018-11-01 | 10 | 20 | 128344 | SanctSound_CI02_propmodeling_SD0020m_SL192dB_FQ00063Hz_Oct_radarformat_highres.nc | 73.5 | both | US | WGS84 | present | MachineObservation | CI02_1_2018-11-01 |
| 4 | 2018-11-01 00:00:00 | 1 | bocaccio | urn:lsid:marinespecies.org:taxname:274833 | Sebastes paucispinis | Species | Animalia | 34.0856 | -120.523 | 300 | CI02 | siteCI02_station01_bocaccio_2018-11-01 | 10 | 20 | 15742 | SanctSound_CI02_propmodeling_SD0020m_SL170dB_FQ00300Hz_Oct_radarformat_highres.nc | 73.5 | both | US | WGS84 | present | MachineObservation | CI02_1_2018-11-01 |
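One way to check which columns are event-level and which vary by occurrence is to count distinct values per eventID. This is a sketch over a few columns copied from the rows above; the function name is hypothetical and rows are assumed to be dicts.

```python
from collections import defaultdict


def events_where_column_varies(rows, column, event_key="eventID"):
    """Return the eventIDs whose rows hold more than one distinct value in `column`."""
    values_by_event = defaultdict(set)
    for row in rows:
        values_by_event[row[event_key]].add(row[column])
    return {event for event, values in values_by_event.items() if len(values) > 1}


# Columns copied from rows 0-4 in the tables above.
rows = [
    {"eventID": "CI01_1_2018-11-01", "freq_Hz": 300, "hydrophone_depth_m": 17.5, "coordinateUncertaintyInMeters": 8908},
    {"eventID": "CI01_1_2018-11-01", "freq_Hz": 300, "hydrophone_depth_m": 17.5, "coordinateUncertaintyInMeters": 8908},
    {"eventID": "CI01_1_2018-11-01", "freq_Hz": 300, "hydrophone_depth_m": 17.5, "coordinateUncertaintyInMeters": 8908},
    {"eventID": "CI02_1_2018-11-01", "freq_Hz": 63, "hydrophone_depth_m": 73.5, "coordinateUncertaintyInMeters": 128344},
    {"eventID": "CI02_1_2018-11-01", "freq_Hz": 300, "hydrophone_depth_m": 73.5, "coordinateUncertaintyInMeters": 15742},
]
for column in ("freq_Hz", "hydrophone_depth_m", "coordinateUncertaintyInMeters"):
    print(column, sorted(events_where_column_varies(rows, column)))
# freq_Hz ['CI02_1_2018-11-01']
# hydrophone_depth_m []
# coordinateUncertaintyInMeters ['CI02_1_2018-11-01']
```

freq_Hz and coordinateUncertaintyInMeters vary within CI02_1_2018-11-01 while hydrophone_depth_m does not, which is consistent with tying at least some measurements to occurrences rather than events.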
I'm getting a little lost in this reorganization into the emof, and I can't figure out why it tripled up on your example. My hypothesis is that I wrote out each emof measurement (e.g., moored instrument depth) for each occurrence. I think that is still valid, given that the emof values will change based on species.
OK, that wasn't clear to me before, but it is now, and it makes sense (it's also valid that there will occasionally be repeats). So these need to be linked to the occurrences, not the events. We still need to add eventID to the emof file because this will be an event core dataset (meaning eventID is the key between the tables). But there's no need to make any other changes.
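A sketch of the agreed structure, assuming the working table is a list of dicts: each emof row carries both occurrenceID (the occurrence it describes) and eventID (the key to the event core). The column-to-measurementType mapping and the helper name are hypothetical and only cover the two columns whose meaning is clear from this thread.

```python
# Hypothetical mapping from working-table columns to emof measurementType labels.
MEASUREMENT_COLUMNS = {
    "hydrophone_depth_m": "moored instrument depth",
    "freq_Hz": "acoustic frequency",
}


def build_emof(occurrence_rows, columns=MEASUREMENT_COLUMNS):
    """Write one emof row per occurrence and measurement, carrying both keys."""
    emof = []
    for row in occurrence_rows:
        for column, measurement_type in columns.items():
            emof.append({
                "eventID": row["eventID"],            # key to the event core
                "occurrenceID": row["occurrenceID"],  # the occurrence this value belongs to
                "measurementType": measurement_type,
                "measurementValue": row[column],
            })
    return emof


occurrences = [
    {"eventID": "CI02_1_2018-11-01", "occurrenceID": "siteCI02_station01_blue_whale_2018-11-01",
     "freq_Hz": 63, "hydrophone_depth_m": 73.5},
    {"eventID": "CI02_1_2018-11-01", "occurrenceID": "siteCI02_station01_bocaccio_2018-11-01",
     "freq_Hz": 300, "hydrophone_depth_m": 73.5},
]
emof = build_emof(occurrences)
for r in emof:
    print(r["occurrenceID"], r["measurementType"], r["measurementValue"])
```

The repeated moored instrument depth (73.5) is expected here: it is the occasional repeat mentioned above, kept because each row belongs to a different occurrence.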
Sounds good.
Just to be clear, we will keep the occurrenceID in both the occurrence and emof tables?
Yes!
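Since both IDs now act as keys, a quick referential-integrity check can catch mismatches before submission. A sketch, assuming the eventIDs from the event table are available as a set and rows are dicts; the helper name and the deliberately broken IDs in the example are hypothetical.

```python
def find_orphans(occurrence_rows, emof_rows, event_ids):
    """Return (occurrenceIDs, eventIDs) used in the emof that have no matching record.

    event_ids: the set of eventIDs in the event table of the Event core dataset.
    A blank occurrenceID is allowed and not reported.
    """
    occurrence_ids = {row["occurrenceID"] for row in occurrence_rows}
    orphan_occurrences = {
        row["occurrenceID"] for row in emof_rows
        if row["occurrenceID"] and row["occurrenceID"] not in occurrence_ids
    }
    orphan_events = {row["eventID"] for row in emof_rows if row["eventID"] not in event_ids}
    return orphan_occurrences, orphan_events


occurrences = [{"occurrenceID": "siteCI01_station01_bocaccio_2018-11-01", "eventID": "CI01_1_2018-11-01"}]
emof = [
    {"occurrenceID": "siteCI01_station01_bocaccio_2018-11-01", "eventID": "CI01_1_2018-11-01"},
    {"occurrenceID": "siteCI01_station01_bocacio_2018-11-01", "eventID": "CI01_1_2018-11-02"},  # deliberate typos
]
print(find_orphans(occurrences, emof, {"CI01_1_2018-11-01"}))
# ({'siteCI01_station01_bocacio_2018-11-01'}, {'CI01_1_2018-11-02'})
```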
Almost missed the piece about depth. The depth column "is approximately the depth of the observed species". Should that go into the occurrence record as minimumDepthInMeters and maximumDepthInMeters?
Also, we can definitely add measurementMethod to each of the emof records from the resources at https://sanctsound.ioos.us/q_where-listen.html. I'll look for DOI references there.
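Until a DOI turns up, a sketch of filling measurementMethod per measurementType, using the SanctSound page above as a placeholder reference. The lookup table and helper name are hypothetical, and unmatched types are reported rather than silently left blank.

```python
# Placeholder reference until a DOI is found; the URL is the page mentioned above.
SANCTSOUND_METHODS_URL = "https://sanctsound.ioos.us/q_where-listen.html"

# Hypothetical lookup from measurementType to a method reference.
METHOD_BY_TYPE = {
    "moored instrument depth": SANCTSOUND_METHODS_URL,
    "acoustic frequency": SANCTSOUND_METHODS_URL,
    "sound pressure level": SANCTSOUND_METHODS_URL,
}


def add_measurement_method(emof_rows, lookup=METHOD_BY_TYPE):
    """Fill measurementMethod from the lookup and return the set of types with no entry."""
    missing = set()
    updated = []
    for row in emof_rows:
        new_row = dict(row)
        method = lookup.get(new_row["measurementType"])
        if method is None:
            missing.add(new_row["measurementType"])  # flag for follow-up
        else:
            new_row["measurementMethod"] = method
        updated.append(new_row)
    return updated, missing


emof_rows = [
    {"measurementType": "acoustic frequency", "measurementValue": 300},
    {"measurementType": "some other measurement", "measurementValue": 1},  # hypothetical unmapped type
]
updated, missing = add_measurement_method(emof_rows)
print(missing)
```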
I agree with the approach on samplingProtocol. Again, I'll look around the SanctSound website and see if there is a DOI or something more permanent to reference.
Yes, it would be great to add depth to the occurrence file using minimumDepthInMeters and maximumDepthInMeters! Thanks for making the connections to the sampling methods 🙌
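Because depth is a single approximate value, one option (an assumption, not something settled in this thread) is to write it to both minimumDepthInMeters and maximumDepthInMeters. A sketch with a hypothetical helper over dict rows:

```python
def add_depth_range(occurrence_rows, depth_column="depth"):
    """Copy a single approximate depth into both Darwin Core depth bounds."""
    updated = []
    for row in occurrence_rows:
        new_row = dict(row)
        depth = new_row.get(depth_column)
        if depth is not None:  # leave rows without a depth untouched
            new_row["minimumDepthInMeters"] = depth
            new_row["maximumDepthInMeters"] = depth
        updated.append(new_row)
    return updated


rows = [{"occurrenceID": "siteCI01_station01_bocaccio_2018-11-01", "depth": 20}]
out = add_depth_range(rows)
print(out[0])
```

If a depth uncertainty were available, the two bounds could instead be depth minus and plus that uncertainty.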
Updated:
emof - https://github.com/MathewBiddle/bio_data_guide/blob/sanctsound/datasets/SanctSound/data/emof.zip
occurrence - https://github.com/MathewBiddle/bio_data_guide/blob/sanctsound/datasets/SanctSound/data/occurrence_w_coordinateUncertainty.zip
Hopefully these address the above comments.
Everything looks great! All I need now is the metadata. Should I reach out to Carrie for that?
Yeah, Carrie would be the one for the metadata. We could point her to https://ioos.github.io/mbon-docs/metadata-eml.html
@albenson-usgs, the new occurrence file is at https://github.com/MathewBiddle/bio_data_guide/blob/sanctsound/datasets/SanctSound/data/occurrence_w_coordinateUncertainty.zip
I added a new notebook which walks through how the changes were made. See https://github.com/MathewBiddle/bio_data_guide/blob/sanctsound/datasets/SanctSound/04_data_fixes.ipynb
closes #147
I will leave this as a draft until we work through the issues in #147.