ioos / bio_data_guide

Standardizing Marine Biological Data Working Group - An open community to facilitate the mobilization of biological data to OBIS.
https://ioos.github.io/bio_data_guide/
MIT License

working on SanctSound processing #184

Closed MathewBiddle closed 10 months ago

MathewBiddle commented 1 year ago

I will leave this as a draft until we work through the issues in #147.

MathewBiddle commented 11 months ago

I believe the SanctSound data should be ready to go. I've adjusted the lat/lon pairs that were problematic and reran all my tests. Let me know when you're available to review.

albenson-usgs commented 11 months ago

This is looking good. I don't know how I missed this before, but eventID is missing from the occurrence file, so there is no way to link the event measurements (which is all of the measurements in the emof) to the corresponding event.

You also don't need to repeat these in the emof for each occurrence. For example, for eventID "CI01_1_2018-11-01" there are three measurements (moored instrument depth, acoustic frequency, sound pressure level), but there are nine rows in the emof for this event because those measurements are repeated for the bocaccio, humpback whale, and plainfin midshipman occurrences. So there should only be three rows in the emof, and the occurrenceID should be blank for all three rows because none of these are measurements of the occurrence. Hopefully this makes sense. Happy to meet to walk through it.
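As a rough illustration of the row counts described above, here is a minimal sketch in plain Python. The records are simplified stand-ins built from the identifiers mentioned in this thread, and the variable names (`per_occurrence`, `per_event`) are hypothetical; this is not the processing script used for the dataset.

```python
# Simplified stand-in records for one event; only the IDs come from the thread.
occurrences = [
    {"eventID": "CI01_1_2018-11-01",
     "occurrenceID": "siteCI01_station01_bocaccio_2018-11-01"},
    {"eventID": "CI01_1_2018-11-01",
     "occurrenceID": "siteCI01_station01_humpback_whale_2018-11-01"},
    {"eventID": "CI01_1_2018-11-01",
     "occurrenceID": "siteCI01_station01_plainfin_midshipman_2018-11-01"},
]
measurement_types = [
    "moored instrument depth",
    "acoustic frequency",
    "sound pressure level",
]

# Iterating over occurrences: every measurement is written once per species.
per_occurrence = [
    {"eventID": o["eventID"], "occurrenceID": o["occurrenceID"], "measurementType": m}
    for o in occurrences
    for m in measurement_types
]

# Iterating over events: every measurement is written once, occurrenceID blank.
event_ids = sorted({o["eventID"] for o in occurrences})
per_event = [
    {"eventID": e, "occurrenceID": "", "measurementType": m}
    for e in event_ids
    for m in measurement_types
]

print(len(per_occurrence), len(per_event))
```

Three occurrences times three measurement types gives the nine rows seen in the emof, while looping over the single event gives three.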

I was also thinking we should add samplingProtocol = "passive acoustic monitoring", unless Carrie has a sampling protocol we can reference. That way people at least know what kind of sampling this is if, for instance, they are pulling all records for humpback whales. This is not required, so no need to do it if it's a pain. I was going to skip it, but since the emof needs required fixes I figured it would be worth adding this as well.

MathewBiddle commented 11 months ago

Thanks for the quick check. I think I was saving the wrong dataframe. Let me take a look.

> So there should only be three rows in the emof and the occurrenceID should be blank for all three rows because none of these are measurements of the occurrence. Hopefully this makes sense. Happy to meet to walk through it.

I was iterating through occurrenceID, not eventID. I'll need to adjust my script to account for this.

MathewBiddle commented 11 months ago

Okay, the emof data will vary by occurrence, not event. The example you provided just happens to have the same values for all three occurrences:

| | eventDate | station | vernacularName | scientificNameID | scientificName | taxonRank | kingdom | decimalLatitude | decimalLongitude | freq_Hz | site | occurrenceID | climatology | depth | coordinateUncertaintyInMeters | fname | hydrophone_depth_m | _merge | countryCode | geodeticDatum | occurrenceStatus | basisOfRecord | eventID |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 2018-11-01 00:00:00 | 1 | bocaccio | urn:lsid:marinespecies.org:taxname:274833 | Sebastes paucispinis | Species | Animalia | 34.0438 | -120.081 | 300 | CI01 | siteCI01_station01_bocaccio_2018-11-01 | 10 | 20 | 8908 | SanctSound_CI01_propmodeling_SD0020m_SL170dB_FQ00300Hz_Oct_radarformat_highres.nc | 17.5 | both | US | WGS84 | present | MachineObservation | CI01_1_2018-11-01 |
| 1 | 2018-11-01 00:00:00 | 1 | humpback whale | urn:lsid:marinespecies.org:taxname:137092 | Megaptera novaeangliae | Species | Animalia | 34.0438 | -120.081 | 300 | CI01 | siteCI01_station01_humpback_whale_2018-11-01 | 10 | 20 | 8908 | SanctSound_CI01_propmodeling_SD0020m_SL170dB_FQ00300Hz_Oct_radarformat_highres.nc | 17.5 | both | US | WGS84 | present | MachineObservation | CI01_1_2018-11-01 |
| 2 | 2018-11-01 00:00:00 | 1 | plainfin midshipman | urn:lsid:marinespecies.org:taxname:275658 | Porichthys notatus | Species | Animalia | 34.0438 | -120.081 | 300 | CI01 | siteCI01_station01_plainfin_midshipman_2018-11-01 | 10 | 20 | 8908 | SanctSound_CI01_propmodeling_SD0020m_SL170dB_FQ00300Hz_Oct_radarformat_highres.nc | 17.5 | both | US | WGS84 | present | MachineObservation | CI01_1_2018-11-01 |

However, other events will have multiple values per event:

| | eventDate | station | vernacularName | scientificNameID | scientificName | taxonRank | kingdom | decimalLatitude | decimalLongitude | freq_Hz | site | occurrenceID | climatology | depth | coordinateUncertaintyInMeters | fname | hydrophone_depth_m | _merge | countryCode | geodeticDatum | occurrenceStatus | basisOfRecord | eventID |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 3 | 2018-11-01 00:00:00 | 1 | blue whale | urn:lsid:marinespecies.org:taxname:137090 | Balaenoptera musculus | Species | Animalia | 34.0856 | -120.523 | 63 | CI02 | siteCI02_station01_blue_whale_2018-11-01 | 10 | 20 | 128344 | SanctSound_CI02_propmodeling_SD0020m_SL192dB_FQ00063Hz_Oct_radarformat_highres.nc | 73.5 | both | US | WGS84 | present | MachineObservation | CI02_1_2018-11-01 |
| 4 | 2018-11-01 00:00:00 | 1 | bocaccio | urn:lsid:marinespecies.org:taxname:274833 | Sebastes paucispinis | Species | Animalia | 34.0856 | -120.523 | 300 | CI02 | siteCI02_station01_bocaccio_2018-11-01 | 10 | 20 | 15742 | SanctSound_CI02_propmodeling_SD0020m_SL170dB_FQ00300Hz_Oct_radarformat_highres.nc | 73.5 | both | US | WGS84 | present | MachineObservation | CI02_1_2018-11-01 |

I'm getting a little lost in this reorganization to emof, and I can't figure out why the rows tripled up in your example. My hypothesis is that I wrote out each emof measurement (e.g., moored instrument depth) for each occurrence. I think that is still valid, given that the emof values will change based on species.
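One way to check that hypothesis is to group the rows by eventID and see whether a measurement column takes more than one value within an event. The sketch below uses only the five rows pasted above, reduced to eventID, a species label, and freq_Hz; the names `rows`, `freqs_by_event`, and `varies` are illustrative, not taken from the notebook.

```python
from collections import defaultdict

# (eventID, species label, freq_Hz) copied from the five rows shown above.
rows = [
    ("CI01_1_2018-11-01", "bocaccio", 300),
    ("CI01_1_2018-11-01", "humpback whale", 300),
    ("CI01_1_2018-11-01", "plainfin midshipman", 300),
    ("CI02_1_2018-11-01", "blue whale", 63),
    ("CI02_1_2018-11-01", "bocaccio", 300),
]

# Collect the distinct frequencies observed within each event.
freqs_by_event = defaultdict(set)
for event_id, _species, freq_hz in rows:
    freqs_by_event[event_id].add(freq_hz)

# An event "varies" when its occurrences do not all share one frequency.
varies = {event_id: len(freqs) > 1 for event_id, freqs in freqs_by_event.items()}
print(varies)
```

For these rows, CI01 has a single frequency shared by all three species, while CI02 has two, which is the case where the measurement has to stay attached to the occurrence.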

albenson-usgs commented 11 months ago

> My hypothesis is that I wrote out each emof (eg. moored instrument depth) for each occurrence. I think that is still valid given the emof values will change based on species.

OK, that wasn't clear to me before but is now, so that makes sense (and it is valid that there will occasionally be repeats). So these need to be linked to the occurrences and not the events. We still need to add eventID to the emof file because this will be an event core dataset (meaning eventID is the key between the tables). But no need to make any other changes.
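A minimal sketch of what that emof layout could look like: one row per occurrence and measurement, carrying both eventID and occurrenceID. It assumes that the `hydrophone_depth_m` and `freq_Hz` columns map to "moored instrument depth" and "acoustic frequency" with units inferred from the column names; sound pressure level is left out because its source column is not shown in this thread. The function name `to_emof` and the `MEASUREMENTS` mapping are hypothetical.

```python
# Assumed mapping: source column -> (measurementType, measurementUnit).
MEASUREMENTS = {
    "hydrophone_depth_m": ("moored instrument depth", "m"),
    "freq_Hz": ("acoustic frequency", "Hz"),
}


def to_emof(occurrence_rows):
    """Return one emof row per occurrence and measurement, keyed by eventID."""
    emof = []
    for row in occurrence_rows:
        for column, (mtype, unit) in MEASUREMENTS.items():
            emof.append({
                "eventID": row["eventID"],            # key between the tables
                "occurrenceID": row["occurrenceID"],  # kept: values vary by species
                "measurementType": mtype,
                "measurementValue": row[column],
                "measurementUnit": unit,
            })
    return emof


# Two occurrences from the CI02 event shown earlier in the thread.
occurrence_rows = [
    {"eventID": "CI02_1_2018-11-01",
     "occurrenceID": "siteCI02_station01_blue_whale_2018-11-01",
     "freq_Hz": 63, "hydrophone_depth_m": 73.5},
    {"eventID": "CI02_1_2018-11-01",
     "occurrenceID": "siteCI02_station01_bocaccio_2018-11-01",
     "freq_Hz": 300, "hydrophone_depth_m": 73.5},
]

emof = to_emof(occurrence_rows)
for record in emof:
    print(record)
```

With pandas, the same wide-to-long reshaping is usually done with `DataFrame.melt`, using eventID and occurrenceID as the identifier columns.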

MathewBiddle commented 11 months ago

Sounds good.

Just to be clear, we will keep the occurrenceID in both occurrence and emof tables?

albenson-usgs commented 11 months ago

Yes!

MathewBiddle commented 11 months ago

Almost missed the piece about depth. depth "is approximately the depth of the observed species". Should that go into the occurrence record as minimumDepthInMeters and maximumDepthInMeters?

Also, we can definitely add measurementMethod to each of the emof records from the resources at https://sanctsound.ioos.us/q_where-listen.html. I'll look for DOI references there.

I agree with the approach on samplingProtocol. Again, I'll look around the SanctSound website and see if there is a DOI or something more permanent to reference.

albenson-usgs commented 11 months ago

Yes, that would be great to add depth into the occurrence file using minimumDepthInMeters and maximumDepthInMeters! Thanks for making the connections to the sampling methods 🙌
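A small sketch of the two additions discussed here, in plain Python. It assumes that the single approximate `depth` value is copied into both minimumDepthInMeters and maximumDepthInMeters, and that samplingProtocol is written on the occurrence record with the wording proposed earlier in the thread; both choices are assumptions for illustration, and the helper name `add_depth_and_protocol` is hypothetical.

```python
# Wording proposed earlier in the thread; replace with a citable protocol if one exists.
SAMPLING_PROTOCOL = "passive acoustic monitoring"


def add_depth_and_protocol(row):
    """Return a copy of an occurrence row with depth and protocol terms added."""
    out = dict(row)
    # Assumption: one approximate depth, so minimum and maximum are the same.
    out["minimumDepthInMeters"] = row["depth"]
    out["maximumDepthInMeters"] = row["depth"]
    out["samplingProtocol"] = SAMPLING_PROTOCOL
    return out


# depth = 20 is the value shown for the bocaccio row at CI01.
occurrence = {
    "occurrenceID": "siteCI01_station01_bocaccio_2018-11-01",
    "eventID": "CI01_1_2018-11-01",
    "depth": 20,
}

updated = add_depth_and_protocol(occurrence)
print(updated)
```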

MathewBiddle commented 11 months ago

Updated:

emof - https://github.com/MathewBiddle/bio_data_guide/blob/sanctsound/datasets/SanctSound/data/emof.zip

occurrence - https://github.com/MathewBiddle/bio_data_guide/blob/sanctsound/datasets/SanctSound/data/occurrence_w_coordinateUncertainty.zip

Hopefully these address the above comments.

albenson-usgs commented 11 months ago

Everything looks great! All I need now is the metadata. Should I reach out to Carrie for that?

MathewBiddle commented 11 months ago

Yeah, Carrie would be the one for the metadata. We could point her to https://ioos.github.io/mbon-docs/metadata-eml.html

MathewBiddle commented 10 months ago

@albenson-usgs new occurrence file is at https://github.com/MathewBiddle/bio_data_guide/blob/sanctsound/datasets/SanctSound/data/occurrence_w_coordinateUncertainty.zip

I added a new notebook which walks through how the changes were made. See https://github.com/MathewBiddle/bio_data_guide/blob/sanctsound/datasets/SanctSound/04_data_fixes.ipynb

MathewBiddle commented 10 months ago

closes #147