ocean-tracking-network / surimi

R package for translating acoustic telemetry data from one institutional format to another.
GNU General Public License v3.0

Assessing Caitlin's feedback #9

Open jackVanish opened 8 months ago

jackVanish commented 8 months ago

Caitlin helpfully checked a set of surimi output: an OTN detection extract converted to IMOS tripartite format, and then that same IMOS output converted back to OTN tripartite format. She helped identify some of the places where the process loses data, so I've assembled her feedback into this checklist to keep track of the changes I need to make or have already made. Her messages are first, followed by my notes in parentheses. A note: anything that can go from a detExtract to a tripartite file will ALSO have to be accounted for if someone HAS the tripartite OTN file, i.e., anything I fix in the 'derive' functions must be accounted for in the main function as well.

OTN DetExtract -> IMOS detections

OTN DetExtract -> IMOS receivers

OTN DetExtract -> IMOS tags

IMOS detections -> OTN detections

IMOS receivers -> OTN receivers

IMOS tags -> OTN tags

jackVanish commented 8 months ago

Regarding the WoRMS aphiaID: I wrote a quick helper function to get the aphiaID from the sciname before Jon correctly pointed out that the worrms library does exactly that, so that's what I'm using now. I still have to write a couple of little helper functions around it, because plugging it straight into the mutate functions means running one query per row, which bloats the code's runtime. So I'm building a little lookup table from the unique scinames and their aphiaIDs; that way each scientific name is queried only once, and the columns in Surimi are built from the lookup table.
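A minimal sketch of the lookup-table idea described above, written in Python for illustration rather than in surimi's R. The `resolve` callable stands in for whatever worrms call surimi actually uses, and the column names `scientificname` and `aphiaid` are assumptions, not names confirmed in this thread.

```python
from typing import Callable, Dict, Iterable, List, Optional


def build_aphia_lookup(
    scinames: Iterable[Optional[str]],
    resolve: Callable[[str], Optional[int]],
) -> Dict[str, Optional[int]]:
    """Call the (hypothetical) resolver once per distinct scientific name."""
    lookup: Dict[str, Optional[int]] = {}
    for name in scinames:
        # Skip missing names and names that were already resolved.
        if name and name not in lookup:
            lookup[name] = resolve(name)
    return lookup


def add_aphia_column(
    rows: List[dict],
    resolve: Callable[[str], Optional[int]],
) -> List[dict]:
    """Return copies of the rows with an 'aphiaid' column filled from the lookup."""
    lookup = build_aphia_lookup((r.get("scientificname") for r in rows), resolve)
    # Rows with no scientific name get None rather than triggering a query.
    return [{**r, "aphiaid": lookup.get(r.get("scientificname"))} for r in rows]
```

With this shape, the number of remote queries depends on the number of distinct species in the file, not on the number of detection rows.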

jackVanish commented 8 months ago

Added the code to get and attach the aphiaID; I'm going to write the comments and packagify it before I merge it.

jackVanish commented 7 months ago

Added code to handle assigning receiver_project_name and tagging_project_name. The code runs and appears to behave correctly; we will verify that when we run the second set of tests after these issues are resolved.

jackVanish commented 7 months ago

Took the releases out of the detections dataframe before deriving the receivers from it.
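A minimal sketch of that step, in Python for illustration rather than surimi's R. It assumes release records are rows whose `receiver` value is the literal string "release"; both the column name and the marker are assumptions to check against the actual extract, and the function names are hypothetical.

```python
from typing import List


def drop_release_rows(
    detections: List[dict],
    receiver_col: str = "receiver",
    release_marker: str = "release",
) -> List[dict]:
    """Return only the rows that are real detections, not animal releases."""
    marker = release_marker.lower()
    return [
        row
        for row in detections
        if str(row.get(receiver_col, "")).strip().lower() != marker
    ]


def derive_receiver_names(
    detections: List[dict],
    receiver_col: str = "receiver",
) -> List[str]:
    """Derive the distinct receivers after release rows have been removed."""
    names = {
        row[receiver_col]
        for row in drop_release_rows(detections, receiver_col)
        if row.get(receiver_col)
    }
    return sorted(names)
```

Filtering first means a release never shows up as a spurious receiver in the derived receiver metadata.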

jackVanish commented 7 months ago

Just checked, and it seems we are already using the release lat, lon, and datetime to fill in the appropriate columns, so we just need to handle locality.

jackVanish commented 7 months ago

While closing Remora tickets I found this: IMOS.metadata.mapping (2).xlsx

Looks like we decided on the mapping some time ago: receiver_name and receiver_project_name can be mapped to receiver and otn_array respectively.

jackVanish commented 7 months ago

[Screenshot: Screen Shot 2024-03-13 at 9 21 53 AM]

While following up I found this, with the attached comment, which is why receiver was blank. @CaitlinBate can you weigh in on this one?

naomitress commented 6 months ago

@jackVanish do you have examples of each of the receiver_id and receiver_name values? Once these are provided, I should be able to provide some guidance.

jackVanish commented 6 months ago

In checking out the IMOS -> OTN pipeline I realized it's referring to the OTN->IMOS receiver/tag derivation functions. This needs to be fixed!

jackVanish commented 6 months ago

@naomitress Here are the IMOS data test files included in Remora, so these are what I would've been using to build towards Surimi's OTN -> IMOS pipeline.

IMOS_animal_measurements.csv IMOS_detections.csv IMOS_receiver_deployment_metadata.csv IMOS_transmitter_deployment_metadata.csv

naomitress commented 6 months ago

IMOS_receiver_deployment_metadata.csv has receiver_name, e.g. VR2W-109075

IMOS_detections.csv has receiver_name, e.g. VR2W-113955, but it also has receiver_id, e.g. 100577385

So receiver_id should be ignored in favour of receiver_name, as this is analogous to receiver

@jackVanish were receiver_id or receiver_name found in any other file types?
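A minimal sketch of the mapping discussed so far (receiver_name to receiver, receiver_project_name to otn_array, receiver_id ignored), in Python for illustration rather than surimi's R. The function and constant names are hypothetical, and passing other columns through unchanged is an assumption, not surimi's confirmed behaviour.

```python
from typing import Dict

# IMOS column -> OTN column, per the mapping described in this thread.
IMOS_TO_OTN_RECEIVER_COLUMNS: Dict[str, str] = {
    "receiver_name": "receiver",
    "receiver_project_name": "otn_array",
}

# IMOS columns deliberately dropped: receiver_name is preferred over receiver_id.
IGNORED_IMOS_COLUMNS = {"receiver_id"}


def map_imos_receiver_columns(row: dict) -> dict:
    """Rename IMOS receiver columns to their OTN analogues and drop receiver_id."""
    mapped = {}
    for column, value in row.items():
        if column in IGNORED_IMOS_COLUMNS:
            continue
        # Columns without a known mapping are passed through (an assumption).
        mapped[IMOS_TO_OTN_RECEIVER_COLUMNS.get(column, column)] = value
    return mapped
```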

jackVanish commented 6 months ago

When I went back to build out the IMOS -> OTN piece, I found that the main imos_otn column mapping function was underbuilt because I had started building two separate functions to map receiver metadata and tag metadata. I finished those and then built a detections one based on the mapping files supplied in the now-closed Remora tickets. I think that'll get us most of the way through the IMOS -> OTN pipeline feedback; I've checked off the stuff that I know is solid. I will have some new test files to look at shortly.

jackVanish commented 6 months ago

Also, thanks for this, Naomi. I will favour receiver_name!

jackVanish commented 6 months ago

Also, in the IMOS test files above, we should be able to find the analogue to detectedby/project_code. I think we're handling this now with the coll_code parameter passed to otn_imos_column_map but we can always double-check.

CaitlinBate commented 6 months ago

re: generally, this is a very different format than IMOS detections. do you remember why we picked this one? to match OTN's raw detection tables?

when looking at the otn_detections output file (made from the IMOS test data), is this format supposed to match the schema.c_detections_YYYY table formats for OTN? there are very few columns included, so i am wondering about the reasoning behind this being our end-product (what are we going to use it for? it doesn't match otn detection extracts, for example)

also, my point still stands that receiver (otn column) needs to be completed (receiver_id is the IMOS column)

jackVanish commented 6 months ago

Right, the IMOS->OTN surimi output was erroneous, so I'm not surprised it's weird and bonkers. We won't be using that; we're shooting for something OTN detection-extract-like. Shannon and I are working on building that out. That should also cap off the receiver_id piece, since it'll be handled in building out the new functions. Can you speak to the project_code/detectedby bit in the first group?

The long version regarding the IMOS -> OTN piece is that when I generated the IMOS->OTN output, I used a function that was a carbon copy of the OTN->IMOS one, because I had forgotten that I'd decided to break it up into three separate functions (one for tags, one for receivers, one for detections). So the output was basically garbage, and that's my fault. I spent some time last week building out the two functions that already existed (receivers and tags, IMOS -> OTN) and working in a new one (IMOS -> OTN detections) based on mappings given as part of building the code in Remora. The detections function is incomplete right now, which is what Shannon's helping out with, so the feedback here regarding columns in the IMOS -> OTN output will be addressed as part of building out those functions.

CaitlinBate commented 6 months ago

project_code/detectedby bit -- when moving from OTN extracts to IMOS extracts, we need to make sure the project code (OTN's detectedby column) is put into the receiver_project_name column. I don't think we had example data to look at before
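A minimal sketch of that requirement, in Python for illustration rather than surimi's R. The function name is hypothetical, and falling back to a `coll_code` argument when detectedby is empty is an assumption based on the coll_code parameter mentioned earlier in the thread, not confirmed behaviour of otn_imos_column_map.

```python
from typing import Optional


def fill_receiver_project_name(row: dict, coll_code: Optional[str] = None) -> dict:
    """Copy OTN's detectedby into IMOS's receiver_project_name.

    Falls back to coll_code (assumed behaviour) when detectedby is missing or empty.
    """
    project = row.get("detectedby") or coll_code
    return {**row, "receiver_project_name": project}
```

Keeping detectedby as the first choice means the per-row project code from the extract wins over a single collection code supplied for the whole file.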