shannonrankin / BANTER_BeakedWhales


Review Pascal Data w/ Wigner #10

Open shannonrankin opened 1 year ago

shannonrankin commented 1 year ago

Compare the Pascal data from the beaked whale BANTER paper with the Pascal data from the Wigner project to ensure that (1) the events are the same and (2) the species labels are the same.

TaikiSan21 commented 1 year ago

First off, quite a big difference: your database folder is missing a bunch that I have. It looks like databases for Drifts 8, 11, 13, & 18. I double-checked with the original RoboJ processing code I have from Jay, and those almost correspond to the drifts that have SM2Bat recorders, but those are 8, 11, 14 & 18. I had not removed the SM2Bat recorders, but I did remove the SM3M recorders from my analysis (Drifts 7, 10, 13, & 17). I believe I decided to remove the SM3M recorders after discussions with Anne that they were very noisy and we wouldn't be using them going forward; I'm not sure if I should have removed the SM2Bat recorders as well.
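For what it's worth, a mismatch like this can be spotted quickly with a set difference on the drift numbers parsed from each folder's database filenames. A minimal sketch (the filenames, folder listings, and naming pattern below are hypothetical):

```python
import re

def drift_numbers(filenames):
    """Extract drift numbers from database filenames like 'Drift-08.sqlite3'."""
    found = set()
    for name in filenames:
        m = re.search(r"Drift[-_ ]?(\d+)", name)
        if m:
            found.add(int(m.group(1)))
    return found

# Hypothetical listings of the two database folders
my_folder = ["Drift-07.sqlite3", "Drift-08.sqlite3", "Drift-11.sqlite3",
             "Drift-13.sqlite3", "Drift-18.sqlite3"]
your_folder = ["Drift-07.sqlite3"]

missing = drift_numbers(my_folder) - drift_numbers(your_folder)
print(sorted(missing))  # drifts present in one folder but not the other
```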

shannonrankin commented 1 year ago

It is rather painful to re-read those emails to remember decisions (chaos abounds!). The SM2s were noisy and had lower sample rates, which could cause problems in characterizing beaked whale clicks, so these were excluded. The SM3Ms were also excluded.

I suspect these events would be fine with a classifier that relies on the shape of a sound (such as Wigner), but they could be problematic for a classifier based on measures of the sound, since the lower sampling rate will affect some of the measurements.

My recommendation (now, anyhow!) is that:

  1. We DO NOT use SM3M or SM2 for BANTER, because the noise and lower sample rates will affect the measures.
  2. We do a trial of the Wigner model WITHOUT the SM2 data (to provide a more apples-to-apples comparison with BANTER).
  3. We do a trial of the Wigner model WITH the SM2 data to see how well it performs. If it performs well, this shows that this approach to classification may be robust to some complications in the recordings.

TaikiSan21 commented 1 year ago

Okay, sounds like a reasonable plan. In good news, it looks like the numbers of events / labels of events / species codes of events are the same for your databases and the ones I used for Caltech.
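That consistency check can be sketched as a set comparison of (event, label, species) tuples pulled from each source (the rows below are made up; in practice they would come from the two sets of databases):

```python
# Hypothetical event tables: (event_id, event_label, species_code)
events_mine = {("E01", "BW", "ZC"), ("E02", "BW", "BW37V"), ("E03", "NBHF", "KS")}
events_yours = {("E01", "BW", "ZC"), ("E02", "BW", "BW37V"), ("E03", "NBHF", "KS")}

only_mine = events_mine - events_yours   # rows missing or mislabeled in yours
only_yours = events_yours - events_mine  # rows missing or mislabeled in mine
consistent = not only_mine and not only_yours
print(consistent)  # True here, since the two tables match exactly
```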

  1. Easy enough, I'll incorporate this.
  2. This will take a bit of time to re-do. I checked, and unfortunately the SM2s were responsible for 520 of my total 2178 detections of BW37V, including all of the BW37V in my original validation set. So I'll have to create a new datasplit and then re-train the model. None of this is too difficult; it just takes time to fit in with everything else.
  3. Do you mean to predict on SM2 data with our non-SM2 model? My current models were already trained with SM2 data mixed in, because I didn't know they had issues. So either this is already taken care of and it worked well, or it's easy enough to test after (2) is done.
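The new datasplit described in (2) might look like the sketch below: drop the SM2 detections first, then split at the event level so that every species (including BW37V) still contributes events to validation. All record layouts, recorder names, and event IDs here are hypothetical:

```python
import random

# Hypothetical detections: (event_id, species_code, recorder_type)
detections = [
    ("E1", "BW37V", "SM2"), ("E2", "BW37V", "ST4300"),
    ("E3", "BW37V", "ST4300"), ("E4", "ZC", "ST4300"),
    ("E5", "ZC", "SM2"), ("E6", "ZC", "ST4300"),
]

# 1) Remove SM2 detections before splitting
kept = [d for d in detections if d[2] != "SM2"]

# 2) Split at the event level so no event spans both sets, making sure
#    each species contributes at least one event to validation
random.seed(42)
events_by_species = {}
for event_id, species, _ in kept:
    events_by_species.setdefault(species, set()).add(event_id)

val_events = {random.choice(sorted(ids)) for ids in events_by_species.values()}
train = [d for d in kept if d[0] not in val_events]
val = [d for d in kept if d[0] in val_events]
```

Splitting on event IDs (rather than individual detections) keeps detections from the same event out of both sides of the split, which avoids leakage between training and validation.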

shannonrankin commented 1 year ago
  1. Great.
  2. No worries. Hopefully I'll publish my paper first; this will be valuable for an apples-to-apples comparison, but there's no time crunch.
  3. I was just thinking of presenting two models (for your paper): (1) one comparable to my BANTER model (apples-to-apples) and (2) one with the SM2 data added in (which you already have). This will hopefully show that for some situations, Wigner may be preferred if it allows you to use more data.
  4. Finally, if all the events/labels/codes are the same, then I can move forward with my BANTER analysis on this dataset (I just need to filter to keep only ch1).
  5. Let me know if there is anything else. I think I just need the markdown file for Hawaii from you. Or just let me know how you would code things so I can slowly work toward a better approach. :)
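The ch1 filter in (4) is a one-line filter on whatever detection table comes out of the databases. A minimal sketch (the column names and example rows are hypothetical):

```python
# Hypothetical detection records with a 'Channel' field
detections = [
    {"UID": 1, "Channel": 1, "species": "ZC"},
    {"UID": 2, "Channel": 2, "species": "ZC"},
    {"UID": 3, "Channel": 1, "species": "BW37V"},
]

# Keep only channel-1 detections
ch1_only = [d for d in detections if d["Channel"] == 1]
print(len(ch1_only))  # 2
```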