wildcam-gorongosa / data-cleaning

Scripts for cleaning Zooniverse output

Issues with camera operation dates and records #4

Open kaitlyngaynor opened 3 years ago

kaitlyngaynor commented 3 years ago

I went through and cross-checked the records from wildcam_fulldata_2019.csv against the Camera_operation_year1-4.csv and noted a few issues that need checking:
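A cross-check like this could be scripted along the following lines. This is only a minimal sketch in Python (the repository's own scripts are in R): the function names, the record layout, the camera code X01, and all dates are hypothetical placeholders, not values from wildcam_fulldata_2019.csv or the camera operation table.

```python
from datetime import date, datetime


def parse_mdy(text):
    """Parse dates written like 7/14/18 (month/day/two-digit year)."""
    return datetime.strptime(text, "%m/%d/%y").date()


def flag_records(records, operation):
    """Return records that fall outside every operational window of their camera.

    records:   list of (camera, date) tuples
    operation: dict mapping camera -> list of (start, end) windows, both ends inclusive
    """
    flagged = []
    for camera, day in records:
        windows = operation.get(camera, [])
        if not any(start <= day <= end for start, end in windows):
            flagged.append((camera, day))
    return flagged


# Hypothetical example data (not taken from the real spreadsheets).
operation = {"X01": [(parse_mdy("6/1/18"), parse_mdy("7/14/18"))]}
records = [("X01", parse_mdy("7/1/18")), ("X01", parse_mdy("8/20/18"))]

flagged = flag_records(records, operation)
for camera, day in flagged:
    print(camera, day.isoformat(), "is outside the listed operation dates")
```

Any record returned by `flag_records` then needs a manual decision: either the operation dates are wrong or the record's timestamp is.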

Cameras with photos in the future

meredithspalmer commented 3 years ago

Thanks for this thorough evaluation! I'm working through the issues and have flagged you on a few that require your input:

Main issues:

F03 was supposedly not working 7/14/18 (end of year2) through 9/15/18 (start of year3) but there were many images in that window. Meredith, is there a reason that 9/15/18 was listed as the start date for year3?

@kaitlyngaynor: In "Camera_grid_2016", it says that this camera's last check before I started on the grid ran from 6/11/17 to 9/15/18. Pictures from my first SD from this camera confirm that the roll I collected in May 2019 was set 9/15/18. However, the last entry in your "Camera_operation_years1and2" states that this camera was last checked 7/14/18, not 9/15/2018. There is no entry in the Camera_grid_2016 to suggest that this camera was visited in July 2018. I'm assuming that the July date is a typo and that this date should in fact be 9/15/18? I can't confirm with the raw images because I only have your first year of data. Can you please double-check that this camera did indeed run through 9/15/2018?

H03 - Meredith, you noted an issue 2/1/19 - 2/13/19 but there are 85 records here. Was there grass too tall to consider the camera operational?

The grass was definitely tall enough that we likely missed smaller animals but not tall enough to block the field of view entirely. I've taken a second look and updated the entry to cover only the dates when the camera was completely non-operational.

D03 - Is there a reason that the problem ends 3/14/19 but session ends 3/20/19? Did the camera start working again? (or was veg cleared?)

As recorded in the "Notes" column in "Camera_operation_year1-4", the floodwaters squashed the vegetation back down to open up the camera's field of view.

D05 - Why a problem 5/29/19 - 7/30/19? There were 183 records in this window

E04 - End date is 6/5/19 but there are many images after (Meredith noted "figure out rest" so maybe that's what needs to happen?)

Thanks for catching! In both of these sites, I think that you corrected the timestamps for Sarah and I didn't have the corrected dates stored locally. When looking at the raw images, I assumed we hadn't been able to back-calculate the actual timestamps. These are good now!

H03 - Why is there a problem with the same start date and end date (and same end date)? Can I just ignore this, or is there some other problem to consider?

@kaitlyngaynor: The images from the last day are unusable (white). I added the "to" and "from" as the same date on the last day to invalidate that day. Did I code this incorrectly?
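Whether a same-day "from"/"to" entry invalidates that day depends on how the downstream code compares dates. The sketch below (Python, with hypothetical function names and a made-up date) shows the difference: an inclusive comparison treats a same-day entry as a one-day problem window, while a half-open comparison treats it as empty. It does not describe how the actual consolidation code behaves.

```python
from datetime import date


def in_problem_inclusive(day, start, end):
    """Problem window includes both the start and the end date."""
    return start <= day <= end


def in_problem_half_open(day, start, end):
    """Problem window includes the start date but excludes the end date."""
    return start <= day < end


# Hypothetical last day of a roll, entered as both "from" and "to".
last_day = date(2020, 1, 10)

print(in_problem_inclusive(last_day, last_day, last_day))  # same-day entry covers the day
print(in_problem_half_open(last_day, last_day, last_day))  # same-day entry covers nothing
```

If the downstream comparison turns out to be half-open, moving the session end date one day earlier is one way to get the intended result.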

H03 - Why is there a problem 2/1/19 - 2/13/19 but there are 68 records in this two-week period?

@kaitlyngaynor: In my version of the record table, there is no problem for site H03 (or any site) between those dates. Could you please double-check?

H07 - The last end date is 11/21/18, but images 5/21/19 - 10/16/19 - why aren't 2019 dates in the spreadsheet?

@kaitlyngaynor: We need to talk about these - this is a site with a reset timestamp that we back-calculated, but the calculation has created images that "occurred" before the SD card was set (05/29/19). What do you want to do with this SD card?
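The symptom described here (corrected images that "occurred" before the card was set) can be checked automatically. The following is a minimal Python sketch assuming the back-calculation is a single constant offset derived from one image whose true time is known; the function names and all timestamps are hypothetical, and the real H07 correction may have been computed differently.

```python
from datetime import datetime


def back_calculate(stamps, recorded_anchor, true_anchor):
    """Shift every timestamp by the offset between an anchor image's
    recorded time and its known true time."""
    offset = true_anchor - recorded_anchor
    return [stamp + offset for stamp in stamps]


def stamps_before_deployment(corrected, card_set):
    """Return corrected timestamps that precede the moment the SD card was set.
    Any hit means the offset (or the choice of anchor) is wrong."""
    return [stamp for stamp in corrected if stamp < card_set]


# Hypothetical values: camera clock reset to 2015-01-01, card set on 2020-06-10.
card_set = datetime(2020, 6, 10, 9, 0)
recorded = [datetime(2015, 1, 1, 0, 5), datetime(2015, 1, 9, 12, 0)]
corrected = back_calculate(
    recorded,
    recorded_anchor=datetime(2015, 1, 1, 0, 0),
    true_anchor=datetime(2020, 6, 5, 0, 0),  # deliberately too early
)

early = stamps_before_deployment(corrected, card_set)
print(len(early), "corrected image(s) predate the card set date")
```

A non-empty result does not say how to fix the offset, only that the corrected roll cannot be trusted as it stands.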

meredithspalmer commented 3 years ago

Cameras with photos in the future

B07 - many 2020 and 2023 images

This roll has massive timestamp issues (multiple resets) and from our email conversations, I believe we settled on invalidating the entire roll.

B09 - many images from 2023

I10 - everything is from 2021

For these sites, the Zooniverse upload incorrectly "corrected" these bad timestamps. As recorded elsewhere on GitHub: "Frustratingly, with the messed up timestamp and renaming conventions in the Zooniverse data, you can't easily re-match the images to the classifications. I have not deleted the bad data from Sarah's data-dump BUT when using, delete these data and use my re-classifications instead"

For all three sites, I have now gone ahead and removed these bad data from the "wildcam_fulldata_2019" file and included code in the data compiler (MSP_consolidate_wildcam.R) to remove these records.
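A removal step of this kind might look roughly like the Python sketch below. It is not the code in MSP_consolidate_wildcam.R (which is R and is not shown in this thread); the field names, site codes, and dates are hypothetical placeholders. It drops every record from a listed site and any record dated after the last collection date, which catches "photos in the future".

```python
from datetime import date


def drop_bad_records(records, bad_sites, last_collection):
    """Keep only records that are not from a listed bad site and are not
    dated after the last date on which any card was collected."""
    kept = []
    for rec in records:
        if rec["site"] in bad_sites:
            continue  # whole site flagged as having unusable timestamps
        if rec["date"] > last_collection:
            continue  # timestamp lies in the future relative to collection
        kept.append(rec)
    return kept


# Hypothetical records, not taken from the real dataset.
records = [
    {"site": "X01", "date": date(2019, 6, 1), "species": "baboon"},
    {"site": "X02", "date": date(2023, 2, 3), "species": "warthog"},
    {"site": "X03", "date": date(2019, 7, 4), "species": "impala"},
]

clean = drop_bad_records(records, bad_sites={"X03"}, last_collection=date(2019, 12, 31))
print([rec["site"] for rec in clean])
```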

kaitlyngaynor commented 3 years ago

In retrospect, probably should have started a separate issue for each camera, as this is getting hard to read, hah. I have quoted the previous threads and replied underneath. (I removed those that are now fully resolved.)

F03 was supposedly not working 7/14/18 (end of year2) through 9/15/18 (start of year3) but there were many images in that window. Meredith, is there a reason that 9/15/18 was listed as the start date for year3?

@kaitlyngaynor: In "Camera_grid_2016", it says that this camera's last check before I started on the grid ran from 6/11/17 to 9/15/18. Pictures from my first SD from this camera confirm that the roll I collected in May 2019 was set 9/15/18. However, the last entry in your "Camera_operation_years1and2" states that this camera was last checked 7/14/18, not 9/15/2018. There is no entry in the Camera_grid_2016 to suggest that this camera was visited in July 2018. I'm assuming that the July date is a typo and that this date should in fact be 9/15/18? I can't confirm with the raw images because I only have your first year of data. Can you please double-check that this camera did indeed run through 9/15/2018?

I found a note in Camera_grid_2016 that explains this. Sorry, I had forgotten! In the notes column: "night photos are entirely black after 7/14/18 - just consider this the effective end date". This wasn't the case for the images from Sept 2018 onward, was it?

H03 - Why is there a problem with the same start date and end date (and same end date)? Can I just ignore this, or is there some other problem to consider?

@kaitlyngaynor: The images from the last day are unusable (white). I added the "to" and "from" as the same date on the last day to invalidate that day. Did I code this incorrectly?

Ah okay, sounds good. I think that we should just specify End date as 3/15/19 for this to work correctly. I can make this change.

H03 - Why is there a problem 2/1/19 - 2/13/19 but there are 68 records in this two-week period?

@kaitlyngaynor: In my version of the record table, there is no problem for site H03 (or any site) between those dates. Could you please double-check?

I am truly baffled as to how this ended up in my spreadsheet. Sigh, perhaps I should generate this check with code instead of doing it manually.

H07 - The last end date is 11/21/18, but images 5/21/19 - 10/16/19 - why aren't 2019 dates in the spreadsheet?

@kaitlyngaynor: We need to talk about these - this is a site with a reset timestamp that we back-calculated, but the calculation has created images that "occurred" before the SD card was set (05/29/19). What do you want to do with this SD card?

Oh interesting. This tells me that the reset calculation was incorrect... From your notes in August: "I spent some time trying to recalculate the times on H07 but I think this one is beyond saving. (Could perhaps be salvaged with more effort?) Note to self: I need to update the operations data sheet with period unable to get data from." Seems like we should just throw these out (which it sounds like you've done).

B07 - many 2020 and 2023 images

This roll has massive timestamp issues (multiple resets) and from our email conversations, I believe we settled on invalidating the entire roll.

Okay. In that case, I think that the Camera_operation_year4 should probably read Start = 8/13/19, End = 10/13/19. Is that correct (no date/time issues in this window)? Even if there was one day of good data on 5/25/19, we should probably still throw this out, right?

B09 - many images from 2023

I10 - everything is from 2021

For these sites, the Zooniverse upload incorrectly "corrected" these bad timestamps. As recorded elsewhere on GitHub: "Frustratingly, with the messed up timestamp and renaming conventions in the Zooniverse data, you can't easily re-match the images to the classifications. I have not deleted the bad data from Sarah's data-dump BUT when using, delete these data and use my re-classifications instead"

Okay. No further action needed on this one, then?

For all three sites, I have now gone ahead and removed these bad data from the "wildcam_fulldata_2019" file and included code in the data compiler (MSP_consolidate_wildcam.R) to remove these records.

By all three sites, do you mean H07, B07, and B09?

Did you update the Camera_operation_year1-4 or just Camera_operation_year4? I'm assuming only the latter, in which case we should just delete the former since it's now incorrect. Also, did you make changes to any cameras other than the ones that I flagged? Just trying to streamline the re-consolidation process.

meredithspalmer commented 3 years ago

F03 - The night images for F03 post-September are fine, so I am setting the end date as 7/14/18 and marking this issue as resolved.

H03 - I have updated the end date to 3/15/19 in both "Camera_operation_years1-4.csv" and "Camera_operation_year4.csv"

H07 - I have not included the 2019 dates in the operation spreadsheets for the time being.

B07 - I have updated the start and end dates as suggested in both "Camera_operation_years1-4.csv" and "Camera_operation_year4.csv"

B09, I10 - These are good

By all three sites, do you mean H07, B07, and B09?

Whoops, referring to B09 and I10 - so now, the full dataset only includes my good records and has removed the Zooniverse bad timestamps.

Did you update the Camera_operation_year1-4 or just Camera_operation_year4? I'm assuming only the latter, in which case we should just delete the former since it's now incorrect.

I updated both, and have now uploaded NEWLY updated versions (with these corrections) into the Google Drive. Note that the year1-4 spreadsheet can be generated from the individual year spreadsheets using the code I have also included in the Google Drive.
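Generating the combined table from the per-year tables, rather than editing both by hand, keeps them from drifting apart. The code in the Google Drive is not shown in this thread; the following is only a Python sketch of the general idea, with hypothetical column names (Camera, Start, End), hypothetical camera codes, and made-up dates.

```python
import csv
import io


def read_rows(csv_text, year_label):
    """Read one per-year operation table and tag each row with its year."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    for row in rows:
        row["Year"] = year_label
    return rows


def combine_years(tables):
    """Concatenate the per-year tables and group rows by camera.
    The sort is stable, so each camera's rows stay in year order."""
    combined = [row for rows in tables for row in rows]
    combined.sort(key=lambda row: row["Camera"])
    return combined


# Hypothetical per-year tables; real ones would be read from the CSV files.
year3 = "Camera,Start,End\nX01,2018-09-15,2019-05-20\nX02,2018-09-16,2019-05-21\n"
year4 = "Camera,Start,End\nX01,2019-05-20,2019-10-13\n"

combined = combine_years([read_rows(year3, "year3"), read_rows(year4, "year4")])
for row in combined:
    print(row["Camera"], row["Year"], row["Start"], row["End"])
```

With this approach a correction only has to be made once, in the relevant year's table, and the combined table is regenerated.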

Also, did you make changes to any cameras other than the ones that I flagged? Just trying to streamline the re-consolidation process.

Any changes are noted in the R code. I updated a few classifications, but did not change the dates or operation status of any other cameras!