Closed bsaul closed 2 years ago
@ahhurlbert - I have a draft import process in this MR. For the most part, the scripts I had were moved to R functions (with the exception of the code that scraped the website). Then the import process boils down to:
library(magrittr)
library(mbbs)

mbbs_orange <-
  import_ebird_data('inst/extdata/MyEBirdData_Orange_20210316.csv') %>%
  prepare_mbbs_data(
    # mbbs_site_20190127.rds contains website data for Orange only
    mbbs_site_dt = readRDS("inst/extdata/mbbs_site_20190127.rds")
  ) %>%
  combine_site_ebird()
That said, there are still a few hard-coded assumptions in the functions in the R/ directory (which I tried to label with TODO or a clear comment). I'm sure I missed a few.
Also, we're hitting issues with misspelled location names. For example, I extract the mbbs county using the regex "[Oo]range|[Cc]hatham|[Dd]urham", but submission S89832222 has "MBBS, Chatman, Route 1-9" as the location name, so the county is not matched. Our options are either to have people clean these up in eBird or to play whack-a-mole with various spellings in the code. What are your thoughts?
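To make the whack-a-mole option concrete, here is a minimal sketch of a tolerant county lookup. It assumes nothing about the package internals: `extract_county` and its `max_dist` threshold are hypothetical, and the fallback uses base R's `utils::adist` (edit distance) rather than anything the mbbs functions currently do.

```r
# Hypothetical helper, not part of the mbbs package: extract the county from
# an eBird location name, tolerating small misspellings.
extract_county <- function(location, max_dist = 2) {
  counties <- c("orange", "chatham", "durham")

  # Strict pass: the same pattern quoted above.
  m <- regmatches(location, regexpr("[Oo]range|[Cc]hatham|[Dd]urham", location))
  if (length(m) == 1) return(tolower(m))

  # Fuzzy pass: compare each word in the name against the known counties
  # by edit distance and accept the closest one if it is close enough.
  words <- tolower(unlist(strsplit(location, "[^A-Za-z]+")))
  words <- words[nzchar(words)]
  if (length(words) == 0) return(NA_character_)
  d <- utils::adist(words, counties)  # rows = words, columns = counties
  if (min(d) > max_dist) return(NA_character_)
  counties[which.min(apply(d, 2, min))]
}

extract_county("MBBS, Chatman, Route 1-9")
```

The threshold of 2 edits is an assumption chosen so that "Chatman" resolves to "chatham"; a looser threshold would risk matching unrelated words, which is one argument for fixing the names in eBird instead.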
I've found duplicate submissions for route 1 in several years; e.g.:
1 S6659580 2009 1
2 S6659666 2009 1
3 S10953436 2012 1
4 S11003090 2012 1
5 S14515661 2013 1
6 S14248552 2013 1
7 S29713521 2016 1
8 S30189791 2016 1
9 S37213429 2017 1
10 S37523618 2017 1
Can you help clean these up? To that end, there is an inst/excluded_submissions.yml file in which to list the submissions to exclude. The format should be obvious.
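For reference, duplicates like the ones listed above can be flagged with a few lines of base R. This is only a sketch: the column names `year` and `route` are assumed (the listing has no header), and the IDs in `excluded` are picked arbitrarily to show the mechanics, not a recommendation of which submission to drop.

```r
# Submissions from the listing above; column names are assumptions.
subs <- data.frame(
  sub_id = c("S6659580", "S6659666", "S10953436", "S11003090", "S14515661",
             "S14248552", "S29713521", "S30189791", "S37213429", "S37523618"),
  year   = c(2009, 2009, 2012, 2012, 2013, 2013, 2016, 2016, 2017, 2017),
  route  = 1
)

# Flag every row whose (year, route) pair occurs more than once.
key  <- paste(subs$year, subs$route)
dups <- subs[key %in% key[duplicated(key)], ]

# Hypothetical exclusion list. In the package these IDs would come from
# inst/excluded_submissions.yml rather than being hard-coded here.
excluded <- c("S6659666", "S11003090")
kept <- subs[!subs$sub_id %in% excluded, ]
```

Someone who knows the surveys still has to decide which submission in each pair is the real one; the code only finds the candidates.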
Toy around with the functions and let me know what you think.
Also, I found a Chatham County submission in the Orange County account.
> import_ebird_data('inst/extdata/MyEBirdData_Orange_20210917.csv') %>% filter(mbbs_county == "chatham") %>% distinct(sub_id)
# A tibble: 1 x 1
sub_id
<chr>
1 S89966940
Do we want people to clean these up, or do we want to do the consistency checks once all the counties are combined? Currently, prepare_mbbs_data fails if there is more than one unique value for mbbs_county in the eBird data. I think it's cleaner to handle this in eBird and not in the data processing.
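As an illustration of the kind of check described here, the sketch below stops when a data set mixes counties. `check_single_county` is a hypothetical stand-in, not the code inside prepare_mbbs_data, and the first `sub_id` in the example is a made-up placeholder.

```r
# Hypothetical stand-in for the check described above; the actual check in
# prepare_mbbs_data may be implemented differently.
check_single_county <- function(dt) {
  counties <- unique(dt$mbbs_county)
  if (length(counties) > 1) {
    stop("Expected one mbbs_county, found: ", paste(counties, collapse = ", "))
  }
  invisible(dt)
}

# "S0000001" is a placeholder ID; S89966940 is the stray submission above.
mixed <- data.frame(
  sub_id      = c("S0000001", "S89966940"),
  mbbs_county = c("orange", "chatham")
)

# Capture the error message instead of halting the script.
msg <- tryCatch(
  {
    check_single_county(mixed)
    "ok"
  },
  error = function(e) conditionMessage(e)
)
```

Failing loudly like this keeps the decision about stray submissions with whoever maintains the eBird account, which matches the preference stated above.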
TODO