nc-minibbs / mbbs

A repository for the Mini-Bird Breeding Survey data
https://minibbs.us

Errant counts in Chatham route 6 in 2005? #28

Closed by bsaul 1 year ago

bsaul commented 1 year ago

@ahhurlbert - Chatham route 6 has a big spike in counts in 2005. I'm seeing this spike for multiple species, so I'm wondering if double counting happened in the data scraping process. There's a spike in 2010 too, but that one doesn't show up across multiple species.

> dt %>%
+   filter(mbbs_county == "chatham", route_num == 6) %>%
+   group_by(year) %>%
+   summarize(count = sum(count))
# A tibble: 15 × 2
    year count
   <dbl> <dbl>
 1  2000   318
 2  2001   352
 3  2002   272
 4  2003   221
 5  2004   193
 6  2005   726 <-- here
 7  2006   288
 8  2007   256
 9  2008   289
10  2009   335
11  2010   628
12  2018   318
13  2019   253
14  2020   277
15  2021   276
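
One way to check the "multiple species" observation is to compare each species' 2005 count with its typical count in the other years; a ratio near 2 for most species would be consistent with double counting. The sketch below is only an illustration in base R on toy data: `mbbs_county`, `route_num`, `year` and `count` come from the query above, while `common_name` and all the values are assumptions.

```r
# Toy data standing in for `dt`; `common_name` is an assumed column name
# and the counts are made up for illustration.
dt <- data.frame(
  mbbs_county = "chatham",
  route_num   = 6,
  year        = rep(c(2004, 2005, 2006), each = 2),
  common_name = rep(c("Northern Cardinal", "Carolina Wren"), times = 3),
  count       = c(10, 8, 21, 16, 11, 8)
)

route <- subset(dt, mbbs_county == "chatham" & route_num == 6)

# Total count per species and year.
by_species <- aggregate(count ~ common_name + year, data = route, FUN = sum)

# Ratio of each species' 2005 count to its median count in the other years.
spike_ratio <- sapply(split(by_species, by_species$common_name), function(d) {
  d$count[d$year == 2005] / median(d$count[d$year != 2005])
})
spike_ratio
```

If only one or two species show a high ratio (as seems to be the case for 2010), a real change in abundance or a single data-entry error is a more likely explanation than wholesale duplication.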

bsaul commented 1 year ago

The spike is big enough to see on the total counts across all counties. See 2005:

[image: total counts across all counties by year]

ahhurlbert commented 1 year ago

Yes, you correctly identified the problem. Looks like in extdata/chatham_2000-2009_from_website.csv, rows 3542 thru 4024 are duplicates and can be removed (the end of Route 3 surveys through Route 14 surveys).
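
Before deleting a row range like this, it can be worth confirming that every row in the range really is an exact copy of a row elsewhere in the file. The following is a minimal base R sketch on toy data, not the fix that was actually applied; `drop_if_duplicated` is a hypothetical helper and the toy column names are assumptions.

```r
# Toy stand-in for the CSV contents; column names and values are assumptions.
surveys <- data.frame(
  route   = c(1, 1, 2, 2, 1, 1),
  species = c("A", "B", "A", "B", "A", "B"),
  count   = c(3, 5, 2, 4, 3, 5)
)

# Hypothetical helper: drop rows first:last, but only if each of them is an
# exact copy of some row outside that range. Otherwise stop with an error.
drop_if_duplicated <- function(df, first, last) {
  idx <- first:last
  # Build one string key per row so whole rows can be compared.
  key <- do.call(paste, c(df, sep = "\r"))
  stopifnot(all(key[idx] %in% key[-idx]))
  df[-idx, , drop = FALSE]
}

cleaned <- drop_if_duplicated(surveys, 5, 6)
nrow(cleaned)
```

For the real file, the same helper could be applied to the result of `read.csv("extdata/chatham_2000-2009_from_website.csv")`. Whether the row numbers quoted above count the header line is not stated, so the indices may need shifting by one.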

bsaul commented 1 year ago

> rows 3542 thru 4024 are duplicates and can be removed

Great. I'll open a PR for a fix.