hurlbertlab / dietdatabase

Creative Commons Zero v1.0 Universal

clean Observation_Season field #71

Closed ahhurlbert closed 7 years ago

ahhurlbert commented 7 years ago

Current summary of the field:

> season = count(diet, Observation_Season) %>% arrange(desc(n)) %>% data.frame()
> season
       Observation_Season     n
1                    <NA> 18600
2                  Summer  2994
3                  Winter  1507
4                    Fall  1164
5          Spring; Summer  1081
6                  Spring  1040
7                Multiple  1010
8                multiple   697
9                     All   491
10                 summer   242
11                 spring   140
12               All year   118
13           Fall; Winter   118
14   Spring; Summer; Fall    74
15         Summer; Winter    74
16                           49
17             Year Round    32
18           Fall; Spring    26
19           Summer; Fall    19
20   Winter; Spring; Fall    17
21         Winter; Spring    15
22                   fall    11
23   Fall; Winter; Spring     9
24               woodland     5
25  agriculture; woodland     4
26 agriculture; grassland     3

1) Make all season names lowercase.
2) "All", "All year", "Year Round", "Multiple", and any combinations (e.g. "Fall; Winter") --> "multiple"
3) It looks like some Habitat_type values accidentally got put in this field, so move those over and fill in Season as appropriate.
4) Fill in a value where the value is blank; this could be NA if there is no information about season/date in the study.
5) There are 18,600 records with NA for season, but many of these records have values in the Observation_Month fields that could be used to fill in Season.
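A rough sketch of how steps 1, 2, and 4 could be automated, with step 3 left as a flag for manual review. `clean_season` is a hypothetical helper, not something that exists in the repository, and it assumes that only combinations made up entirely of season names should collapse to "multiple", so entries like "agriculture; woodland" stay visible for moving to Habitat_type.

```r
# Hypothetical helper (not part of the repository): standardize season labels.
clean_season <- function(x) {
  seasons  <- c("winter", "spring", "summer", "fall")
  all_year <- c("all", "all year", "year round", "multiple")

  s <- tolower(trimws(as.character(x)))  # step 1: lowercase, drop stray spaces
  s[!is.na(s) & s == ""] <- NA           # step 4: blank -> NA placeholder until
                                         # the source study has been checked

  # step 2: "a; b" counts as a combination only if every part is a season name
  parts <- strsplit(s, ";", fixed = TRUE)
  is_combo <- vapply(parts, function(p) {
    p <- trimws(p)
    length(p) > 1 && all(p %in% seasons)
  }, logical(1))

  out <- s
  out[!is.na(s) & (s %in% all_year | is_combo)] <- "multiple"

  # step 3: anything still unrecognized (e.g. "woodland") is left unchanged
  # and flagged so it can be moved to Habitat_type by hand
  needs_review <- !is.na(out) & !(out %in% c(seasons, "multiple"))

  data.frame(season = out, needs_review = needs_review,
             stringsAsFactors = FALSE)
}

# Example on a few of the values from the summary above
clean_season(c("Summer", "Fall; Winter", "Year Round", "", "woodland"))
```

The function returns a two-column data frame rather than overwriting the field, so flagged rows can be inspected before anything is written back to `diet`.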

Run this code to get a list of the studies where season is currently NA and fill in as appropriate:

foo = filter(diet, is.na(Observation_Season), !is.na(Observation_Month_Begin)) %>%
  mutate(Source2 = substr(Source, 1, 35)) %>%
  select(Common_Name, Observation_Month_Begin, Observation_Year_Begin,
         Observation_Month_End, Observation_Year_End, Source2) %>%
  unique()
> head(foo)
   Common_Name Observation_Month_Begin Observation_Year_Begin Observation_Month_End Observation_Year_End                             Source2
1   Bald Eagle                      12                   1986                     6                 1988 Mersmann, T. J. 1989. Foraging ecol
64  Bald Eagle                      12                   1986                    12                 1987 Mersmann, T. J. 1989. Foraging ecol
78  Bald Eagle                       6                   1971                     8                 1971 Ofelt, C. H. 1975. Food habits of n
91  Bald Eagle                       5                   1963                     8                 1963 Retfalvi, L. 1970. Food of nesting 
94  Bald Eagle                       5                   1962                     9                 1962 Retfalvi, L. 1970. Food of nesting 
99  Bald Eagle                       4                   1998                     9                 2001 Thompson, C. M., P.E. Nye, G. A. Sc

NOTE:
Dec, Jan, Feb = winter
Mar, Apr, May = spring
Jun, Jul, Aug = summer
Sep, Oct, Nov = fall

However, if a study extends just a single month beyond one of the season definitions above, on either end (e.g., May-August), I would label it according to the primary season ("summer") rather than "multiple".
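The month-to-season rule, including the one-month allowance, might be sketched like this. `season_of_month` and `season_from_span` are hypothetical names, and the sketch makes two assumptions that the issue does not spell out: the allowance is read as at most one month outside the primary season in total, and the primary season must cover at least two months of the span. Records with a missing month or year come back as NA.

```r
# Hypothetical helpers (not part of the repository).
# Map month numbers 1-12 to seasons using the definitions in the NOTE.
season_of_month <- function(m) {
  c("winter", "winter", "spring", "spring", "spring", "summer",
    "summer", "summer", "fall", "fall", "fall", "winter")[m]
}

# Derive a season label for one record from its begin/end month and year.
season_from_span <- function(month_begin, year_begin, month_end, year_end) {
  # Calendar months covered, counting both endpoints
  n_months <- (year_end - year_begin) * 12 + (month_end - month_begin) + 1
  if (is.na(n_months) || n_months < 1) return(NA_character_)
  if (n_months > 4) return("multiple")  # longer than one season plus one month

  # Month numbers in the span, wrapping from December to January
  months <- ((month_begin - 1 + seq_len(n_months) - 1) %% 12) + 1
  counts <- table(season_of_month(months))
  primary <- names(counts)[which.max(counts)]
  n_outside <- n_months - max(counts)

  if (n_outside == 0) return(primary)                      # within one season
  if (n_outside == 1 && max(counts) >= 2) return(primary)  # one-month allowance
  "multiple"
}

# Applied to the first rows of head(foo) shown above
ex <- data.frame(mb = c(12, 12, 6, 5, 5),
                 yb = c(1986, 1986, 1971, 1963, 1962),
                 me = c(6, 12, 8, 8, 9),
                 ye = c(1988, 1987, 1971, 1963, 1962))
ex$season <- mapply(season_from_span, ex$mb, ex$yb, ex$me, ex$ye)
ex
```

Because the function works on one record at a time, `mapply` is used to apply it across rows. Under these assumptions a two-month span that straddles two seasons (e.g., May-June) is labeled "multiple", which may or may not be the intended behavior.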

pwinner1 commented 7 years ago

finished