> season = count(diet, Observation_Season) %>% arrange(desc(n)) %>% data.frame()
> season
       Observation_Season     n
1                    <NA> 18600
2                  Summer  2994
3                  Winter  1507
4                    Fall  1164
5          Spring; Summer  1081
6                  Spring  1040
7                Multiple  1010
8                multiple   697
9                     All   491
10                 summer   242
11                 spring   140
12               All year   118
13           Fall; Winter   118
14   Spring; Summer; Fall    74
15         Summer; Winter    74
16                           49
17             Year Round    32
18           Fall; Spring    26
19           Summer; Fall    19
20   Winter; Spring; Fall    17
21         Winter; Spring    15
22                   fall    11
23   Fall; Winter; Spring     9
24               woodland     5
25  agriculture; woodland     4
26 agriculture; grassland     3
1) Make all season names lowercase.
2) "All", "All year", "Year Round", "Multiple", and any combination of seasons (e.g., "Fall; Winter") --> "multiple"
3) Some Habitat_type values ("woodland", "agriculture; woodland", "agriculture; grassland") were accidentally entered in this field; move them to Habitat_type and fill in Season as appropriate.
4) Fill in a value for the 49 records where Season is blank; this can be NA if the study gives no information about season or date.
5) There are 18,600 records with NA for Season, but many of them have values in the Observation_Month fields that could be used to fill in Season.
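Steps 1, 2, and 4 above could be sketched in base R roughly as follows. The names clean_season, is_habitat_value, and habitat_terms are hypothetical helpers, not part of the existing code, and the habitat values from step 3 are deliberately left unchanged so they can be found and moved by hand.

```r
# Hypothetical list of habitat terms, taken from the values seen in the table above.
habitat_terms <- c("woodland", "agriculture", "grassland")

# TRUE where a Season value actually looks like a Habitat_type entry (step 3).
is_habitat_value <- function(x) {
  !is.na(x) & grepl(paste(habitat_terms, collapse = "|"), tolower(x))
}

# Standardize raw Observation_Season values (steps 1, 2, and 4).
clean_season <- function(x) {
  x <- tolower(trimws(as.character(x)))        # step 1: lowercase
  x[!is.na(x) & x == ""] <- NA_character_      # step 4: blank -> NA until the study is checked
  all_year <- c("all", "all year", "year round", "multiple")
  is_multi <- !is.na(x) & !is_habitat_value(x) &
    (x %in% all_year | grepl(";", x, fixed = TRUE))
  x[is_multi] <- "multiple"                    # step 2: all-year labels and combinations
  x
}

raw <- c("Summer", "Fall; Winter", "All year", "", NA,
         "woodland", "agriculture; grassland")
clean_season(raw)
```

Inside a dplyr pipeline this could be applied as mutate(diet, Observation_Season = clean_season(Observation_Season)), and the rows that still hold a habitat term could then be listed with filter(diet, is_habitat_value(Observation_Season)).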
Run this code to get a list of the studies where Season is currently NA but a beginning month is recorded, then fill in Season as appropriate:
foo = filter(diet, is.na(Observation_Season), !is.na(Observation_Month_Begin)) %>%
  mutate(Source2 = substr(Source, 1, 35)) %>%
  select(Common_Name, Observation_Month_Begin, Observation_Year_Begin,
         Observation_Month_End, Observation_Year_End, Source2) %>%
  unique()
> head(foo)
   Common_Name Observation_Month_Begin Observation_Year_Begin Observation_Month_End Observation_Year_End                             Source2
1   Bald Eagle                      12                   1986                     6                 1988 Mersmann, T. J. 1989. Foraging ecol
64  Bald Eagle                      12                   1986                    12                 1987 Mersmann, T. J. 1989. Foraging ecol
78  Bald Eagle                       6                   1971                     8                 1971 Ofelt, C. H. 1975. Food habits of n
91  Bald Eagle                       5                   1963                     8                 1963 Retfalvi, L. 1970. Food of nesting
94  Bald Eagle                       5                   1962                     9                 1962 Retfalvi, L. 1970. Food of nesting
99  Bald Eagle                       4                   1998                     9                 2001 Thompson, C. M., P.E. Nye, G. A. Sc
NOTE:
Dec, Jan, Feb = winter
Mar, Apr, May = spring
Jun, Jul, Aug = summer
Sep, Oct, Nov = fall
However, if a study spans just a single month outside of the definition above on either end (e.g., May-August), I would label that according to the primary season ("summer") rather than "multiple".
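The month rule above could be sketched in base R as follows. The function names are hypothetical, and the sketch rests on several assumptions that are not stated in the notes: only the begin and end months are used (years are ignored, so multi-year studies still need a manual check), "a single month outside" is read as at most one month in total falling outside the primary season, a tie between seasons is labeled "multiple", and a missing end month is treated as a single-month study.

```r
# Map calendar months (1-12) to seasons using the definitions in the NOTE above.
month_to_season <- function(m) {
  c("winter", "winter", "spring", "spring", "spring", "summer",
    "summer", "summer", "fall", "fall", "fall", "winter")[m]
}

# Hypothetical helper: label one study period from its begin and end months.
season_from_months <- function(begin, end) {
  if (is.na(begin)) return(NA_character_)
  if (is.na(end)) end <- begin                 # assumption: single-month study
  # Months covered, wrapping across the new year (e.g. 12 -> 2 is Dec, Jan, Feb)
  n_months <- (end - begin) %% 12 + 1
  months <- (begin - 1 + seq_len(n_months) - 1) %% 12 + 1
  counts <- table(month_to_season(months))
  # Assumption: no single primary season (a tie) means "multiple"
  if (sum(counts == max(counts)) > 1) return("multiple")
  primary <- names(counts)[which.max(counts)]
  outside <- sum(counts) - max(counts)         # months outside the primary season
  if (outside <= 1) primary else "multiple"
}

# Apply the rule to whole columns of begin/end months.
season_from_month_cols <- function(begin, end) {
  unname(mapply(season_from_months, begin, end))
}

season_from_month_cols(c(5, 6, 12, 4), c(8, 8, 2, 9))
# "summer" "summer" "winter" "multiple"
```

With foo from the query above, a suggested label could be added for review with mutate(foo, Season_guess = season_from_month_cols(Observation_Month_Begin, Observation_Month_End)); the result is only a starting point and should be checked against each source.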