merely-useful / r-rse

ARCHIVED: Activity moved to rostools organization.

Draft of book's final exercise example package (weather of Kenya) #55

Closed: lwjohnst86 closed this issue 1 year ago

lwjohnst86 commented 3 years ago

It doesn't have to be much for now, just enough to get an idea of what needs to be done and what it will generally look like. For now, hosting it as a new repo in merely-useful should work.

cwickham commented 3 years ago

@ian-flores here's some code I had from working with rnoaa to get GHCN data, hopefully it helps to get started with the data.

library(rnoaa)
library(tidyverse)
ghcnd_countries() %>% 
  filter(name == "Kenya")
##   code  name
## 1   KE Kenya

The country code for Kenya is KE.

stations <- ghcnd_stations()
stations
## # A tibble: 699,553 x 11
##    id    latitude longitude elevation state name  gsn_flag wmo_id element
##    <chr>    <dbl>     <dbl>     <dbl> <chr> <chr> <chr>    <chr>  <chr>  
##  1 ACW0…     17.1     -61.8      10.1 ""    ST J… ""       ""     TMAX   
##  2 ACW0…     17.1     -61.8      10.1 ""    ST J… ""       ""     TMIN   
##  3 ACW0…     17.1     -61.8      10.1 ""    ST J… ""       ""     PRCP   
##  4 ACW0…     17.1     -61.8      10.1 ""    ST J… ""       ""     SNOW   
##  5 ACW0…     17.1     -61.8      10.1 ""    ST J… ""       ""     SNWD   
##  6 ACW0…     17.1     -61.8      10.1 ""    ST J… ""       ""     PGTM   
##  7 ACW0…     17.1     -61.8      10.1 ""    ST J… ""       ""     WDFG   
##  8 ACW0…     17.1     -61.8      10.1 ""    ST J… ""       ""     WSFG   
##  9 ACW0…     17.1     -61.8      10.1 ""    ST J… ""       ""     WT03   
## 10 ACW0…     17.1     -61.8      10.1 ""    ST J… ""       ""     WT08   
## # … with 699,543 more rows, and 2 more variables: first_year <int>,
## #   last_year <int>

stations has one row per station and variable (element) combination. It is easier to explore with one row per station, with the elements collapsed into a string and the record length of each element tracked in n_years:

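stations_collapsed <- stations %>% 
  mutate(
    # span of years covered by each station/element combination
    n_years = last_year - first_year,
    # the first two characters of a station id are the country code
    country = str_sub(id, 1, 2)) %>% 
  group_by(id, name, country) %>% 
  summarise(
    n_years = paste(n_years, collapse = ","),
    elements = paste(element, collapse = ",")
  ) %>% 
  # n_years is now a string, so this sorts as text, not by number
  arrange(desc(n_years))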
stations_collapsed
## # A tibble: 117,841 x 5
## # Groups:   id, name [117,841]
##    id      name       country n_years                 elements                  
##    <chr>   <chr>      <chr>   <chr>                   <chr>                     
##  1 USC003… TIONESTA … US      99,99,99,99,99,99,69,6… TMAX,TMIN,TOBS,PRCP,SNOW,…
##  2 USC003… WALTERS    US      99,99,99,99,99,99,55,5… TMAX,TMIN,TOBS,PRCP,SNOW,…
##  3 USC001… ELK CITY … US      99,99,99,99,99,99,54,5… TMAX,TMIN,TOBS,PRCP,SNOW,…
##  4 USC003… FAITH      US      99,99,99,99,99,99,50,4… TMAX,TMIN,TOBS,PRCP,SNOW,…
##  5 USC001… CHEROKEE   US      99,99,99,99,99,99,4,0,… TMAX,TMIN,TOBS,PRCP,SNOW,…
##  6 USC004… GATLINBUR… US      99,99,99,99,99,99,38,6… TMAX,TMIN,TOBS,PRCP,SNOW,…
##  7 USW000… TANACROSS  US      99,99,99,99,99,99,14,6… TMAX,TMIN,TOBS,PRCP,SNOW,…
##  8 USC004… WARDENSVI… US      99,99,99,99,99,99,0,65… TMAX,TMIN,TOBS,PRCP,SNOW,…
##  9 USC004… SAN SABA   US      99,99,99,99,99,99,0,0,… TMAX,TMIN,TOBS,PRCP,SNOW,…
## 10 USC003… CEDAR BUT… US      99,99,99,99,99,97,55,2… TMAX,TMIN,TOBS,PRCP,SNOW,…
## # … with 117,831 more rows
stations_collapsed %>% 
  filter(country == "KE")
## # A tibble: 10 x 5
## # Groups:   id, name [10]
##    id          name               country n_years        elements               
##    <chr>       <chr>              <chr>   <chr>          <chr>                  
##  1 KE000063820 MOMBASA            KE      63,64,64,64    TMAX,TMIN,PRCP,TAVG    
##  2 KE000063740 JOMO KENYATTA INTL KE      63,62,63,32,64 TMAX,TMIN,PRCP,SNWD,TA…
##  3 KE000063612 LODWAR             KE      60,61,48,64    TMAX,TMIN,PRCP,TAVG    
##  4 KE000063661 KITALE             KE      60,61,48,0,64  TMAX,TMIN,PRCP,SNWD,TA…
##  5 KE000063624 MANDERA            KE      51,60,63,63    TMAX,TMIN,PRCP,TAVG    
##  6 KEM00063686 ELDORET INTL       KE      47,48,48,64    TMAX,TMIN,PRCP,TAVG    
##  7 KEM00063799 MALINDI            KE      42,47,47,58    TMAX,TMIN,PRCP,TAVG    
##  8 KE000063723 GARISSA            KE      39,48,64,60    TMAX,TMIN,PRCP,TAVG    
##  9 KEM00063741 NAIROBI/DAGORETTI  KE      35,36,36,63    TMAX,TMIN,PRCP,TAVG    
## 10 KE000063619 MOYALE             KE      29,44,85,44    TMAX,TMIN,PRCP,TAVG

Most stations have at least 30 years of records for each element, but they report only a minimal set of variables (mostly TMAX, TMIN, PRCP and TAVG).
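
One caveat with the collapsed table: n_years ends up as a comma-separated string, so arrange(desc(n_years)) orders it as text rather than by number of years. Here is a minimal sketch of one way around that, using base R only and record lengths copied from the Kenya output above (three stations, four elements each). The object names and the min_years column are just illustrative, not part of rnoaa.

```r
# Per-element record lengths copied from the printed Kenya output
# (three stations, four elements each).
station_years <- data.frame(
  id      = rep(c("KE000063820", "KE000063612", "KE000063619"), each = 4),
  name    = rep(c("MOMBASA", "LODWAR", "MOYALE"), each = 4),
  element = rep(c("TMAX", "TMIN", "PRCP", "TAVG"), times = 3),
  n_years = c(63, 64, 64, 64,
              60, 61, 48, 64,
              29, 44, 85, 44)
)

# Sorting the collapsed strings compares characters, not numbers:
# "9,12" lands ahead of "63,64,64,64" because "9" > "6" as text.
text_sorted <- sort(c("63,64,64,64", "9,12"), decreasing = TRUE)

# Keep a numeric summary per station instead (here: the shortest record
# among its elements) and sort on that.
per_station <- aggregate(n_years ~ id + name, data = station_years, FUN = min)
names(per_station)[names(per_station) == "n_years"] <- "min_years"
per_station <- per_station[order(-per_station$min_years), ]
per_station
```

Using the minimum is only one choice; the maximum or the value for a single element such as PRCP would work the same way, depending on what the exercise needs.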

From https://www1.ncdc.noaa.gov/pub/data/ghcn/daily/readme.txt:

Once you have an ID, get data with meteo_tidy_ghcnd(), e.g.

mombasa <- meteo_tidy_ghcnd("KE000063820")
mombasa
## # A tibble: 23,042 x 6
##    id          date        prcp  tavg  tmax  tmin
##    <chr>       <date>     <dbl> <dbl> <dbl> <dbl>
##  1 KE000063820 1957-01-01     0   271    NA    NA
##  2 KE000063820 1957-01-02     0   268   317   233
##  3 KE000063820 1957-01-03     0   272    NA   233
##  4 KE000063820 1957-01-04     0   271   317    NA
##  5 KE000063820 1957-01-05    18   267   317   233
##  6 KE000063820 1957-01-06     0   256   311   233
##  7 KE000063820 1957-01-07   175   254    NA   233
##  8 KE000063820 1957-01-08     0   264   306   233
##  9 KE000063820 1957-01-09     0   272    NA   233
## 10 KE000063820 1957-01-10     0   277    NA   239
## # … with 23,032 more rows
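
Note that GHCN-Daily stores prcp in tenths of a millimetre and the temperature columns in tenths of a degree Celsius, and meteo_tidy_ghcnd() returns them unconverted, so the 271 above is 27.1 °C. A small sketch of the conversion, in base R, on the first five rows printed above (mombasa_head and mombasa_units are illustrative names, and the data frame is typed in by hand so the snippet runs without a download):

```r
# First five rows of the Mombasa output, entered by hand.
mombasa_head <- data.frame(
  id   = "KE000063820",
  date = as.Date("1957-01-01") + 0:4,
  prcp = c(0, 0, 0, 0, 18),
  tavg = c(271, 268, 272, 271, 267),
  tmax = c(NA, 317, NA, 317, 317),
  tmin = c(NA, 233, 233, NA, 233)
)

# Columns stored in tenths of a unit (mm for prcp, degrees C for the rest).
tenths <- c("prcp", "tavg", "tmax", "tmin")

# Divide by 10 to get mm and degrees C; missing values stay NA.
mombasa_units <- mombasa_head
mombasa_units[tenths] <- lapply(mombasa_head[tenths], function(x) x / 10)
mombasa_units
```

The same two lines should apply to the full tibble returned by meteo_tidy_ghcnd(), assuming it has the same four measurement columns.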

ian-flores commented 3 years ago

Hey folks,

I created the kenyaweather package and repo under the merely-useful org. It is available here: https://github.com/merely-useful/kenyaweather

I believe we should keep track of everything related to the package in that repo.