merely-useful / r-rse

ARCHIVED: Activity moved to rostools organization.

Brainstorming for final exercises of chapters #29

Closed lwjohnst86 closed 3 years ago

lwjohnst86 commented 3 years ago

Leave comments, ideas, thoughts here!

mbonsma commented 3 years ago

Thoughts on guiding principles:

cwickham commented 3 years ago

A bit of a brain dump of ideas that have occurred to me over the last week or two:

mbonsma commented 3 years ago

Love these ideas, @cwickham! I was also thinking a weather package would be fun - relevant to everyone in the world, easy to customize to suit specific interests and locations. The package could focus on actually fetching, parsing, and cleaning data, or it could focus on building specific summaries. I think summaries are more interesting, personally, but the first part is useful too (and if we include it, we could provide as much hand-holding as we like). Here are some potential data sources:

A very quick Google search suggests that this is not a completely saturated problem - a cool, simple weather analyzer package might actually be generally useful as well.

lwjohnst86 commented 3 years ago

Really liking these ideas! Weather is something that everyone experiences so this is super general purpose!

Along similar lines would be something that makes summaries of "cost of living" in various places. E.g. Numbeo has a bunch of stuff related to that. Not sure how easy it is to get the data but something to think about as well.

cwickham commented 3 years ago

Some updates on the weather/climate package ideas, after exploring the potential data sources mentioned above.

World Weather Information Service

Provides a 7-day forecast and a monthly climate summary, both available by requesting a URL.

Usage

library(wwis)
city_search("portland")
#> # A tibble: 3 x 3
#>   country                  city             cityid
#>   <chr>                    <chr>             <dbl>
#> 1 Australia                Portland           1720
#> 2 United States of America Portland, Maine     809
#> 3 United States of America Portland, Oregon    810
portland <- city_id("Portland, Oregon")
forecast(portland)
#> # A tibble: 7 x 8
#>   forecastDate wxdesc weather      minTemp maxTemp minTempF maxTempF weatherIcon
#>   <chr>        <chr>  <chr>        <chr>   <chr>   <chr>    <chr>          <int>
#> 1 2021-01-14   ""     Sunny        ""      13      ""       55              2402
#> 2 2021-01-15   ""     Sunny Perio… "7"     11      "45"     51              2201
#> 3 2021-01-16   ""     Fog          "3"     12      "37"     54              1601
#> 4 2021-01-17   ""     Light Showe… "4"     11      "40"     51              1201
#> 5 2021-01-18   ""     Sunny Perio… "4"     9       "39"     48              2201
#> 6 2021-01-19   ""     Light Showe… "2"     9       "35"     48              1201
#> 7 2021-01-20   ""     Mostly Clou… "3"     8       "38"     47              2302
climate(portland)
#> # A tibble: 12 x 10
#>    month maxTemp minTemp meanTemp maxTempF minTempF meanTempF raindays rainfall
#>    <int> <chr>   <chr>   <lgl>    <chr>    <chr>    <lgl>     <chr>    <chr>   
#>  1     1 8.3     2.1     NA       47.0     35.8     NA        18.0     124.0   
#>  2     2 10.7    2.4     NA       51.3     36.3     NA        14.9     93.0    
#>  3     3 13.7    4.2     NA       56.7     39.6     NA        17.6     93.5    
#>  4     4 16.3    6.2     NA       61.4     43.1     NA        16.4     69.3    
#>  5     5 20.0    9.2     NA       68.0     48.6     NA        13.6     62.7    
#>  6     6 23.1    12.0    NA       73.5     53.6     NA        9.2      43.2    
#>  7     7 27.0    14.3    NA       80.6     57.8     NA        4.1      16.5    
#>  8     8 27.3    14.4    NA       81.1     58.0     NA        3.9      17.0    
#>  9     9 24.3    11.7    NA       75.8     53.1     NA        6.7      37.3    
#> 10    10 17.7    7.8     NA       63.8     46.0     NA        12.5     76.2    
#> 11    11 11.6    4.7     NA       52.8     40.5     NA        19.0     143.0   
#> 12    12 7.6     1.8     NA       45.6     35.2     NA        18.6     139.4   
#> # … with 1 more variable: climateFromMemDate <chr>
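
The forecast() output above stores temperatures as character, with "" where a value is missing. Here is a minimal sketch of cleaning those columns in base R, using three rows copied from the output above. The helper name to_number is hypothetical and not part of wwis.

```r
# Three rows copied from the forecast(portland) output above.
forecast_raw <- data.frame(
  forecastDate = c("2021-01-14", "2021-01-15", "2021-01-16"),
  minTemp = c("", "7", "3"),
  maxTemp = c("13", "11", "12"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: treat "" as missing, then convert to numeric.
to_number <- function(x) {
  x[x %in% ""] <- NA_character_
  as.numeric(x)
}

forecast_clean <- transform(
  forecast_raw,
  forecastDate = as.Date(forecastDate),
  minTemp = to_number(minTemp),
  maxTemp = to_number(maxTemp)
)

str(forecast_clean)
```

Whether this cleaning belongs inside the package or is left as part of the exercise is an open design choice.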

Pros

Cons

World Weather Records

Provides monthly climate summaries. WWIS appears to use the most recent set of these for their climate summaries, but WWR has summaries for every decade back to 1921-30. The data will require some effort in parsing, but we could provide it parsed and build a package around doing something with the data.

It's not obvious what the functionality should/could be; some options:

Pros

Cons

Open Weather Map

Has an API for forecast data, and limited (5-day) historical data. Building a package around the API is probably too much for this audience. It also has a "Bulk History" download (for a fee), which provides hourly historical data for a location that we could build a package around. I played with this for Corvallis.

Usage

library(corvweather)
weather
#> # A tibble: 377,245 x 22
#>    datetime             year month  temp feels_like temp_min temp_max pressure
#>    <dttm>              <dbl> <dbl> <dbl>      <dbl>    <dbl>    <dbl>    <dbl>
#>  1 1978-12-31 16:00:00  1978    12  267.       260.     266.     269.     1034
#>  2 1978-12-31 17:00:00  1978    12  267.       259.     266.     269.     1033
#>  3 1978-12-31 18:00:00  1978    12  265.       260.     264.     269.     1035
#>  4 1978-12-31 19:00:00  1978    12  263.       257.     263.     264.     1035
#>  5 1978-12-31 20:00:00  1978    12  263.       257.     263.     264.     1035
#>  6 1978-12-31 21:00:00  1978    12  262.       256.     261.     264.     1036
#>  7 1978-12-31 22:00:00  1978    12  262.       256.     261.     263.     1036
#>  8 1978-12-31 23:00:00  1978    12  262.       256.     261.     263.     1036
#>  9 1979-01-01 00:00:00  1979     1  261.       254.     261.     263.     1037
#> 10 1979-01-01 01:00:00  1979     1  262.       256.     261.     264.     1037
#> # … with 377,235 more rows, and 14 more variables: sea_level <lgl>,
#> #   grnd_level <lgl>, humidity <dbl>, wind_speed <dbl>, wind_deg <dbl>,
#> #   rain_1h <dbl>, rain_3h <dbl>, snow_1h <dbl>, snow_3h <dbl>,
#> #   clouds_all <dbl>, weather_id <dbl>, weather_main <chr>,
#> #   weather_description <chr>, weather_icon <chr>

Plus some functions that do some kind of summary of this data. (I need to think about these).
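
As one possible shape for such a summary function, here is a rough sketch that averages temperature by year and month. It assumes the temp column is in kelvin, which the values near 260-270 suggest but this thread does not confirm. The function names are hypothetical, and the rows are copied from the printed output above, where values are rounded.

```r
# Four rows copied from the printed `weather` tibble above
# (the printout rounds values, so these are approximate).
weather_sample <- data.frame(
  year = c(1978, 1978, 1979, 1979),
  month = c(12, 12, 1, 1),
  temp = c(267, 265, 261, 262)
)

# Assumption: temperatures are stored in kelvin.
kelvin_to_celsius <- function(k) k - 273.15

# Hypothetical summary: mean temperature in Celsius per year and month.
monthly_mean_temp <- function(data) {
  data$temp_c <- kelvin_to_celsius(data$temp)
  aggregate(temp_c ~ year + month, data = data, FUN = mean)
}

monthly_means <- monthly_mean_temp(weather_sample)
monthly_means
```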

Pros

Cons

GHCN climate network data

A possible source of daily weather data: https://www.ncdc.noaa.gov/cdo-web/search?datasetid=GHCND

Like the idea above, I imagine we'd provide data for one location and the package would revolve around building summaries of that data (i.e. the monthly climate summary WWIS returns).
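
To make that concrete, here is a rough sketch of a function that turns daily records into a monthly summary with columns named like the WWIS climate output. The daily values are made up for illustration and are not real GHCN records. The column names date, tmax, tmin and prcp and the function name climate_summary are assumptions.

```r
# Made-up daily values for illustration only; not real GHCN records.
daily <- data.frame(
  date = as.Date(c("2020-01-01", "2020-01-02", "2020-01-03",
                   "2020-02-01", "2020-02-02")),
  tmax = c(8, 10, 9, 12, 14),  # daily maximum, degrees Celsius
  tmin = c(2, 4, 3, 5, 3),     # daily minimum, degrees Celsius
  prcp = c(5, 0, 2.5, 0, 1)    # daily precipitation, millimetres
)

# Hypothetical function: one row per calendar month.
climate_summary <- function(daily) {
  month <- as.integer(format(daily$date, "%m"))
  data.frame(
    month = sort(unique(month)),
    maxTemp = as.vector(tapply(daily$tmax, month, mean)),
    minTemp = as.vector(tapply(daily$tmin, month, mean)),
    raindays = as.vector(tapply(daily$prcp > 0, month, sum)),
    rainfall = as.vector(tapply(daily$prcp, month, sum))
  )
}

monthly <- climate_summary(daily)
monthly
```

This sketch assumes a single year of data. With several years, rain days and rainfall would need to be totalled per year and then averaged across years.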

Pros

Cons

lwjohnst86 commented 3 years ago

We'll discuss this during the meeting, but I'm adding this so it's recorded here. Here are my thoughts about the climate data source.

I've looked over some of the options and I think the GHCN option is the best for our needs. Instructors and learners can search and download data for their area with https://www.ncdc.noaa.gov/cdo-web/. If they want something a bit more challenging, they can even write/use simple API requests with https://www.ncei.noaa.gov/support/access-data-service-api-user-documentation, so the difficulty can be adjusted as needed. We'd just have to write detailed instructions for instructors/self-learners on how to use that API properly, since the website's documentation isn't very clear. There's also the FTP link: ftp://ftp.ncdc.noaa.gov/pub/data/ghcn/daily/. Anyway, that's my thought.

cwickham commented 3 years ago

There is also the package rnoaa for getting GHCN data - I've played with it a bit and it will greatly simplify getting clean data (for us and for other people to customize the assignment).

cwickham commented 3 years ago

If we provide a concrete example, what location(s) would we want to use?

lwjohnst86 commented 3 years ago

This is basically done from the previous phase.