ropensci / weathercan

R package for downloading weather data from Environment and Climate Change Canada
https://docs.ropensci.org/weathercan
GNU General Public License v3.0
102 stars 29 forks

Add normals data #38

Closed steffilazerte closed 5 years ago

steffilazerte commented 7 years ago

OR

Implement a normals calculation

boshek commented 7 years ago

I tend to think that using the ECCC values makes the most sense, because decisions may have been made in summarizing the data that we will be unaware of. Calculating our own opens up a situation where we could end up with different data (if I am understanding this correctly). On the other hand, since the climate normals are not current, we may be able to update them. It may also be more to maintain if we create a normals function.

Is there any reason that we wouldn't simply point to the csv files:

ftp://ftp.tor.ec.gc.ca/Pub/Normals/English/English_CSV_files/

steffilazerte commented 7 years ago

We totally can. The only drawback is that the files bundle data for many stations, so we'll have to download the correct file and then extract the data (and the files are generally 5-8 MB). It's not the end of the world, but it would be slow... Alternatively, we could bundle the data in weathercan as an internal dataset, but that would be a large file to have. So as I see it we have three options:

  1. Function to access and extract station normals from the ftp site as needed
     Pro: Simple and up-to-date
     Con: Slow

  2. Function to access and extract station normals from locally stored data as needed
     Pro: Relatively simple, fast
     Con: May get out of date, large data set to store

  3. We calculate normals based on ftp://client_climate@ftp.tor.ec.gc.ca/Pub/Normals/English/Calculation_of_the_1981_to_2010_Climate_Normals_for_Canada.doc and hope that we can recreate the values
     Pro: Fast, up-to-date
     Con: May not match ECCC values if there's something else going on
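
To make option 3 a bit more concrete, here is a rough sketch of the kind of calculation it would involve, written in Python rather than R and using only the standard library. Everything in it is an assumption for illustration: the function name `monthly_normals`, the `(year, month, value)` input layout with one record per day, and the `max_missing_days` and `min_years` thresholds are placeholders, not the rules from the ECCC document linked above. Matching the published values would mean replacing those placeholders with whatever that document actually specifies.

```python
import calendar
from collections import defaultdict
from statistics import mean


def monthly_normals(daily, start_year=1981, end_year=2010,
                    max_missing_days=5, min_years=15):
    """Return {month: normal} from daily records.

    `daily` is an iterable of (year, month, value) tuples, one per day,
    with value=None for a missing observation. The two thresholds are
    hypothetical placeholders, not ECCC's documented completeness rules.
    """
    # Collect the valid daily values for each (year, month) in the period.
    by_year_month = defaultdict(list)
    for year, month, value in daily:
        if start_year <= year <= end_year and value is not None:
            by_year_month[(year, month)].append(value)

    # One monthly mean per year, skipping months with too many missing days.
    means_by_month = defaultdict(list)
    for (year, month), values in by_year_month.items():
        days_in_month = calendar.monthrange(year, month)[1]
        if days_in_month - len(values) <= max_missing_days:
            means_by_month[month].append(mean(values))

    # The normal is the mean of the monthly means, reported only when
    # enough years contributed to it.
    return {month: mean(means)
            for month, means in sorted(means_by_month.items())
            if len(means) >= min_years}


# Synthetic data: 30 complete Januaries with a constant value of -10.0.
example = [(y, 1, -10.0) for y in range(1981, 2011) for _ in range(31)]
print(monthly_normals(example))  # {1: -10.0}
```

The "Con" for option 3 shows up directly here: any difference between these thresholds and the ones ECCC used would change which months and years are included, and so could change the result.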

boshek commented 7 years ago

Thinking about this a little more, I think option 3 is probably the best. The issue is that it is also the most work. I hadn't understood the .csv files correctly, so thanks for explaining that.

steffilazerte commented 7 years ago

Yeah, it's really too bad about the .csv files, because that would definitely be the best way to go about it. I agree that option 3 is probably the most work :) We'll see how much time I have!

steffilazerte commented 7 years ago

Actually there's also option 4) Scrape the data from the website: http://climate.weather.gc.ca/climate_normals/results_1981_2010_e.html?searchType=stnName&txtStationName=brandon&searchMethod=contains&txtCentralLatMin=0&txtCentralLatSec=0&txtCentralLongMin=0&txtCentralLongSec=0&stnID=3471&dispBack=0

But this is prone to errors and could break with even small changes to the website.

steffilazerte commented 5 years ago

Consider adding normals from:

https://dd.meteo.gc.ca/climate/doc/observations/README_climate.txt

With the data being here:

https://dd.meteo.gc.ca/climate/observations/