klwilson23 opened this issue 4 years ago
Hi @klwilson23!
This behaviour is deliberate because most stations are separate entities and the purpose is to create data frames with comparable time series. Some stations are continuations of each other, but it isn't clear to weathercan when that is the case.
Possibly we could consider adding a `pad = FALSE` argument (the opposite of the `trim` argument) to avoid padding the time ranges.
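Purely as an illustration, here is a minimal base R sketch of what such an option might do, applied after download. `trim_padding()` is a hypothetical helper and not part of weathercan. The toy data frame is made up and only assumes the `station_id`, `date` and `max_temp` columns that appear in the plotting code in this thread:

```r
# Hypothetical helper, NOT part of weathercan: approximates what a
# `pad = FALSE` option might do. For each station it keeps only the rows
# between the first and last date with data in `vars`, so padded NA rows
# at the ends are dropped but NA gaps in the middle are preserved.
trim_padding <- function(df, vars = "max_temp") {
  pieces <- lapply(split(df, df$station_id), function(d) {
    d <- d[order(d$date), ]
    # TRUE for rows where at least one of `vars` is not missing
    has_data <- rowSums(!is.na(d[, vars, drop = FALSE])) > 0
    if (!any(has_data)) return(d[0, ])  # station has no data at all
    idx <- which(has_data)
    d[min(idx):max(idx), ]
  })
  out <- do.call(rbind, pieces)
  out <- out[order(out$date, out$station_id), ]
  rownames(out) <- NULL
  out
}

# Toy stand-in for a combined two-station download (made-up values)
s <- data.frame(
  station_id = rep(c(202, 51319), each = 6),
  date = rep(as.Date("2013-06-10") + 0:5, times = 2),
  max_temp = c(14.1, NA, 15.0, NA, NA, NA,
               NA, NA, NA, 16.2, NA, 17.3)
)

trim_padding(s)
```

On this toy data the result has six rows, one per date, and each station keeps its one NA inside its own range. The `vars` argument is a design choice in this sketch: a row counts as "data" if any of the listed columns is non-missing, so checking only `max_temp` could trim days where other variables were still recorded.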
For now, you can either combine them or filter out the `NA` values. Below is how to combine them (which preserves `NA`s in the middle of the range):
```r
library(weathercan)

# Download the two stations separately for the whole time range
# (NAs at the ends will be trimmed, as you saw)
s1 <- weather_dl(station_ids = 202, start = "1975-01-01", end = "2018-12-31",
                 interval = "day")
s2 <- weather_dl(station_ids = 51319, start = "1975-01-01", end = "2018-12-31",
                 interval = "day")

# Bind the rows together
s <- rbind(s1, s2)
```
Then, to make sure there is no overlap between the stations, we can check visually:
```r
library(ggplot2)
library(dplyr)

# Check the full time range
ggplot(data = s, aes(x = date, y = max_temp, colour = factor(station_id))) +
  geom_point()

# Zoom in on the switch-over between stations in 2013
ggplot(data = filter(s, date > as.Date("2013-04-01"), date < as.Date("2013-08-01")),
       aes(x = date, y = max_temp, colour = factor(station_id))) +
  geom_point() +
  geom_line()
```
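If you'd like a non-visual check as well, here is a minimal base R sketch. `station_ranges()` and `ranges_overlap()` are hypothetical helpers (not weathercan functions), and the made-up toy data frame only assumes the `station_id`, `date` and `max_temp` columns used in the plots:

```r
# Hypothetical helpers, NOT part of weathercan.
# station_ranges(): first and last date with a non-missing `var`, per station
station_ranges <- function(df, var = "max_temp") {
  obs <- df[!is.na(df[[var]]), ]
  ranges <- lapply(split(obs, obs$station_id), function(d) {
    data.frame(station_id = d$station_id[1],
               first = min(d$date),
               last = max(d$date))
  })
  out <- do.call(rbind, ranges)
  out <- out[order(out$first), ]
  rownames(out) <- NULL
  out
}

# ranges_overlap(): after sorting by start date, the ranges overlap exactly
# when some station starts on or before the previous station's last date
ranges_overlap <- function(r) {
  if (nrow(r) < 2) return(FALSE)
  any(r$first[-1] <= r$last[-nrow(r)])
}

# Toy stand-in for the combined download (made-up values)
s <- data.frame(
  station_id = rep(c(202, 51319), each = 4),
  date = rep(as.Date("2013-06-10") + 0:3, times = 2),
  max_temp = c(14.1, 15.0, NA, NA,
               NA, NA, 16.2, 17.3)
)

r <- station_ranges(s)
r
ranges_overlap(r)  # FALSE: 202 ends on 2013-06-11, 51319 starts on 2013-06-12
```

Like the plots, this only looks at `max_temp`; a station that reports other variables outside that window would need those columns checked too.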
Does that address your problem?
Howdy @steffilazerte
That'll work for me! This is a specific data grab that I only need to do a limited number of times. A padding TRUE/FALSE argument could be useful in future work (if it's feasible) in case we go for a bigger regional download of Vancouver Island stations. But there are clever solutions on the back end of the data grab, in case you want to save yourself the headache.
Thanks for the great package!
Glad it worked! I'm going to leave this issue open as a feature request for the padding argument. It wouldn't be difficult to implement.
What I want to do:
I'm trying to download daily weather data for two stations, in this case both listed under the station name Port Hardy A. The two stations' date ranges don't overlap: station 202 runs from 1944 until 2013, and station 51319 picks up from 2013 until today. I would just like a single time series that accounts for where each station leaves off or picks up.
Issue?
The download creates a single data frame, but it contains two full time series, one for each station ID. While I get the real data from each station (which is what I asked for), I also get missing data for each station outside its own range. It appears to add `NA` rows for every date I requested.
I'm not sure whether this behaviour for merging data across stations is intended or not. I could attempt to remove the duplicated dates manually, but I might have to do some quality control on that. Suggestions?
Example:
Here are the stations for Port Hardy. Notice that Port Hardy A has two station IDs with two different date ranges that don't overlap.
Then I download those two stations:
We can start to see the problem when we look at the temperature for station 202 at the start and end of the range:
Here we see the duplicated NAs for station 202 at the end of the range:
I get similar issues for station 51319 at the start and end of the range:
Interestingly, if I download only one station but specify a "bad range", then the data download trims itself to the observation period.
For example:
My Environment