Closed rajibshibly closed 5 years ago
Strange that some of this 'missing' data is present if you get at it via the WSC historic data portal.
For instance, 02YA002 is not missing data on the website for these months
I don't have hydat on this machine but can investigate further soon.
In the mean time, @rajibshibly can you post a reproducible example (code snippet) for getting the table you uploaded?
At first I tried the hy_monthly_flows function for e.g.:
monthly_flows_03OC003 <- hy_monthly_flows(station_number ="03OC003", start_date = "1998-10-01", end_date = "1998-12-31")
I ended up with the error: Error in UseMethod("escape") : no applicable method for 'escape' applied to an object of class "function"
Please find the code below:
#Function which is oppostive of %in%
'%!in%' <- function(x,y)!('%in%'(x,y))
#NL_stns stores all hydat stations which are active and from the province of NL
NL_Stns_Active <- hy_stations() %>%
filter(HYD_STATUS == "ACTIVE") %>%
filter(PROV_TERR_STATE_LOC == "NL")
#All stations which are regulated
All_Stns_Reg <- hy_stn_regulation()%>%
filter(REGULATED == "TRUE")
#Filtering all NL regulated stations using %in% keyword
NL_Stns_Reg <- NL_Stns_Active[NL_Stns_Active$STATION_NUMBER %in% All_Stns_Reg$STATION_NUMBER,]
#Filtering all active and unregulated stations using the function %!in%
NL_Stns_Act_Unreg <- NL_Stns_Active[NL_Stns_Active$STATION_NUMBER %!in% NL_Stns_Reg$STATION_NUMBER,]
#Retrieve date range and number of years of data for active and unregulated flow stations
NL_Stns_Range <- hy_stn_data_range(station_number = NL_Stns_Act_Unreg$STATION_NUMBER)%>%
filter(DATA_TYPE == "Q")
#Merge the Active and Unregulated station with the date range
NL_Stns <- merge(NL_Stns_Act_Unreg, NL_Stns_Range, by.x= "STATION_NUMBER")
#Those station which does not have elevation information
Left_Out_Stns <- NL_Stns[NL_Stns$STATION_NUMBER %!in% NL_Flow$STATION_NUMBER,]
#Find the row number for Rattling Brook Below Plant Discharge
rowNum <- which(NL_Stns$STATION_NUMBER == "02ZK008")
#Add the gross drainage area for Rattling Brook Below Plant Discharge
NL_Stns$DRAINAGE_AREA_GROSS[rowNum] = 36.082
#Select certain fields for the NL_Stns
NL_Stns <- select(NL_Stns, "STATION_NUMBER", "STATION_NAME", "PROV_TERR_STATE_LOC", "LATITUDE", "LONGITUDE", "DRAINAGE_AREA_GROSS", "Year_from", "Year_to", "RECORD_LENGTH")
#Seperate between NF and Labrador Stations
for (row in 1:nrow(NL_Stns)){
if (substr(NL_Stns$STATION_NUMBER[row], 1, 3) == "02X"){NL_Stns$PROV_TERR_STATE_LOC[row]="LAB"
}else if (substr(NL_Stns$STATION_NUMBER[row], 1, 2) == "03"){NL_Stns$PROV_TERR_STATE_LOC[row]="LAB"
}else{NL_Stns$PROV_TERR_STATE_LOC[row]="NF"}
}
#Only keep rows which have RECORD_LENGTH of 17 years or greater.
NL_Stns <- filter (NL_Stns, RECORD_LENGTH >= 17)
NL_Stns_daily_flows <- hy_daily_flows(NL_Stns$STATION_NUMBER, start_date = "1996-01-01", end_date = "2015-12-31")
#Removing the NA values i.e. blank values
NL_Stns_daily_flows <- drop_na(NL_Stns_daily_flows)
NL_Stns_daily_flows$Month <- Month(NL_Stns_daily_flows$Date)
NL_Stns_daily_flows$Year <- Year(NL_Stns_daily_flows$Date)
# Aggregate on months and year and get monthly mean temperature
Mean_Monthly_Flow_NL <- NL_Stns_daily_flows %>%
#group_by groups the data by month and year
group_by(STATION_NUMBER, Year, Month) %>%
mutate(grouped_id = row_number()) %>%
#summarise computes the mean monthly temperature
summarise(mean_flow = mean(Value, na.rm = TRUE))
#Mean_Monthly_Flow_NL <- select(Mean_Monthly_Flow_NL, "STATION_NUMBER", "mean_flow", "Year")
#Mean_Monthly_Flow_NL_Count <- count(Mean_Monthly_Flow_NL, "Year")
#Put the data into wide format
Mean_Monthly_Flow_Wide <- spread(Mean_Monthly_Flow_NL, STATION_NUMBER, mean_flow)
@rajibshibly Looks to be an issue with however you munged the data. If I grab monthly means specifically for 02YA002 the issue doesn't exist...
library(tidyhydat)
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
bartletts_mean <- hy_monthly_flows(
station_number = "02YA002"
) %>%
filter(
Sum_stat == "MEAN"
)
#> No start and end dates specified. All dates available will be returned.
#> All station successfully retrieved
sum(is.na(bartletts_mean$Value))
#> [1] 0
bartletts_mean %>%
filter(Year == 1996)
#> # A tibble: 12 x 8
#> STATION_NUMBER Year Month Full_Month No_days Sum_stat Value
#> <chr> <int> <int> <lgl> <int> <chr> <dbl>
#> 1 02YA002 1996 1 TRUE 31 MEAN 0.409
#> 2 02YA002 1996 2 TRUE 29 MEAN 0.258
#> 3 02YA002 1996 3 TRUE 31 MEAN 0.546
#> 4 02YA002 1996 4 TRUE 30 MEAN 1.74
#> 5 02YA002 1996 5 TRUE 31 MEAN 4.29
#> 6 02YA002 1996 6 TRUE 30 MEAN 1.88
#> 7 02YA002 1996 7 TRUE 31 MEAN 2.17
#> 8 02YA002 1996 8 TRUE 31 MEAN 0.438
#> 9 02YA002 1996 9 TRUE 30 MEAN 0.504
#> 10 02YA002 1996 10 TRUE 31 MEAN 1.64
#> 11 02YA002 1996 11 TRUE 30 MEAN 1.80
#> 12 02YA002 1996 12 TRUE 31 MEAN 1.91
#> # ... with 1 more variable: Date_occurred <date>
Created on 2019-01-15 by the reprex package (v0.2.1)
Not sure what munge means but looks like this works! ... I shall give this a try...
Thanks
This is not a bug but a general issue and I was wondering if anyone have any thought or idea about how to deal with it.
When retrieving monthly flow values for some of the Newfoundland and Labrador stations, I found many stations consist gaps in monthly flow values. An example is attached. I was wondering if anyone can forward me an R function that is appropriate for filling time series monthly missing values.
Thanks in advance.