Closed BlaiseKelly closed 1 year ago
Hello Blaise,
I have had a look, and the observations are not true duplicates. For the ozone example, there can be several summaries per date (including hourly means, daily means, and eight-hour means). The summary type is stored in the summary variable. Below is an example of how to decode the summaries. Hourly means are usually what is wanted, so a filter on summary with the integer key 1 can be applied. The NO2 example is the same, but only two summary types are available for this pollutant. I hope that helps and is clear. Enjoy!
Stuart.
# Load packages
library(dplyr)
library(saqgetr)
# Get summary keys
data_summary_keys <- get_saq_summaries()
# Get ozone observations
data_ozone <- get_saq_observations(
  site = "gr0027a",
  variable = "o3",
  start = "2012-07-01",
  end = "2012-07-15"
)
# Join the decoded versions of the summary integers
data_ozone_join <- data_ozone %>%
  left_join(data_summary_keys, by = "summary") %>%
  arrange(date)
# What do we have?
data_ozone_join %>%
  distinct(variable, summary, averaging_period)
#> # A tibble: 5 × 3
#>   variable summary averaging_period
#>   <chr>      <int> <chr>
#> 1 o3            20 day
#> 2 o3            21 dymax
#> 3 o3             1 hour
#> 4 o3           101 8hour
#> 5 o3           101 hour8
# Usually, hourly observations are desired and they are represented with 1
# Check if we can pivot, a good check for duplicate observations
data_ozone_join %>%
  filter(summary == 1L) %>%
  select(date, date_end, site, variable, value) %>%
  tidyr::pivot_wider(names_from = variable)
#> # A tibble: 335 × 4
#>    date                date_end site       o3
#>    <dttm>              <dttm>   <chr>   <dbl>
#>  1 2012-07-01 00:00:00 NA       gr0027a    98
#>  2 2012-07-01 01:00:00 NA       gr0027a    98
#>  3 2012-07-01 02:00:00 NA       gr0027a    99
#>  4 2012-07-01 03:00:00 NA       gr0027a    98
#>  5 2012-07-01 04:00:00 NA       gr0027a    97
#>  6 2012-07-01 05:00:00 NA       gr0027a    97
#>  7 2012-07-01 06:00:00 NA       gr0027a    98
#>  8 2012-07-01 07:00:00 NA       gr0027a   100
#>  9 2012-07-01 08:00:00 NA       gr0027a   101
#> 10 2012-07-01 09:00:00 NA       gr0027a   103
#> # … with 325 more rows
# Filter and use for analysis
data_ozone_hour <- data_ozone %>%
  filter(summary == 1L)
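If you would rather confirm the filtering step directly than rely on the pivot, one option is to count rows per timestamp before and after the filter. The sketch below uses a small made-up table in place of a real download, so the object names (data_toy, data_toy_hour, n_duplicated) and the values are assumptions for illustration only. Only the column names follow the output shown above.

```r
# Load packages
library(dplyr)

# Toy stand-in for an observations table; the values are invented, not real data
data_toy <- tibble(
  date = rep(as.POSIXct("2012-07-01 00:00:00", tz = "UTC") + 3600 * 0:2, each = 2),
  site = "gr0027a",
  variable = "o3",
  summary = rep(c(1L, 101L), times = 3),
  value = c(98, 95, 98, 96, 99, 97)
)

# Before filtering: more than one row per timestamp means several summaries are present
data_toy %>%
  count(site, variable, date, name = "n_rows")

# Keep the hourly means only
data_toy_hour <- data_toy %>%
  filter(summary == 1L)

# After filtering: count the timestamps that still appear more than once
n_duplicated <- data_toy_hour %>%
  count(site, variable, date) %>%
  filter(n > 1L) %>%
  nrow()

n_duplicated
```

On real data, a non-zero n_duplicated after the filter would point to genuine duplicates rather than different summary types.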
Very clear - thanks Stuart!
I was downloading some data for 2012 and noticed there were 4 values for every hour. Each value is different.
dat <- get_saq_observations(site = 'gr0027a', start = '2012-07-01', end = '2012-07-15', variable = 'o3')
Also, for the site below, two values are returned for no2. I was expecting only one value per hour for both species.
dat_2 <- get_saq_observations(site = 'gb0002r', start = '2012-07-01', end = '2012-07-15', variable = 'no2')