Closed meaghanregina closed 5 years ago
Hi Meaghan,
Thanks for the question and for sharing your location data. From what you shared, you are looking to extract data for 1012 stations, 5 parameters, and 10 GCMs, for 107 days in each of the 80 years 2020-2099 -- roughly 433 million records! This is definitely a TON of data.
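Just as a sanity check on the size of the request, the record count is the product of those dimensions (assuming a May 1 - August 15 window, which is 107 days, and the 80 years 2020-2099 inclusive):

```r
stations <- 1012
params   <- 5
gcms     <- 10
days     <- 31 + 30 + 31 + 15    # May 1 - Aug 15 = 107 days
years    <- length(2020:2099)    # 80 years
total    <- stations * params * gcms * days * years
format(total, big.mark = ",")    # "433,136,000"
```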
Fortunately, your locations are fairly dense, so I suggest using areal extraction rather than point extraction. This allows for more efficient climateR calls, and the point values can be pulled out as a post-processing step from the returned raster stacks.
To get a bounding area, the AOI package can be called on your points:
library(climateR)
library(sf)
library(raster)
load('./points.RData') # loads your point data called 'samp'
samp
class : SpatialPointsDataFrame
features : 1012
extent : -83.84168, -80.81692, 34.56804, 37.29032 (xmin, xmax, ymin, ymax)
coord. ref. : +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0
variables : 11
p = AOI::getBoundingBox(samp) %>% st_as_sf() # convert to bounding box
plot(samp)
plot(p, add = T)
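If you want to see what the bounding-box step amounts to without the AOI package, it is just the min/max of the point coordinates; a minimal base-R sketch with made-up coordinates in place of your `samp` object:

```r
# Toy coordinates standing in for the point data; the bounding box is
# simply the coordinate-wise min and max
lon <- c(-83.84, -82.10, -80.82)
lat <- c( 34.57,  36.00,  37.29)
bbox <- c(xmin = min(lon), xmax = max(lon),
          ymin = min(lat), ymax = max(lat))
bbox
```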
With this extent, the example below gets summer-time data for 3 years, 10 GCMs, and 5 variables at all of your locations:
# Define GCMs
gcms = c("bcc-csm1-1-m", "CanESM2", "CNRM-CM5", "CSIRO-Mk3-6-0", "GFDL-ESM2M",
         "HadGEM2-ES365", "HadGEM2-CC365", "IPSL-CM5A-MR", "MIROC5", "MIROC-ESM-CHEM")
# Define parameters
params = c('tmax', 'tmin', 'rhmax', 'rhmin', 'prcp')
# Create a vector of GCM/parameter pairs that match climateR outputs
g.fin = levels(interaction(gcms,params,sep="_"))
# Create an empty list equal in size to the GCM/parameter pairs. We will fill this in a loop:
mylist <- setNames(vector("list", length = NROW(g.fin)), g.fin)
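`interaction()` builds every GCM/parameter combination, and `setNames(vector("list", ...))` pre-allocates a named slot for each one. A toy illustration with two GCMs and two parameters:

```r
gcms2   <- c("CanESM2", "MIROC5")
params2 <- c("tmax", "prcp")
# Every GCM_parameter pair, matching how climateR names its outputs
combos  <- levels(interaction(gcms2, params2, sep = "_"))
combos
# Pre-allocate an empty named list, one slot per pair, to fill in a loop
out <- setNames(vector("list", length(combos)), combos)
length(out)  # 4
```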
system.time({
  for(y in 2020:2022){
    yy = getMACA(AOI = p,
                 model = gcms,
                 param = params,
                 startDate = paste0(y, "-05-01"),
                 endDate   = paste0(y, "-08-15"))

    for(i in 1:length(mylist)){
      maca.index = which(names(yy) == g.fin[i])
      list.index = which(names(mylist) == g.fin[i])
      # Extract point-level values and transpose the matrix so that
      # dates are rows and points are columns
      xt = t(raster::extract(yy[[maca.index]], samp))
      # Convert to data.frame and add a date column parsed from the
      # rasterStack layer names
      r = data.frame(gsub("X", "", gsub("\\.", "-", names(yy[[maca.index]]))),
                     xt, stringsAsFactors = FALSE)
      # Set column names to the point ids in your original dataset
      colnames(r) = c('dates', paste0("X.", samp$pt_ids))
      rownames(r) = NULL
      # Add (bind) the data to the appropriate element of 'mylist'
      mylist[[list.index]] <- rbind(mylist[[list.index]], r)
    }
    message("Finished processing year ", y)
  }
})
user system elapsed
75.564 22.670 152.149
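The reshaping inside the loop (extract, transpose, parse dates from layer names) can be sketched with a toy matrix; the values and layer names here are made up, but the shapes match what `raster::extract` returns (one row per point, one column per layer):

```r
# Toy stand-in for raster::extract() output: 3 points x 2 date layers
vals <- matrix(1:6, nrow = 3, ncol = 2,
               dimnames = list(NULL, c("X2020.05.01", "X2020.05.02")))
xt <- t(vals)  # transpose: dates become rows, points become columns
# Layer names like "X2020.05.01" become ISO dates "2020-05-01"
dates <- gsub("X", "", gsub("\\.", "-", rownames(xt)))
r <- data.frame(dates, xt, stringsAsFactors = FALSE)
colnames(r) <- c("dates", paste0("X.", 1:3))
rownames(r) <- NULL
r
```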
Each year takes ~50 seconds to process and should scale roughly linearly (on the order of 100,000 values per second), so your whole request should finish in about 1-1.5 hours.
The result of this loop is a list of data.frames as shown below:
str(mylist, max.level = 1)
List of 50
$ bcc-csm1-1-m_prcp :'data.frame': 321 obs. of 1013 variables:
$ CanESM2_prcp :'data.frame': 321 obs. of 1013 variables:
$ CNRM-CM5_prcp :'data.frame': 321 obs. of 1013 variables:
$ CSIRO-Mk3-6-0_prcp :'data.frame': 321 obs. of 1013 variables:
$ GFDL-ESM2M_prcp :'data.frame': 321 obs. of 1013 variables:
$ HadGEM2-CC365_prcp :'data.frame': 321 obs. of 1013 variables:
$ HadGEM2-ES365_prcp :'data.frame': 321 obs. of 1013 variables:
$ IPSL-CM5A-MR_prcp :'data.frame': 321 obs. of 1013 variables:
$ MIROC-ESM-CHEM_prcp :'data.frame': 321 obs. of 1013 variables:
$ MIROC5_prcp :'data.frame': 321 obs. of 1013 variables:
$ bcc-csm1-1-m_rhmax :'data.frame': 321 obs. of 1013 variables:
$ CanESM2_rhmax :'data.frame': 321 obs. of 1013 variables:
$ CNRM-CM5_rhmax :'data.frame': 321 obs. of 1013 variables:
$ CSIRO-Mk3-6-0_rhmax :'data.frame': 321 obs. of 1013 variables:
$ GFDL-ESM2M_rhmax :'data.frame': 321 obs. of 1013 variables:
$ HadGEM2-CC365_rhmax :'data.frame': 321 obs. of 1013 variables:
$ HadGEM2-ES365_rhmax :'data.frame': 321 obs. of 1013 variables:
...
Each internal data.frame will have a column for the date and values for each point:
str(mylist$CanESM2_prcp, max.level = 2)
'data.frame': 321 obs. of 1013 variables:
$ dates: chr "2020-05-01" "2020-05-02" "2020-05-03" "2020-05-04" ...
$ X.1 : num 5.72 0 0 13.28 1.47 ...
$ X.2 : num 5.76 0 0 13.85 1.82 ...
$ X.3 : num 6.04 0 0 13.98 1.99 ...
$ X.4 : num 6.12 0 0 13.88 1.9 ...
$ X.5 : num 7.16 0 0 14.28 1.75 ...
$ X.6 : num 6.21 0 0 13.57 1.82 ...
$ X.7 : num 6.51 0 0 13.94 2.03 ...
...
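If a long (tidy) format is easier for your downstream analysis, each wide data.frame can be melted with base R alone. A sketch on a toy frame shaped like the ones above (two dates, two points):

```r
# Toy wide frame: a dates column plus one value column per point
df <- data.frame(dates = c("2020-05-01", "2020-05-02"),
                 X.1 = c(5.72, 0), X.2 = c(5.76, 0),
                 stringsAsFactors = FALSE)
# Stack the point columns into one value column, keeping the point id
long <- data.frame(dates = rep(df$dates, times = ncol(df) - 1),
                   point = rep(names(df)[-1], each = nrow(df)),
                   value = unlist(df[-1], use.names = FALSE))
long
```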
From there you can plot the data....
par(mfrow = c(3, 1))

plot(x = as.Date(mylist$`bcc-csm1-1-m_prcp`$dates),
     y = mylist$`bcc-csm1-1-m_prcp`$X.5, pch = 16, col = 'blue',
     main = 'Location 5 PRCP', xlab = "Dates", ylab = "PRCP (mm)")

plot(x = as.Date(mylist$`bcc-csm1-1-m_tmax`$dates),
     y = mylist$`bcc-csm1-1-m_tmax`$X.5, pch = 16, col = 'red',
     main = 'Location 5 TMAX', xlab = "Dates", ylab = "TMAX (K)")

plot(x = as.Date(mylist$`bcc-csm1-1-m_rhmax`$dates),
     y = mylist$`bcc-csm1-1-m_rhmax`$X.5, pch = 16, col = 'purple',
     main = 'Location 5 RHMAX', xlab = "Dates", ylab = "RHMAX (%)")
... or write each list element to a CSV...
out_path = "./maca_climate_files/"
dir.create(out_path, showWarnings = FALSE)

for(i in 1:length(mylist)){
  write.csv(mylist[[i]],
            file = paste0(out_path, names(mylist)[i], ".csv"),
            row.names = FALSE)
}
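A quick round-trip check of the CSV export, with `tempdir()` standing in for the output folder and a one-row toy frame in place of a real list element:

```r
# Write a toy data.frame to CSV and read it back to confirm the layout
out_path <- file.path(tempdir(), "maca_climate_files")
dir.create(out_path, showWarnings = FALSE)
toy <- data.frame(dates = "2020-05-01", X.1 = 5.72)
write.csv(toy, file.path(out_path, "toy.csv"), row.names = FALSE)
back <- read.csv(file.path(out_path, "toy.csv"), stringsAsFactors = FALSE)
identical(back$dates, "2020-05-01")  # TRUE
```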
Hope that helps, and thanks for pushing these functions to the limit!
Mike
Hi Mike, Great package! I am hoping to use your new update to extract a mountain of point data. My ultimate goal is to extract daily summertime data (from May 1 - August 15) for the years 2020 - 2099 from the MACA climate data set. I would like to extract data from 10 GCMs and 5 variables to 1000 points. I know this is a TON of data, and was hoping you might have a solution. Thanks so much!