mikejohnson51 / climateR

An R 📦 for getting point and gridded climate data by AOI
https://mikejohnson51.github.io/climateR/
MIT License
168 stars 40 forks source link

Extracting daily (summertime) data from 1000 points over 10 GCMs #6

Closed meaghanregina closed 5 years ago

meaghanregina commented 5 years ago

Hi Mike, Great package! I am hoping to use your new update to extract a mountain of point data. My ultimate goal is to extract daily summertime data (from May 1 - August 15) for the years 2020 - 2099 from the MACA climate data set. I would like to extract data from 10 GCMs and 5 variables to 1000 points. I know this is a TON of data, and was hoping you might have a solution. Thanks so much!

mikejohnson51 commented 5 years ago

Hi Meaghan,

Thanks for the question and sharing your location data. From what you shared you are looking to extract data for 1012 stations, for 5 parameters, 10 GCMs, 107 days over 79 years. A total of 385,330,005 records! This is definitely a TON of data.

What is fortunate about your locations are that they are pretty dense, therefore I suggest using the areal extraction and NOT point extraction to get what you want. This will allow for more efficient climateR calls and the point values can be extracted as a post-processing step from the returned raster stacks.

To get a bounding area, the AOI package can be called on you points:

library(climateR)
library(sf)
library(raster)

load('./points.RData') # loads your point data called 'samp'

samp

class       : SpatialPointsDataFrame 
features    : 1012 
extent      : -83.84168, -80.81692, 34.56804, 37.29032  (xmin, xmax, ymin, ymax)
coord. ref. : +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0 
variables   : 11

p = AOI::getBoundingBox(samp) %>% st_as_sf() # convert to bounding box

plot(samp)
plot(p, add = T)

image

With this extent the below example will get summer-time data for 3 years, 10 GCMs, 5 variables for all of your locations:

# Define GCMs
gcms = c("bcc-csm1-1-m","CanESM2","CNRM-CM5", "CSIRO-Mk3-6-0", "GFDL-ESM2M","HadGEM2-ES365", "HadGEM2-CC365","IPSL-CM5A-MR", "MIROC5","MIROC-ESM-CHEM", "CCSM4")

#Define Parameters
params = c('tmax', 'tmin', 'rhmax', 'rhmin', 'prcp')

# Create a vector of GCM/parameter pairs that match climateR outputs
g.fin = levels(interaction(gcms,params,sep="_"))

# Create an empty list equal in size to the GCM/parameter pairs. We will fill this in a loop:
mylist <- setNames(vector("list", length = NROW(g.fin)), g.fin)

system.time({

for(y in 2020:2022){  
  yy = getMACA(AOI = p,
          model = gcms,
          param = params,
          startDate = paste0(y, "-05-01"),
          endDate   = paste0(y, "-08-15")
  )

    for(i in 1:length(mylist)){
      maca.index =  which(names(yy) == g.fin[i])
      list.index =  which(names(mylist) == g.fin[i])

      # extract point level values and tranpose matrix so that dates are rows and points are columns
      xt = t(raster::extract(yy[[maca.index]], points)) 

      # Convert to data.frame and add date column from rasterStack layer names
      r = data.frame(gsub("X", "", gsub("\\.", "-", names(yy[[maca.index]]))), xt, stringsAsFactors = F)

      # set column names to the point id in your origional dataset
      colnames(r) = c('dates', paste0("X.", samp$pt_ids))
      rownames(r) = NULL

      # Add (bind) data to the appropiate element of 'mylist'
      mylist[[list.index]]  <- rbind(mylist[[list.index]], r)
    }

  message("Finished processing year ", y)

}
})

   user  system elapsed 
 75.564  22.670 152.149 

Each year takes about ~50 seconds to process and should scale pretty linearly allowing you to process ~ 95,640 values per second, and your whole data request in ~1 - 1.5 hours

The result of this loop is a list of data.frames as shown below:

str(mylist, max.level = 1)
List of 50
 $ bcc-csm1-1-m_prcp   :'data.frame':   321 obs. of  1013 variables:
 $ CanESM2_prcp        :'data.frame':   321 obs. of  1013 variables:
 $ CNRM-CM5_prcp       :'data.frame':   321 obs. of  1013 variables:
 $ CSIRO-Mk3-6-0_prcp  :'data.frame':   321 obs. of  1013 variables:
 $ GFDL-ESM2M_prcp     :'data.frame':   321 obs. of  1013 variables:
 $ HadGEM2-CC365_prcp  :'data.frame':   321 obs. of  1013 variables:
 $ HadGEM2-ES365_prcp  :'data.frame':   321 obs. of  1013 variables:
 $ IPSL-CM5A-MR_prcp   :'data.frame':   321 obs. of  1013 variables:
 $ MIROC-ESM-CHEM_prcp :'data.frame':   321 obs. of  1013 variables:
 $ MIROC5_prcp         :'data.frame':   321 obs. of  1013 variables:
 $ bcc-csm1-1-m_rhmax  :'data.frame':   321 obs. of  1013 variables:
 $ CanESM2_rhmax       :'data.frame':   321 obs. of  1013 variables:
 $ CNRM-CM5_rhmax      :'data.frame':   321 obs. of  1013 variables:
 $ CSIRO-Mk3-6-0_rhmax :'data.frame':   321 obs. of  1013 variables:
 $ GFDL-ESM2M_rhmax    :'data.frame':   321 obs. of  1013 variables:
 $ HadGEM2-CC365_rhmax :'data.frame':   321 obs. of  1013 variables:
 $ HadGEM2-ES365_rhmax :'data.frame':   321 obs. of  1013 variables:

...

Each internal data.frame will have a column for the date and values for each point:

str(mylist$CanESM2_prcp, max.level = 2)
'data.frame':   321 obs. of  1013 variables:
 $ dates: chr  "2020-05-01" "2020-05-02" "2020-05-03" "2020-05-04" ...
 $ X.1  : num  5.72 0 0 13.28 1.47 ...
 $ X.2  : num  5.76 0 0 13.85 1.82 ...
 $ X.3  : num  6.04 0 0 13.98 1.99 ...
 $ X.4  : num  6.12 0 0 13.88 1.9 ...
 $ X.5  : num  7.16 0 0 14.28 1.75 ...
 $ X.6  : num  6.21 0 0 13.57 1.82 ...
 $ X.7  : num  6.51 0 0 13.94 2.03 ...

...

From there you can plot the data....


par(mfrow = c(3,1))
plot(x = as.Date(mylist$`bcc-csm1-1-m_prcp`$dates), 
     y = mylist$`bcc-csm1-1-m_prcp`$X.5, pch = 16, col = 'blue',
     main = 'Location 5 PRCP', xlab = "Dates", ylab = "PRCP (mm)")
plot(x = as.Date(mylist$`bcc-csm1-1-m_prcp`$dates), 
     y = mylist$`bcc-csm1-1-m_tmax`$X.5, pch = 16, col = 'red',
     main = 'Location 5 TMAX', xlab = "Dates", ylab = "TMAX (K)" )
plot(x = as.Date(mylist$`bcc-csm1-1-m_prcp`$dates), 
     y = mylist$`bcc-csm1-1-m_rhmax`$X.5, pch = 16, col = 'purple',
     main = 'Location 5 RHMAX', xlab = "Dates", ylab = "RHMAX")

image

... or write each list element to a CSV...

out_path = "./maca_climate_files/"

for(i in 1:length(mylist)){
  write.csv(mylist[[i]], file = paste0(out_path, names(mylist)[i], ".csv"))
}

Hope that helps, and thanks for pushing these functions to the limit!

Mike