rgdal-dev / rasterwise

Hard-won lessons! Don't lose 'em!

How to manipulate CorTAD layers via raster and nco commands in R #15

Open johnfbruno opened 8 years ago

johnfbruno commented 8 years ago

The data file is the CorTAD (Coral Reef Temperature Anomaly Database), available here as a 67 GB file; the metadata are here. I am using v5, which, unlike previous versions, packs all the tiles into a single, massive nc file. The dimensions in the file are lat, lon, and time, plus multiple variables (metrics of thermal stress on coral reefs) derived from satellite-measured Sea Surface Temperature values. We typically use ArcGIS to handle the CorTAD data, but I really want to learn to do it with the raster package in R. Then I will include the code in the metadata (I am a co-developer of the database) to make it much easier for others to download and use the data (and I won’t have to find a PC with Arc loaded every time I need to do this). I’d also like to make available a single raster layer with the summed metric for the entire period (99% of what NOAA provides in their nc file is not necessary to most users, e.g., weekly values for a bunch of metrics most people won’t need).

The variable I want is TSA_Frequency [lon,lat,time] = Frequency of Thermal Stress Anomalies >= 1 deg C over previous 52 weeks, for the time period 1982-2010. (Note the database runs through 2012, but we need to exclude 2011 onward for the project I’m working on now.)

Because TSA_Frequency is a running summation of thermal stress over the previous 52 weeks, to sum across the entire period of interest (1982-2010) we want to grab the TSA_Frequency value for the final week of every year. I believe I either need to create a separate layer for week 52 of each year, or, if possible, sum across week 52 for the 1982-2010 time period using an nco function (that would be elegant and easy if possible).

I’ve got the file open via raster, and am able to do some basic exploration using the brick command, etc., e.g.:

f <- "cortadv5_TSA.nc"
raster(f)
print(raster(f))
raster(f, varname = "TSA_DHW")
b <- brick(f, lvar = 4)
print(raster(f))
NAvalue(b) <- 9e+20
plot(b)

I just can’t figure out exactly which nco command to use and how to set it up to get what I want out of it.

In case anybody is interested, the goal is to build a single-layer raster file of global TSA_Frequency values, to be combined with coordinates of reef sites for which we have biological data (e.g., coral cover), so thermal stress can be included in analyses of predictors of coral loss, e.g., http://www.nature.com/articles/srep29778 and http://onlinelibrary.wiley.com/doi/10.1111/j.1365-2486.2012.02658.x/abstract

Thanks so much to anybody who can provide some guidance. A lot of peeps have trouble working with these SST files, which are publicly available but hard to manipulate unless you really know what you're doing. I’m hoping this discussion could be a resource for our community.

Cheers, JB

John Francis Bruno, Professor, Dept. of Biology, UNC Chapel Hill, www.johnfbruno.com

mdsumner commented 8 years ago

This file is in pretty good shape - it's a regular longlat-grid, and raster picks up the spatial and temporal metadata properly.

I've left out my workings to figure out what to do here, but you should see where this is going. Let me know if I've not done the right task.

## this is just my local path to the file on our system
fp <- file.path(getOption("default.datadir"))
f <- file.path(fp, "data_local", "ftp.nodc.noaa.gov/pub/data.nodc/cortad/Version5/cortadv5_TSA.nc")

library(raster)
## use raster() to learn what's in there
##TSA, TSA_Frequency, TSA_DHW
##  raster(f)

## the metadata for space and time is already in good shape
b <- brick(f, varname = "TSA_DHW")           
time <- getZ(b)

range(time)  ## note this is (Date not POSIXct)
#[1] "1982-01-05" "2012-12-25"

We need a way to say "last week of the year", but since we're going in chunks of 7 days the end point cycles around, so do we want the nearest date to January 1 (or some other rule)? There are many ways to get that, even if we pick them out manually.

## find the nearest date to 31 Dec (or some other specific
## rule for "nearest week")
##format(time, "%j")[seq(1, length(time), by = (365.25/7))]

asub <- seq(52, length(time), by = (365.25/7))

## check what dates these would be
#time[asub]
# [1] "1982-12-28" "1983-12-27" "1984-12-25" "1985-12-24" "1986-12-23"
# [6] "1987-12-22" "1988-12-27" "1989-12-26" "1990-12-25" "1991-12-24"
# [11] "1992-12-22" "1993-12-21" "1994-12-27" "1995-12-26" "1996-12-24"
# [16] "1997-12-23" "1998-12-22" "1999-12-28" "2000-12-26" "2001-12-25"
# [21] "2002-12-24" "2003-12-23" "2004-12-21" "2005-12-27" "2006-12-26"
# [26] "2007-12-25" "2008-12-23" "2009-12-22" "2010-12-28" "2011-12-27"
## and drop the last one
asub <- head(asub, -1)
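
As a cross-check, here is one of the other "many ways": take, for each calendar year, the index of the latest time step in that year. This is just a sketch; the names yrs and asub2 are mine, for illustration.

## alternative: last time step of each calendar year
yrs <- format(time, "%Y")
asub2 <- tapply(seq_along(time), yrs, max)
## keep 1982-2010 only
asub2 <- asub2[as.integer(names(asub2)) <= 2010]
time[asub2]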

Now subset the raster with those indices and calculate the sum.

Since the subset is pretty large we can just loop over it; each single layer is big, but not so big that we can't hold two of them in memory at once. I did have problems with raster when incrementing, though, so I pulled the values out as a plain numeric vector to ensure I knew what was happening.

## start the accumulator from the first selected layer
## (asub[1], i.e. the end of 1982, not layer 1)
bsum <- values(b[[asub[1]]])
for (i in 2:length(asub)) {
  bsum <- bsum + values(b[[asub[i]]])
}

## then finally return to raster format
bsum_result <- setValues(raster(b[[1]]), bsum)
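
For reference, raster can also do the subset-and-sum in one step; a sketch, untested at this scale (note trunc(), since the seq() above produces fractional indices):

## stack the selected layers and sum cell-wise
bsub <- subset(b, trunc(asub))
bsum_alt <- calc(bsub, sum)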

If there were missing values, different in each layer, we'd need to keep a running mask and sum/count, perhaps.
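
A minimal sketch of that idea, assuming the same b and asub as above (bcount and the other names here are illustrative):

## NA-aware running sum and count
idx <- trunc(asub)
v <- values(b[[idx[1]]])
bsum <- ifelse(is.na(v), 0, v)
bcount <- as.integer(!is.na(v))
for (i in idx[-1]) {
  v <- values(b[[i]])
  ok <- !is.na(v)
  bsum[ok] <- bsum[ok] + v[ok]
  bcount <- bcount + ok
}
## cells never observed stay NA; a mean would be bsum / bcount
bsum[bcount == 0] <- NA
bsum_result_na <- setValues(raster(b[[1]]), bsum)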

This took <10 minutes to run, though after about 20 minutes of trying a few things.

This is what I see, is this looking kind of sensible?

library(maptools)
data(wrld_simpl)
library(viridis)
plot(bsum_result, col = viridis(12)); plot(wrld_simpl, add = TRUE)

## results in a 30Mb tiff down from a raw 250Mb
## could create a compressed netcdf4 but I need to explore that
##writeRaster(bsum_result, "bsum_result.tif", options = c("COMPRESS=DEFLATE", "TILED=YES"))
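
If a compressed netCDF is wanted instead, writeRaster can target netCDF through the ncdf4 package; a sketch only (the varname and longname values are my own guesses, not taken from the file):

## writeRaster(bsum_result, "bsum_result.nc", format = "CDF",
##             varname = "TSA_Frequency_sum",
##             longname = "summed annual TSA_Frequency 1982-2010")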

(We'd need to settle on a sensible plotting scheme, since plotting does take time; raster's plot() resamples down to a certain maxpixels to manage this.)
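
For example, maxpixels is an argument to raster's plot method, so the speed/detail trade-off can be made explicit (the value here is arbitrary):

plot(bsum_result, col = viridis(12), maxpixels = 1e6)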

bsum_result

johnfbruno commented 8 years ago

Thank you! It seemed to all work smoothly. However, when I extract values for a set of coordinates, most produce NAs. I tried with three different coordinate sets and cannot figure out why. The plot of bsum_result looks complete. Did you by chance try to extract for some coordinates? I'll paste my code below FYI.

setwd("/Volumes/Orange BU/TSA analysis")
library(raster)
f <- "cortadv5_TSA.nc"
raster(f)
print(raster(f))

Also, why does the year 2012 seem to get dropped? The last time period is "2011-12-27".

b <- brick(f, varname = "TSA_Frequency")           
time <- getZ(b)
range(time)  ## note this is (Date not POSIXct)
asub <- seq(52, length(time), by = (365.25/7))
time[asub]

asub <- head(asub, -1)

bsum <- values(b[[asub[1]]])  ## first selected layer, as above
for (i in 2:length(asub)) {bsum <- bsum + values(b[[asub[i]]])}

bsum_result <- setValues(raster(b[[1]]), bsum)

library(maptools)
plot(bsum_result)

## extract summed TSA_Frequency values ("bsum_result") for a set of coordinates
# open coordinates file
caribC <- read.csv("~/XXXX/carib+human2c.csv")
coordinates(caribC) <- ~Lat+Lon
plot(caribC)

# extract the values from the TSA file
TSA_Frequency2c <- extract(bsum_result, caribC)
head(TSA_Frequency2c)
hist(TSA_Frequency2c, breaks = 30)

# add values to csv
caribC$TSA_Frequency2c <- extract(bsum_result, caribC)
# check for new column
names(caribC)
# save to csv
write.csv(caribC, "caribCwTSA_freq.csv")

mdsumner commented 8 years ago

sp convention is for Lon, then Lat - did you plot the points and the grid together?
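
A hedged sketch of the fix, reusing the path from the code above (plotting the points over the grid is the quick sanity check):

caribC <- read.csv("~/XXXX/carib+human2c.csv")
coordinates(caribC) <- ~Lon+Lat   ## longitude first, then latitude
## points should land on top of the grid
plot(bsum_result)
plot(caribC, add = TRUE)
caribC$TSA_Frequency2c <- extract(bsum_result, caribC)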


Dr. Michael Sumner, Software and Database Engineer, Australian Antarctic Division, 203 Channel Highway, Kingston, Tasmania 7050, Australia