Open kdorheim opened 4 years ago
- [x] Plot the new netcdf in Panoply. Based on what you know about `fldmean`, was this what you were expecting?
- In R, set up a script (or, if you want, a markdown document) to answer the following; use `summary` to check out summary stats.

Up next, we will start playing around with executing cdo via the R `system2` call.
Problem: CDO doesn't seem to be able to open netCDF-4 files. The files that I did get to open were CF-convention files. I read online that CDO unzips netCDF-4 files, and I am wondering if that isn't happening for some reason?
NetCDF-4 is ten years old and I'm pretty sure CDO handles it fine. Can you provide a reproducible example?
@skygering Was the issue just with the example data I had you pull from fldgen? Does it work with this data? (I think you are going to have to unzip it first.)
tas_Amon_NorESM2-LM_1pctCO2_r1i1p1f1_gn_000101-001012.nc.zip
Answers to above questions:
The `fldmean` cdo operator calculates the mean value over all of the grid cells in a field, weighting each grid cell by its area.
```
cdo fldmean tas_annual_ipsl-cm5a-lr_rcp8p5_xxx_2006-2099.nc tas_annual_mean.nc
```

The above command created a new file where each timestep has a mean surface temperature. Based on what `fldmean` does, this is the average temperature of the entire land area. There are no lat/lon variables anymore in the new file, just the time step and the average temperature. Here is the graph I got! The world is heating up as expected...
I started by storing the data in a variable `f`. When I ran `print(f)`, I got a detailed list of the variables and dimensions. The variable for mean global temperature has the dimensions latitude, longitude, and time. There were 720 longitude values, 360 latitude values, and 94 time values, which works out to 24,364,800 individual data points for the global mean temperature from 2006-2099. Longitude has units degrees east, latitude has units degrees north, time has units %Y%m%d as a long string, and tas has units Kelvin. There is also another variable, time_bnds, which seems to be all of the time units in list form. The file is an object of class ncdf4 and the other variables are all arrays.
I found the global annual temperature using `apply`, since this let me split the 3D array into 'slices' along time and then average over each slice. I typed `mean_tas <- apply(tas, 3, function(x) mean(x, na.rm = TRUE))`.
I don't know how to use `lapply`, since it needs a list. This suggests to me that the data needs to be modified before using it, maybe with the function `split()`, but I wasn't sure. While this data looks really similar in shape to the previous data, it isn't exactly the same: the maximum temperature is lower on the map made in R. The data in R isn't weighted by a land map like the CDO output is; for this I think I need a land map.
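One way to approximate CDO's area weighting in plain R is a cosine-of-latitude weight (a sketch on a toy array, not the real data; CDO uses actual grid-cell areas, so this shortcut is only an approximation):

```r
# Toy stand-in for the real tas array: 4 lon x 3 lat x 2 time
set.seed(42)
tas <- array(rnorm(4 * 3 * 2, mean = 288, sd = 5), dim = c(4, 3, 2))
lat <- c(-45, 0, 45)  # made-up latitude values (degrees north)

# Unweighted field mean per time step, as in the text above
mean_tas <- apply(tas, 3, function(x) mean(x, na.rm = TRUE))

# Approximate area weighting: weight each latitude band by cos(latitude),
# since grid cells cover less area toward the poles
w <- cos(lat * pi / 180)
weighted_tas <- apply(tas, 3, function(slice) {
  weighted_slice <- sweep(slice, 2, w, `*`)   # scale each lat column by its weight
  sum(weighted_slice) / (nrow(slice) * sum(w))
})

length(weighted_tas)  # one value per time step
```

The unweighted and weighted series differ slightly, which is the same kind of gap seen between the R and CDO graphs.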
I am also struggling with using ggplot2 to plot without using a pipeline, because I need to input a data frame but I don't quite understand what that is. I just want to put in the x and y values, but clearly ggplot2 needs a bit more finesse. Since `apply` returns just a vector, and it seems like I need a more complicated data structure for ggplot2, I am not sure what to do.
I will ask Ben questions tomorrow morning at our meeting!
@skygering great work!
You are right, `apply` is the best function to work with here; later on, though, we will be using `lapply`, because it will let us use a single function to process data from lots of different models. BTW, you used `apply` perfectly! I hope that didn't take you too long to figure out (I had intentionally left out some info in hopes that you'd come ask for help, but it looks like you were able to figure it out).
As for getting the data into a data frame, you can make a data frame out of vectors!
So something like this:
```r
library(ncdf4)
# `data` is the tas array and `nc` the connection returned by nc_open()

# Calculate the mean tas
mean_tas <- apply(data, 3, function(x) mean(x, na.rm = TRUE))

# Extract the time information
time <- ncvar_get(nc, 'time')

# Format the time and temp vectors into a data.frame
df <- data.frame(time = time,
                 value = mean_tas)
```
Something that might be helpful to work on, as @bpbond mentioned, would be to set up reproducible samples of code; this can help with debugging. Also, for the scripts that you write related to these learning activities, let's save them to this repo in a directory called `scratch`, to give you some practice with working with git, pull requests, and reviews. As always, I am happy to talk about any questions you may have, and @bpbond is a great person to talk this over with as well.
I finished everything and made the directory in my repo! Here are the graphs for the weighted (from CDO) and unweighted (from R) data.
Even though it was clear from the graphs that the two data sets are not the same, I still used `identical()` and `all.equal()` to check, to make sure I knew how to use them. All of my work from this exercise is in the `tas_cdo.R` file!
I do have a GitHub question. When I made the scratch directory, I made a branch and then cloned the repository onto my local desktop to put my files into it. I now realize that the GitHub URL is the same no matter which branch, so when I pushed my files up they went into the master branch rather than my new branch. While that doesn't really matter in this case, I was wondering if there is a way to clone or push to a specific branch?
Ready for the next part!
👏 nice work @skygering - love the graphs
`git clone` clones an entire repository, including all branches (which may have different remotes, though I've never done this). Pushing is by definition branch-specific.
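A small self-contained demo of branch-specific cloning and pushing; the repository paths and the `scratch` branch name here are made up for illustration:

```shell
set -e
tmp=$(mktemp -d)

# Set up a throwaway "remote" repository with a scratch branch
git init -q "$tmp/origin"
cd "$tmp/origin"
git config user.email demo@example.com
git config user.name demo
echo hello > file.txt
git add file.txt
git commit -qm "initial commit"
git branch scratch

# git clone -b checks out the named branch right after cloning
cd "$tmp"
git clone -q -b scratch origin clone
cd clone
git rev-parse --abbrev-ref HEAD   # prints: scratch

# Pushing is branch-specific: push the current branch to the matching remote branch
git push -q origin scratch
```

The same `-b` flag works against a GitHub URL, and `git push origin <branch>` is the usual way to make sure your commits land on the branch you intend.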
Background Notes
CDO
Before we get to the tasks, here is a bit of information about the different ways one can use cdo.
Via the R `system2` function (see the system2 documentation). I prefer to execute the cdo commands in R; in my opinion this setup is more reproducible and allows for defensive programming, diagnostic tests, and validity checks. If you are more comfortable with setting up a bash script we can talk about doing that, but for now I would like us to focus on understanding cdo from the command line and in R.
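A minimal sketch of what such a `system2` call could look like, with the kind of defensive checks mentioned above; the file names are hypothetical, and the call is skipped when cdo is not on the PATH:

```r
# Hypothetical file names, for illustration only
in_file  <- "tas_annual.nc"
out_file <- "tas_annual_mean.nc"

# Only attempt the call when cdo is installed and the input exists
if (nzchar(Sys.which("cdo")) && file.exists(in_file)) {
  # system2 returns the exit status (0 on success) when stdout is not captured
  status <- system2("cdo", args = c("fldmean", in_file, out_file))

  # Defensive checks: stop if cdo failed or produced no output file
  stopifnot(status == 0, file.exists(out_file))
} else {
  message("cdo or the input file is not available; skipping the call")
}
```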
cdo uses the following syntax:

```
cdo operator_name in_file_nc out_file_nc
```

where

- `operator_name` is the cdo operator name; there are a boatload of these and they are all listed in the documentation manual. The one we will be working with the most is called `fldmean`.
- `in_file_nc` is the path to the netcdf (or nc) file to process.
- `out_file_nc` is the nc file that will be generated.

Netcdfs and R
Hints for working with netcdfs in R:

- From the `ncdf4` package, the functions `nc_open`, `ncvar_get`, and `ncatt_get` are useful.
- Use `str`, `dim`, `length`, and `head` to get an idea of what the data looks like without printing out the whole thing.
- The extracted netcdf data is organized in lists; the `apply` family of functions (cough cough `lapply`) makes applying a function to a list really easy.

(Please come to me with questions about any of these functions; Stack Overflow has lots of helpful information. Documentation for R functions is available online, and in R use `help("function_name")` for R documentation.)

Plotting in R
Most intro stats classes use base R plots to visualize results. We will be using ggplot2; the grammar rules can be funky at first, so let me know if you have questions. Steph is also a good ggplot resource (she is a data viz wizard). FYI, ggplot syntax works best with long-formatted data.
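As a sketch of what long-formatted data looks like to ggplot2 (made-up numbers; the plotting call is guarded so the snippet runs even when ggplot2 is not installed):

```r
# Made-up annual global mean temperatures (Kelvin), already in long format:
# one row per (time, value) observation
df <- data.frame(time  = 2006:2010,
                 value = c(287.1, 287.3, 287.2, 287.6, 287.8))

# Guarded so this still runs without ggplot2 installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(df, aes(x = time, y = value)) +
    geom_line() +
    labs(x = "Year", y = "Global mean tas (K)")
  print(p)
}
```

This is exactly the shape of data frame built earlier from `time` and `mean_tas`, which is why the vectors-to-data.frame step matters.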