ropensci / rerddap

R client for working with ERDDAP servers
https://docs.ropensci.org/rerddap
Other
40 stars 14 forks source link

"melting" of data assumes lat-lon grid #74

Closed rmendels closed 6 years ago

rmendels commented 6 years ago

In griddap(), when the data are "melted" from the netcdf to be in "long" form, it is assumed that the data are on a lat-lon grid. This is not always the case, look at:

rerddap::info('glos_tds_5912_ca66_3f41')

The problem code is in the function ncdf4_get() in the file ncdf_helpers.R, lines 25-32:

  } else {
    time <- suppressWarnings(rep(out$time, each = rows/length(out$time)))
    lat <- rep(rep(out$latitude, each = length(out$longitude)),
               length(out$time))
    lon <- rep(rep(out$longitude, times = length(out$latitude)),
               times = length(out$time))
    meta <- data.frame(time, lat, lon, stringsAsFactors = FALSE)
  }
sckott commented 6 years ago

thanks for this @rmendels

sckott commented 6 years ago

@rmendels Sorry for delay, was on a long vacation.

Looking at this, and wondering how many other options are there other than lat and long? Do we want to support non-lat/lon based data? Ideally we'd want to generalize the parsing of data to handle any data that an nc file may have.

rmendels commented 6 years ago

If you use the rerddap option to just get the data to a file, you will find that rerddap works just fine for datasets that do not have lat-lon. You can try it on the dataset referenced above:

rerddap::info('glos_tds_5912_ca66_3f41')

and in fact in rerddapXtracto in one of the examples I access a dataset not on lat-lon (which of course uses rerddap) and it works quite well. It strikes me there are a number of options, in an increasing amount of work for you:

  1. Leave code as is, spend a little more time in docs warning people about data not on lat-lon, give a more detailed example of how to just get a file and read in with ncdf4 or raster or RNetcdf.

  2. Add a test for whether the coordinates are lat-lon, if not don't melt the data and just leave the file. Add examples as in 1).

  3. Add a test for whether the coordinates are lat-lon, if they are, "melt" the data, if they are not, just read the data in as ncdf4 does, and create the structure with the data in that form (that is as arrays) rather than "melted".

  4. Do a general "melt". "Melting" just means expanding out multi-dimensional arrays along their coordinates, and what will be true in almost all ERDDAP gridded datasets is that the coordinate ordering as given by ERDDAP will be the C-type row order, so it will be slowest to fastest (for example time, altitude, lat, lon), or for R, as done by ncdf4 it will be the revise order to be column-ordered. But this would probably take some care and some time to program correctly, and as I have said before it is easy for me to say because I don't have to do it.

It strikes me option 3) gives the best trade-off between generality and amount of work. In your code, you already have the dimension names, so testing for lat-lon should simple. And you already have code to read in from a netcdf file, so the only thing would be whether you append a dataframe of the melted data to the top level information you already use, or a list (or dataframe) of the coordinate arrays and data array, all of which you already read in.

sckott commented 6 years ago

3 does sound like a good approach, i'll try that and report back

sckott commented 6 years ago

@rmendels i've pushed a change - we now detect lat/lon in the dimensions, and if not there we don't read the data in and throw a warning so the user knows about it.

rmendels commented 6 years ago

Are you far enough along in the changes that I should start testing?

sckott commented 6 years ago

yes

rmendels commented 6 years ago

I am working through the vignette, which I know you have done, but it is a good test of the system and that it all works on a machine that is not yours, an times that can cause problems. While everything has been working, off and on I get a message like this:

hydroInfo <- info('siocalcofiHydroCasts') Warning message: In preDraw(x) : reached elapsed time limit

Any idea what that is about?

Also looks like you are using crul not httr, just for myself curious as to the advantages? And I had to install vcr to get rerddap to load, again just curious what you are using it for in this package.

rmendels commented 6 years ago

I am not very good with git, and this is a small change, hope you can implement it from this, rather than do a pull request, because they always mess up my own setup as I lose track of what is going where and how. The last example in the vignette does not work anymore, it appears they have changed the database structure. The following can replace all the references to "coho" and I just ran the code below and it worked correctly:

urlBase <- 'http://oceanview.pfeg.noaa.gov/erddap/'
chinookInfo <- info('cciea_SM_CA_CH_ABND', url = urlBase)
chinookData <- tabledap(chinookInfo, fields = c("abundance_anomaly", "time"),  'population="California Coast"', url = urlBase)
chinookData$abundance_anomaly <- as.numeric(chinookData$abundance_anomaly)
alldata <- data.frame(chinook = chinookData$abundance_anomaly[1:23], maxSLP = nphData$maxSLP[6:28], year = nphData$year[6:28])

ggplot(alldata) + geom_line(aes(x = year, y = chinook), colour = 'blue') + theme_bw() + ggtitle("chinook abundance anomaly)")
rmendels commented 6 years ago

Forget about the change in text. The one line that says:

The over 300 indices developed for the CCSIEA are available through rerddap. Here an index of coho abundance in California

Should become:

The over 300 indices developed for the CCSIEA are available through rerddap. Here an index of chinook abundance in California