Closed dblodgett-usgs closed 2 years ago
thanks! PR #43
Still getting the issue since some are size 3 and others size 2.
does that mean dimids and chunksizes are not related? or is there another length > 1 column?
I'll just make list cols from each, perhaps
I do need to revisit this properly and work out a sensible schema - the same could be done for GDAL too, and perhaps some logic reused
I honestly don't know what the filter_params and filter_id are, but those are what's causing the issue. The dimids and chunksizes should be the same size.
ok! thanks, I thought I had it - will look more closely 🙏
Hi all, the dimids and chunksizes should always be the same length. The filter_id and filter_params should also be the same length, but list members of filter_params can have different vector lengths (depending on the argument list used by each filter routine). The lengths of filter_id and dimids are not related to each other.
Correction - chunksizes can be NULL if a variable uses contiguous storage (i.e. is not chunked).
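To make these length relationships concrete, here is a minimal base-R sketch of the list-column idea mentioned above (one row per variable, variable-length fields wrapped in list columns). All field values are illustrative, not taken from a real file:

```r
# Hypothetical var.inq.nc-style output (illustrative values only):
info <- list(
  name          = "tas",
  dimids        = c(0L, 1L, 2L),            # same length as chunksizes (when chunked)
  chunksizes    = c(1, 192, 288),
  filter_id     = c(1L, 2L),                # same length as filter_params ...
  filter_params = list(c(4L), integer(0))   # ... but member lengths can differ
)

# Wrapping the variable-length fields in list columns keeps one row per variable,
# so columns of different underlying lengths no longer clash:
row <- data.frame(name = info$name)
row$dimids        <- list(info$dimids)
row$chunksizes    <- list(info$chunksizes)
row$filter_id     <- list(info$filter_id)
row$filter_params <- list(info$filter_params)
str(row)
```

The same shape works with tibble list columns; the point is that lengths only need to agree within each list element, not across columns.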
Here are the relevant definitions from the RNetCDF help on var.inq.nc:
dimids: Vector of dimension IDs corresponding to the variable dimensions (NA for scalar variables). Order is leftmost varying fastest.
chunksizes: (netcdf4) Chunk size expressed as the number of elements along each dimension, in the same order as dimids. NULL implies contiguous storage.
filter_id: (netcdf4) Vector of filter IDs associated with the variable, or NULL if the NetCDF library does not support the multi-filter interface.
filter_params: (netcdf4) List with one element per filter_id, or NULL if the NetCDF library does not support the multi-filter interface. Each list member is a vector of numeric parameters for the corresponding filter. Please see the NetCDF documentation for information about the available filters and their parameters.
@mjwoods does it mean that the data are corrupt, or that there is an issue in the RNetCDF library? I have the same issue with data from the German weather service (DWD), see
https://stackoverflow.com/questions/73307392/reading-netcdf-files-tibble-columns-must-have-compatible-sizes (so perhaps the problem could be opened as a new RNetCDF issue?)
Hi @ckluss , RNetCDF is behaving as intended, because it is returning descriptive information about the filters applied to the variables in your dataset (i.e. compression). This information is provided by recent NetCDF library versions, and I added support for this feature to RNetCDF about a year ago. Unfortunately, the change has broken ncmeta when used on netcdf4 datasets with compressed variables. This breakage was not picked up by the existing tests, but I think we are close to a solution now. Once @mdsumner is satisfied that the solution works properly, I hope he can release an update for ncmeta. That should fix your problem.
this was auto-closed by commit
Hi, I think I am having a similar problem on a Windows 64 machine, using R 4.2.1 and ncdf4 1.19 from https://cran.r-project.org/web/packages/ncdf4/index.html.
If I run either of these lines, RStudio just hangs:
nc <- ncdf4::nc_open("http://thredds.aodn.org.au/thredds/dodsC/IMOS/SRS/SST/ghrsst/L3S-1d/dn/2016/20160105092000-ABOM-L3S_GHRSST-SSTfnd-AVHRR_D-1d_dn.nc")
nc <- tidync::tidync("http://thredds.aodn.org.au/thredds/dodsC/IMOS/SRS/SST/ghrsst/L3S-1d/dn/2016/20160105092000-ABOM-L3S_GHRSST-SSTfnd-AVHRR_D-1d_dn.nc")
However, if I download the file and run it, it works fine. But I have thousands of these to run through, so downloading isn't practical.
nc <- ncdf4::nc_open("20160105092000-ABOM-L3S_GHRSST-SSTfnd-AVHRR_D-1d_dn.nc")
With nc <- tidync::tidync("20160105092000-ABOM-L3S_GHRSST-SSTfnd-AVHRR_D-1d_dn.nc") I get the following error:
Error: Tibble columns must have compatible sizes.
* Columns filter_id and filter_params.
* Column chunksizes.
i Only values of size one are recycled.
Any tips for finding a way around this issue? Thanks
that is a different problem, caused in ncmeta - I'll have a look in coming days 🙏
Hi @clairedavies , tidync::tidync works for me on Windows using the remote dataset in your example. You may need to install the latest versions of tidync and ncmeta.
Like you, I found that ncdf4::nc_open hangs with the remote dataset. You could try using RNetCDF::open.nc instead, which works for me on Windows (using the latest RNetCDF version). Please let me know if that works for you. If it does, you could modify your code to replace ncdf4 commands with their equivalents from RNetCDF. For example, print.nc displays the structure of the dataset, var.get.nc reads variables, and att.get.nc reads attributes.
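Following up with a concrete sketch of those equivalents (network access to the AODN THREDDS server is assumed; the URL is the one from the question, and the start/count values are illustrative):

```r
library(RNetCDF)

u  <- "http://thredds.aodn.org.au/thredds/dodsC/IMOS/SRS/SST/ghrsst/L3S-1d/dn/2016/20160105092000-ABOM-L3S_GHRSST-SSTfnd-AVHRR_D-1d_dn.nc"
nc <- open.nc(u)

print.nc(nc)   # dataset structure, like printing the result of ncdf4::nc_open

# Read a subset of a variable (1-based start/count, one entry per dimension)
sst <- var.get.nc(nc, "sea_surface_temperature",
                  start = c(1, 1, 1), count = c(50, 50, 1))

units <- att.get.nc(nc, "sea_surface_temperature", "units")

close.nc(nc)
```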
Thank you both for the responses. I updated the packages and tidync::tidync works, but RStudio still hangs on tidync::hyper_tibble().
If I use RNetCDF, all I seem to get is NAs:
nc <- RNetCDF::open.nc("http://thredds.aodn.org.au/thredds/dodsC/IMOS/SRS/SST/ghrsst/L3S-1d/dn/2016/20160105092000-ABOM-L3S_GHRSST-SSTfnd-AVHRR_D-1d_dn.nc")
sst <- RNetCDF::var.get.nc(nc, variable = "sea_surface_temperature", start = c(20, 10, 1), count = c(50, 50, 1))
it's just a very sparse dataset, so your start/count doesn't intersect the data at all - IMO you need a higher-level tool than either RNetCDF or tidync for this source. With raster you get immediate, helpful feedback and oversight of what's there. This file has poorly built lon/lat arrays, so some tools detect it as irregular where it is not. It's entirely intended to be a regular grid with extent 70, 190, -70, 20 (xmin, xmax, ymin, ymax) and 0.02 resolution.
For example (with f set to the local file name):
library(raster)
f <- "20160105092000-ABOM-L3S_GHRSST-SSTfnd-AVHRR_D-1d_dn.nc"
r <- raster::raster(f, varname = "sea_surface_temperature")
r
r
class : RasterLayer
dimensions : 4500, 6000, 2.7e+07 (nrow, ncol, ncell)
resolution : 0.02, 0.02 (x, y)
extent : 70, 190, -70, 20 (xmin, xmax, ymin, ymax)
crs : +proj=longlat +datum=WGS84 +no_defs
source : 20160105092000-ABOM-L3S_GHRSST-SSTfnd-AVHRR_D-1d_dn.nc
names : sea.surface.foundation.temperature
z-value : 2016-01-05 09:20:00
zvar : sea_surface_temperature
crop(r, extent(140, 160, -60, -40))
class : RasterLayer
dimensions : 1000, 1000, 1e+06 (nrow, ncol, ncell)
resolution : 0.02, 0.02 (x, y)
extent : 140, 160, -60, -40 (xmin, xmax, ymin, ymax)
crs : +proj=longlat +datum=WGS84 +no_defs
source : memory
names : sea.surface.foundation.temperature
values : 270.0998, 295.1398 (min, max)
time : 2016-01-05 09:20:00
Using raster and crop gives immediate and friendly tools for dealing with the data as a real map.
I'm sympathetic to how confusing this can be, because even raster's replacement won't work with this source, but for complex reasons that keep changing and are only a distraction imo. tidync is really for exploring the structure of a file, but this one is really simple - just a regular grid with a few variables. hyper_tibble is a fast way of expanding subsets of the data to a data frame, and this one is just too big to do as a whole.
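For those who want to stay with RNetCDF, start/count indices on a regular grid like this can be computed from the extent and resolution. A sketch, using the 140..160, -60..-40 window from the crop above; note the grid-origin and axis-orientation assumptions are mine and should be checked against the actual lon/lat arrays in the file:

```r
# Regular grid described above: extent 70..190 (lon), -70..20 (lat), 0.02-degree cells.
res  <- 0.02
lon0 <- 70    # assuming the first lon cell starts at xmin
lat0 <- -70   # assuming lat increases from ymin (orientation not verified!)

# Window: 140..160 E, -60..-40
lon_start <- round((140 - lon0) / res) + 1   # +1: var.get.nc uses 1-based indices
lat_start <- round((-60 - lat0) / res) + 1
lon_count <- round((160 - 140) / res)
lat_count <- round((-40 - (-60)) / res)
c(lon_start, lat_start, lon_count, lat_count)   # 3501 501 1000 1000
```

round() is used rather than floor() to guard against floating-point error in the division; a window that does intersect the data avoids the all-NA result seen earlier.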
Thanks again - appreciate the help
Hi @clairedavies , as @mdsumner says, the NA values seem to represent missing values (e.g. land) on the lon/lat grid. Note that the variables in this dataset have been 'packed' as a form of compression, so you probably want to retrieve the unpacked data using the argument unpack=TRUE in var.get.nc.
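A sketch of that suggestion, using the local file name from earlier in the thread (start/count values are illustrative):

```r
library(RNetCDF)

nc <- open.nc("20160105092000-ABOM-L3S_GHRSST-SSTfnd-AVHRR_D-1d_dn.nc")  # local copy

# unpack = TRUE applies scale_factor and add_offset, returning physical values
sst <- var.get.nc(nc, "sea_surface_temperature",
                  start = c(1, 1, 1), count = c(100, 100, 1),
                  unpack = TRUE)

range(sst, na.rm = TRUE)   # physical units rather than packed integers

close.nc(nc)
```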
Yes, I noticed that, even more confusing. I think I'm sorted now with RNetCDF. Thanks for all the help
Hi all,
I still get the same issue with tidync::tidync and some of the CMIP6 climate model outputs, with ncmeta 0.3.0, tidync 0.2.4, RNetCDF 2.6-1 and R 4.2.1. The packages seem to have updated normally. The tibble error reproduces on outputs from certain models, not all of them (i.e. CESM2-WACCM, but not CMCC-ESM2).
Hi @Tananaevs , to help us test the problem, could you please provide some links to datasets that are causing problems? Also, are you running R on Windows or something else?
Hi @mdsumner , before we invest too much time testing this problem further, I just want to check if you have published a new version of ncmeta since our previous attempt to fix the problem (#44).
@mjwoods yes, I am running the latest RStudio v.2022.07.1 build 554 on Win10. The following link allows downloading - upon registration - a wget script that starts further download of the problematic datasets: https://esgf-node.llnl.gov/esg-search/wget/?distrib=false&dataset_id=CMIP6.CMIP.NCAR.CESM2-WACCM.historical.r1i1p1f1.day.tas.gn.v20190227|esgf-data.ucar.edu
Hi @Tananaevs , the tibble issue has been fixed in the ncmeta package source on GitHub, but the new version has not yet been published on CRAN.
I tested the new ncmeta version successfully on Windows, as shown below. It would be helpful for us if you could test the new version across all the CMIP6 datasets.
install.packages("devtools")
devtools::install_github("hypertidy/ncmeta")
setwd(tempdir())
options(timeout=max(300, getOption("timeout")))
download.file("http://esgf-data.ucar.edu/thredds/fileServer/esg_dataroot/CMIP6/CMIP/NCAR/CESM2-WACCM/historical/r1i1p1f1/day/tas/gn/v20190227/tas_day_CESM2-WACCM_historical_r1i1p1f1_gn_18500101-18591231.nc",
"test.nc", mode="wb")
tidync::tidync("test.nc")
file.remove("test.nc")
@mjwoods - thank you for publishing this solution! Do you have an estimate when it will be published to CRAN?
imminent 🙏
I believe this is fixed, tested on the cases reported and now on CRAN as ncmeta 0.3.5. Thanks!
Thanks @mdsumner !
Greetings. I am having similar issues with hyper_tibble().
I am using R 4.2.1, ncmeta 0.3.5, RNetCDF 2.6.1, ncdf4 1.19, tidync 0.3.0 in Windows 10
Here is the code below, which previously worked in a R 3.6.2 but now hangs up on the hyper_tibble command.
I sure would appreciate any advice! Thank you in advance.
library(RNetCDF)
library(ncdf4)
library(tidync)
library(raster)
library(tidyverse)
library(ncmeta)
nc.grid <- tidync('http://thredds.northwestknowledge.net:8080/thredds/dodsC/agg_met_tmmx_1979_CurrentYear_CONUS.nc')
grid.yr <- nc.grid %>% activate("daily_maximum_temperature") %>%
hyper_filter(day = dplyr::between(index, 14784, 14898),
lat = dplyr::between(index, 263,319),
lon = dplyr::between(index, 387,462))
grid.yr <- grid.yr %>% hyper_tibble()
Issue is coming from here. https://github.com/hypertidy/ncmeta/blob/master/R/nc_var.R#L30
What's your preferred fix here @mdsumner ?
Created on 2022-07-25 by the reprex package (v2.0.1)