There shouldn't be any file size limitations, as far as I know. Can you try opening the file with RNetCDF? It's been updated recently on CRAN (it's now at version 2.1-1), and that might be enough.
nc <- RNetCDF::open.nc("C:/Dropbox/Sandbox/era5_globe_april_1990_1990.nc")
RNetCDF::print.nc(nc)
If that also doesn't work, can you report your version of RNetCDF, and try updating it?
If you can tell me where to download the file I'll explore myself. Thanks ;)
Oh sorry, I see your session info - you are at latest RNetCDF.
If there's any way I can get access to your file I'd pursue it.
Thanks for offering to look at it. Have sent you a Dropbox download link.
Cool thanks for sharing the file, I don't find any problems (either CRAN or github tidync).
f <- "/mnt/mdsumner/tidync_98/era5_globe_april_1990_1990.nc"
x <- tidync::tidync(f)
sessionInfo()
R version 3.6.1 (2019-07-05)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.3 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1
locale:
[1] LC_CTYPE=en_AU.UTF-8 LC_NUMERIC=C LC_TIME=en_AU.UTF-8 LC_COLLATE=en_AU.UTF-8
[5] LC_MONETARY=en_AU.UTF-8 LC_MESSAGES=en_AU.UTF-8 LC_PAPER=en_AU.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_AU.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] Rcpp_1.0.2 tidyr_1.0.0 zeallot_0.1.0 packrat_0.5.0 crayon_1.3.4 dplyr_0.8.3
[7] assertthat_0.2.1 R6_2.4.0 lifecycle_0.1.0 backports_1.1.5 magrittr_1.5 ncdf4_1.16.1
[13] pillar_1.4.2 rlang_0.4.0 ncmeta_0.1.0 rstudioapi_0.10 vctrs_0.2.0 forcats_0.4.0
[19] tools_3.6.1 glue_1.3.1 purrr_0.3.3 RNetCDF_2.1-1 parallel_3.6.1 compiler_3.6.1
[25] pkgconfig_2.0.3 tidyselect_0.2.5 tidync_0.2.1.9002 tibble_2.1.3
But, as a long shot - maybe the Dropbox client is causing problems for you? Maybe try moving the file out of Dropbox itself and reading it then. Other than that I don't have any suggestions, but I will try on Windows when I can.
Another possibility, try with the Github version of ncmeta:
devtools::install_github("hypertidy/ncmeta")
The file connection is wrapped in a safely() call, and I realize that might be masking useful error messages so I'll try to unpack that bit in future.
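To illustrate how a safely()-style wrapper can hide the underlying message, here is a minimal base-R sketch. `capture_error` and `failing_open` are hypothetical names made up for this example; they are not part of tidync, ncmeta or purrr, and the error text is a stand-in rather than a real NetCDF message.

```r
# Hypothetical stand-in for a safely()-style wrapper: it never throws,
# it returns the result and the error side by side.
capture_error <- function(f, ...) {
  tryCatch(
    list(result = f(...), error = NULL),
    error = function(e) list(result = NULL, error = e)
  )
}

# Hypothetical opener that always fails, standing in for a file-open call.
failing_open <- function(path) stop("could not open ", path)

out <- capture_error(failing_open, "example.nc")

# Nothing has been printed so far; the message is only visible
# if the caller inspects the captured condition explicitly.
if (!is.null(out$error)) {
  message("Underlying error: ", conditionMessage(out$error))
}
```

The same pattern could be applied to a real opener, for example `capture_error(RNetCDF::open.nc, f)`, to see the message that a wrapper would otherwise swallow.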
Thanks for taking a look. Moving the file outside of Dropbox didn't work, and I have just tried with the GitHub version of ncmeta, but still no joy.
Thanks for the follow up, I'll try on Windows - I actually have to push tidync out to CRAN because of failing checks, so I need to fix this quickly if possible
Ok - well I'm happy to help if I can. Would you like me to send you a file obtained from the same source (but with smaller dimensions) that works fine for me?
Confirmed, it's a problem in RNetCDF on Windows (x64):
> f
[1] "C:/mds/era5_globe_april_1990_1990.nc"
> library(RNetCDF)
> nc <- open.nc(f)
Error in open.nc(f) : NetCDF: Numeric conversion not representable
>
but, tidync is wrapping it in a way that means we don't see the real message.
What I'm unclear on is if a new from-source build of RNetCDF would fix it. The rwinlib was recently updated, but I'm not sure if the CRAN version was also updated.
Also, I don't know what exactly causes the problem. @mjwoods have you seen this?
Hi @mdsumner , I'll have access to a Windows box later this week, so I could test the file myself then. Do you mind sending me a Dropbox link, @everydayduffy ?
Also, have either of you tried opening the file on a Linux machine? That would help me to determine if the problem is caused by RNetCDF itself or the Windows libraries it uses.
Definitely works fine on Linux, I can follow up with system details 👍
Dropbox link sent to @mjwoods. Thank you both for looking into this.
Hi @mjwoods ,
Did you get my .nc file ok? Wondered if you had time to check on a Windows machine?
Hi @everydayduffy , I did manage to download your file, thanks. I tested it on Windows with several different builds of netcdf. Some worked, but not the versions we need for R packages. I think that means the problem lies in the netcdf library, and not in RNetCDF itself. I'll continue to investigate until I run out of ideas. It's a spare-time project for me, so it may take a while to sort out.
Hi @mjwoods. Thanks for the update and for spending some time investigating. Much appreciated!
Hi @mjwoods. I had an idea today which worked... I read the troublesome .nc file into Python with the xarray package, and wrote it out again. The new file can now be read by tidync in R. It's a workaround which allows me to work with the data on Windows, but doesn't solve the RNetCDF problem! Thought I would let you know in case it's of interest.
Hi @everydayduffy , which version of python are you using? Is it on Windows? If so, maybe we can learn something from the way they build their netcdf library.
Hi @mjwoods, am using the following: Python 3.7.4 (default, Aug 9 2019, 18:34:13) [MSC v.1915 64 bit (AMD64)]
On windows 10 64bit and installed with Anaconda.
I have been converting the .nc files in R with the reticulate package, using the following code:
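nc_file <- "old_nc.nc"    # file that fails to open on Windows
nc_file_c <- "new_nc.nc"  # rewritten copy
# conda environment with numpy and xarray installed
reticulate::use_condaenv("C:/anaconda/envs/R_env")
xr <- reticulate::import("xarray")
# read the dataset with xarray and write it straight back out
ds <- xr$open_dataset(nc_file)
ds$to_netcdf(nc_file_c)
ds$close()  # release the handle on the original file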
I looked at the build recipes for netcdf4 in anaconda, and one thing I notice is that they use the Microsoft compilers. We have to use the build system used by R, which is based on gcc. This may have something to do with our problems. I don't think gcc itself is at fault, but maybe the netcdf source code relies on special features of the Microsoft compilers on Windows. I'm just guessing for now.
I tested the latest Windows builds of RNetCDF (2.1-1) and ncdf4 (1.17), and both have the same problem with the file from @everydayduffy .
> RNetCDF::open.nc("era5_globe_april_1990_1990.nc")
Error in RNetCDF::open.nc("era5_globe_april_1990_1990.nc") :
NetCDF: Numeric conversion not representable
> ncdf4::nc_open("era5_globe_april_1990_1990.nc")
Error in R_nc4_open: NetCDF: Numeric conversion not representable
Error in ncdf4::nc_open("era5_globe_april_1990_1990.nc") :
Error in nc_open trying to open file era5_globe_april_1990_1990.nc
ncdf4-1.17 uses prebuilt libraries from https://github.com/rwinlib/netcdf … and so does RNetCDF-2.1-1. Interestingly, the previous ncdf4-1.16 used older netcdf libraries from http://win-builder.r-project.org, so I tried building RNetCDF the same way. The file "era5_globe_april_1990_1990.nc" was opened successfully!
Unfortunately, the older libraries from winbuilder do not support OpenDAP, so switching back to them would reduce the functionality available for other users.
@everydayduffy - until we can solve this problem properly, I could build a special version of RNetCDF for you using the winbuilder libraries. Please let me know if you would like this, because winbuilder only keeps builds for 3 days.
Hi @mjwoods,
Thanks again for further investigating this. Building a bespoke version would be very much appreciated. Myself and a few colleagues would be very grateful. Would installation of this version be relatively straightforward?
Hi @everydayduffy , the new package is ready at https://win-builder.r-project.org/Q9lZgg3e7K0z/ . The zip file contains the binary files for Windows. The package will be removed automatically within 72 hours, so please download ASAP.
Installation should be possible from within R (on Windows) using the command install.packages("RNetCDF_2.2-1.zip"), prefixing the file name with your download directory if necessary.
Good luck!
Great - thank you so much @mjwoods. I have it installed, and now RNetCDF::open.nc isn't giving the "Numeric conversion not representable" error (which is a good sign). However, functions from tidync are still throwing this error. Is tidync somehow using the "broken" version of RNetCDF? Can I get it to call upon the new one you have made me?
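One way to check which builds R would actually pick up is to list the installed version and library path of each package involved. This is a minimal sketch using only base R; `package_report` is a hypothetical helper name, and the package list is simply the set of packages mentioned in this thread.

```r
# Hypothetical helper: report the installed version and location of packages
# without loading them.
package_report <- function(pkgs) {
  rows <- lapply(pkgs, function(p) {
    path <- system.file(package = p)  # "" when the package is not installed
    installed <- nzchar(path)
    data.frame(
      package = p,
      version = if (installed) as.character(utils::packageVersion(p)) else NA_character_,
      path = if (installed) path else NA_character_,
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, rows)
}

print(package_report(c("RNetCDF", "ncdf4", "ncmeta", "tidync")))
```

If more than one library is on `.libPaths()`, the `path` column shows which copy would be found first in a fresh session.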
That's strange - tidync worked for me. I am using R-3.6.2 (64 bit) on Windows 10 with the latest tidync (and dependencies), plus RNetCDF_2.2-1 from win-builder. I tried opening the 8.4GB file you sent me earlier on Dropbox:
tidync::tidync("D:/milto/Documents/era5_globe_april_1990_1990.nc")
Data Source (1): era5_globe_april_1990_1990.nc ...
Grids (4) <dimension family> : <associated variables>
[1] D0,D1,D2 : t2m, d2m, sp, u10, v10, tp, tcc, msnlwrf, msdwlwrf, fdir, ssrd **ACTIVE GRID** ( 747532800 values per variable)
[2] D0 : longitude
[3] D1 : latitude
[4] D2 : time
Dimensions 3 (all active):
dim name length min max start count dmin dmax unlim coord_dim
<chr> <chr> <dbl> <dbl> <dbl> <int> <int> <dbl> <dbl> <lgl> <lgl>
1 D0 longitu~ 1440 -180 180. 1 1440 -180 1.80e2 FALSE TRUE
2 D1 latitude 721 -90 90 1 721 -90 9.00e1 FALSE TRUE
3 D2 time 720 791088 791807 1 720 791088 7.92e5 FALSE TRUE
> sessionInfo()
R version 3.6.2 (2019-12-12)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18362)
Matrix products: default
locale:
[1] LC_COLLATE=English_Australia.1252 LC_CTYPE=English_Australia.1252
[3] LC_MONETARY=English_Australia.1252 LC_NUMERIC=C
[5] LC_TIME=English_Australia.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] Rcpp_1.0.3 tidyr_1.0.0 fansi_0.4.0 utf8_1.1.4
[5] zeallot_0.1.0 crayon_1.3.4 dplyr_0.8.3 assertthat_0.2.1
[9] R6_2.4.1 lifecycle_0.1.0 backports_1.1.5 magrittr_1.5
[13] ncdf4_1.17 pillar_1.4.3 cli_2.0.0 rlang_0.4.2
[17] ncmeta_0.2.0 vctrs_0.2.1 tools_3.6.2 forcats_0.4.0
[21] glue_1.3.1 purrr_0.3.3 RNetCDF_2.2-1 compiler_3.6.2
[25] pkgconfig_2.0.3 tidyselect_0.2.5 tidync_0.2.3 tibble_2.1.3
I am not familiar with tidync, so please let me know if there are any other tests I should try.
Perhaps it would be a good idea to use update.packages to ensure that the latest packages are being used. You may need to reinstall RNetCDF_2.2-1.zip as well.
Ah ok, now I've dissected the workflow a bit more, I can see the problem is with hyper_tibble. tidync::tidync() on its own works fine now... (the file I'm using is of similar size and contents to the one I provided you).
library(tidync)
f <- "C:/PROCESSING/raw_data/era5_surface_global_2008_02.nc"
# works
tidync(f)
# breaks on hyper_tibble
b <- tidync(f) %>%
tidync::hyper_filter(longitude = longitude == 5,
latitude = latitude == 5) %>%
hyper_tibble()
Here's my session info FYI:
> sessionInfo()
R version 3.6.1 (2019-07-05)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 17763)
Matrix products: default
locale:
[1] LC_COLLATE=English_United Kingdom.1252 LC_CTYPE=English_United Kingdom.1252 LC_MONETARY=English_United Kingdom.1252
[4] LC_NUMERIC=C LC_TIME=English_United Kingdom.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] tidync_0.2.3 RNetCDF_2.2-1
loaded via a namespace (and not attached):
[1] Rcpp_1.0.3 rstudioapi_0.10 magrittr_1.5 tidyselect_0.2.5 R6_2.4.1 rlang_0.4.2 fansi_0.4.0 dplyr_0.8.3 tools_3.6.1
[10] utf8_1.1.4 ncmeta_0.2.0 cli_2.0.0 assertthat_0.2.1 tibble_2.1.3 lifecycle_0.1.0 crayon_1.3.4 purrr_0.3.3 tidyr_1.0.0
[19] vctrs_0.2.1 ncdf4_1.17 zeallot_0.1.0 glue_1.3.1 compiler_3.6.1 pillar_1.4.3 forcats_0.4.0 backports_1.1.5 pkgconfig_2.0.3
What's the error?
Error in R_nc4_open: NetCDF: Numeric conversion not representable
Error in ncdf4::nc_open(x$source$source[1]) :
Error in nc_open trying to open file C:/PROCESSING/raw_data/era5_surface_global_2008_02.nc
The latest version of ncdf4 uses the same netcdf libraries as RNetCDF on Windows. Unfortunately I can’t find any older ncdf4 versions on CRAN that are compatible with R-3.6. As a workaround, I have rebuilt ncdf4 with the old CRAN netcdf library. The package is available at https://win-builder.r-project.org/E4P6xR8nZFu6/ . This may solve your problem, but I haven’t been able to test it yet.
Hi @everydayduffy , I just tested the custom ncdf4 build with hyper_tibble, and it seems to work:
> library(tidync)
> f <- "era5_globe_april_1990_1990.nc"
> b <- tidync(f) %>%
+ tidync::hyper_filter(longitude = longitude == 5,
+ latitude = latitude == 5) %>%
+ hyper_tibble()
> b
# A tibble: 720 x 14
t2m d2m sp u10 v10 tp tcc msnlwrf msdwlwrf fdir ssrd
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 301. 297. 1.01e5 2.49 3.53 0. 1 -152. 276. 1.92e6 2.14e6
2 301. 297. 1.01e5 2.75 4.05 1.03e-6 1 -152. 276. 1.92e6 2.14e6
3 301. 297. 1.01e5 2.95 4.60 2.07e-6 1 -152. 276. 1.92e6 2.14e6
4 301. 297. 1.01e5 3.25 4.57 8.26e-6 1 -152. 276. 1.92e6 2.14e6
5 301. 297. 1.01e5 3.34 3.84 2.48e-5 1 -152. 276. 1.92e6 2.14e6
6 301. 297. 1.01e5 3.35 3.60 2.58e-5 1 -152. 276. 1.92e6 2.14e6
7 301. 297. 1.01e5 3.62 4.18 1.96e-5 0.995 -152. 276. 1.92e6 2.14e6
8 301. 297. 1.01e5 3.98 4.68 1.03e-5 0.917 -152. 276. 1.92e6 2.14e6
9 301. 297. 1.01e5 3.63 4.63 1.03e-5 0.835 -152. 276. 1.92e6 2.14e6
10 302. 298. 1.01e5 2.80 4.36 8.26e-6 0.882 -152. 276. 1.92e6 2.14e6
# ... with 710 more rows, and 3 more variables: longitude <dbl>,
# latitude <dbl>, time <dbl>
The ncdf4_1.17-2.zip package will be automatically removed within a few days, but I can have it rebuilt if needed.
Thanks @mjwoods. I have downloaded the modified ncdf4 package. I have got a lot of scripts running at the moment, so I'll have to wait a while before I can test, but I'm pleased to see that hyper_tibble() was working for you. I really appreciate the time you've spent helping me - thank you. I don't want to take much more of your time, so looking forward, what do you think the longer term solution (if any) to this issue is?
Hi @everydayduffy , I'm pleased to see RNetCDF being used for serious research, and it's the least I can do to make sure it works!
In the longer term, the new msys2-based toolchain being used for R should provide up-to-date versions of the netcdf library, which may eventually fix the bugs that we are seeing. In fact, I recently built RNetCDF and ncdf4 with the new toolchain and the included netcdf library, and the above tests with hyper_tibble actually worked! The problem is that opendap did not work at all (even using the version of the netcdf library with opendap enabled). I'll try contacting the netcdf developers to ask for advice.
@mjwoods Hi, I have checked this file https://win-builder.r-project.org/E4P6xR8nZFu6/ but it is not available anymore. I have the same issue with my .nc file. Can you please reupload it ?
Hi @am2222 , are you using the latest R and RNetCDF for Windows? I was expecting these versions to fix the problem. If your file is not too large, please attach it to this issue so that I can have a closer look.
Hi again, I still have this issue with ERA5 data where the file size is >4 GB. I suppose I will have to just download the data again into smaller files ...
> open.nc(flist[h])
Error in open.nc(flist[h]) : NetCDF: Numeric conversion not representable
> sessionInfo()
R version 4.0.3 (2020-10-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19041)
Matrix products: default
locale:
[1] LC_COLLATE=English_Australia.1252 LC_CTYPE=English_Australia.1252 LC_MONETARY=English_Australia.1252 LC_NUMERIC=C
[5] LC_TIME=English_Australia.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] lubridate_1.7.9 RNetCDF_2.4-2
loaded via a namespace (and not attached):
[1] compiler_4.0.3 generics_0.0.2 tools_4.0.3 Rcpp_1.0.5
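For large files like this, one cheap check that may help narrow things down is the on-disk format, which can be read from the first bytes of the file. Classic-format files start with "CDF" followed by a version byte (1 for classic, 2 for 64-bit offset, 5 for 64-bit data), while netCDF-4 files are HDF5 files and normally start with the 8-byte HDF5 signature. The sketch below uses only base R; `netcdf_format` is a hypothetical helper name, and nothing in this thread confirms that the format is what triggers the error.

```r
# Hypothetical helper: guess the netCDF on-disk format from the leading bytes.
netcdf_format <- function(path) {
  con <- file(path, open = "rb")
  on.exit(close(con))
  magic <- readBin(con, what = "raw", n = 8L)

  # HDF5 signature: \x89 H D F \r \n \x1a \n (checked at offset 0 only)
  hdf5_sig <- as.raw(c(0x89, 0x48, 0x44, 0x46, 0x0d, 0x0a, 0x1a, 0x0a))
  if (length(magic) == 8L && identical(magic, hdf5_sig)) {
    return("netCDF-4 (HDF5)")
  }

  # Classic family: "CDF" followed by a one-byte version number
  if (length(magic) >= 4L && identical(magic[1:3], charToRaw("CDF"))) {
    return(switch(as.character(as.integer(magic[4])),
      "1" = "classic (CDF-1)",
      "2" = "64-bit offset (CDF-2)",
      "5" = "64-bit data (CDF-5)",
      "unknown CDF version"
    ))
  }

  "not recognised as netCDF"
}
```

Calling `netcdf_format(flist[h])` on a file that fails and on a smaller file from the same source that opens fine would show whether the two differ in format.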
@RichardBean - see below: one workaround I found... It can be done with reticulate if you want to keep it all in R.
Hi @mjwoods. I had an idea today which worked... I read the troublesome .nc file into Python with the xarray package, and wrote it out again. The new file can now be read by tidync in R. It's a workaround which allows me to work with the data on Windows, but doesn't solve the RNetCDF problem! Thought I would let you know in case it's of interest.
I am having this same problem. I have been using the same script to open the NOAA nClimGrid nc files (https://www.ncei.noaa.gov/thredds/catalog/data-in-development/nclimgrid/catalog.html) for months using tidync().
As of this morning I get the following error (yet the file opens just fine with ncdf4). I have updated R and all packages and have tried using different netcdf files of different datasets, all with same result. Did something change in the package or its dependencies?
Error: Tibble columns must have compatible sizes.
• Size 2: Columns filter_id and filter_params.
• Size 3: Column chunksizes.
ℹ Only values of size one are recycled.
The problem appeared to be with ncmeta. Installing the dev version of the package resolves this issue. https://github.com/hypertidy/ncmeta/issues/42
I'm prepping a release for ncmeta which should fix this, sorry for the confusion and delay.
@acrunyon can I please ask for an actual url to a netcdf file that you use with 'tidync()' - that catalog page has several possible links ending in '.nc' and I can never remember which one is supposed to work, ty
that's the catalog page I mentioned, I can't reproduce with that - please reprex when reporting on issues
ncdump -h https://www.ncei.noaa.gov/thredds/catalog/data-in-development/nclimgrid/catalog.html?dataset=data-in-development/nclimgrid/nclimgrid_prcp.nc
NetCDF: Malformed or unexpected Constraint
Sorry wrong link. Try this one: https://www.ncei.noaa.gov/thredds/fileServer/data-in-development/nclimgrid/nclimgrid_prcp.nc
that still doesn't work for me, what does work is the link from the OpenDAP page i.e.
ncdump -h https://www.ncei.noaa.gov/thredds/dodsC/data-in-development/nclimgrid/nclimgrid_prcp.nc
that's the bit in "Data URL:" here https://www.ncei.noaa.gov/thredds/dodsC/data-in-development/nclimgrid/nclimgrid_prcp.nc.html found by following OpenDAP (the top link in the catalog page).
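For this server, the OPeNDAP "Data URL" differs from the file download link only in the service segment of the path, so the conversion can be sketched as a string substitution. `thredds_opendap_url` is a hypothetical helper name, and the two path segments are taken from the URLs in this thread; other THREDDS servers may be laid out differently.

```r
# Hypothetical helper: turn a THREDDS "fileServer" link into the matching
# "dodsC" (OPeNDAP) link by swapping the service segment of the path.
thredds_opendap_url <- function(url) {
  sub("/thredds/fileServer/", "/thredds/dodsC/", url, fixed = TRUE)
}

file_url <- "https://www.ncei.noaa.gov/thredds/fileServer/data-in-development/nclimgrid/nclimgrid_prcp.nc"
dap_url <- thredds_opendap_url(file_url)
print(dap_url)
```

The resulting URL is the kind that `ncdump -h` accepted above. Opening it from R also needs a netcdf library built with OPeNDAP support, which earlier comments note is not the case for every Windows build.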
there's a few subtopics here, if anything remains please open a new issue.
ncmeta is now updated on CRAN at 0.3.5, fixing the problem with tibble compatible sizes.
Session Info
```r
- Session info ---------------------------------------------------------------
 setting  value
 version  R version 3.5.2 (2018-12-20)
 os       Windows 10 x64
 system   x86_64, mingw32
 ui       RStudio
 language (EN)
 collate  English_United Kingdom.1252
 ctype    English_United Kingdom.1252
 tz       Europe/London
 date     2019-10-21

- Packages -------------------------------------------------------------------
 package     * version date       lib source
 assertthat    0.2.1   2019-03-21 [1] CRAN (R 3.5.3)
 backports     1.1.5   2019-10-02 [1] CRAN (R 3.5.3)
 callr         3.3.2   2019-09-22 [1] CRAN (R 3.5.3)
 cli           1.1.0   2019-03-19 [1] CRAN (R 3.5.3)
 crayon        1.3.4   2017-09-16 [1] CRAN (R 3.5.2)
 desc          1.2.0   2018-05-01 [1] CRAN (R 3.5.2)
 devtools      2.2.1   2019-09-24 [1] CRAN (R 3.5.3)
 digest        0.6.21  2019-09-20 [1] CRAN (R 3.5.3)
 dplyr         0.8.3   2019-07-04 [1] CRAN (R 3.5.3)
 ellipsis      0.3.0   2019-09-20 [1] CRAN (R 3.5.3)
 forcats       0.4.0   2019-02-17 [1] CRAN (R 3.5.2)
 fs            1.3.1   2019-05-06 [1] CRAN (R 3.5.3)
 glue          1.3.1   2019-03-12 [1] CRAN (R 3.5.3)
 magrittr      1.5     2014-11-22 [1] CRAN (R 3.5.2)
 memoise       1.1.0   2017-04-21 [1] CRAN (R 3.5.2)
 ncdf4         1.16.1  2019-03-11 [1] CRAN (R 3.5.3)
 ncmeta        0.1.0   2019-08-28 [1] CRAN (R 3.5.3)
 pillar        1.4.2   2019-06-29 [1] CRAN (R 3.5.3)
 pkgbuild      1.0.5   2019-08-26 [1] CRAN (R 3.5.3)
 pkgconfig     2.0.3   2019-09-22 [1] CRAN (R 3.5.3)
 pkgload       1.0.2   2018-10-29 [1] CRAN (R 3.5.2)
 prettyunits   1.0.2   2015-07-13 [1] CRAN (R 3.5.2)
 processx      3.4.1   2019-07-18 [1] CRAN (R 3.5.2)
 ps            1.3.0   2018-12-21 [1] CRAN (R 3.5.2)
 purrr         0.3.2   2019-03-15 [1] CRAN (R 3.5.3)
 R6            2.4.0   2019-02-14 [1] CRAN (R 3.5.2)
 Rcpp          1.0.2   2019-07-25 [1] CRAN (R 3.5.3)
 remotes       2.1.0   2019-06-24 [1] CRAN (R 3.5.3)
 rlang         0.4.0   2019-06-25 [1] CRAN (R 3.5.3)
 RNetCDF       2.1-1   2019-10-20 [1] CRAN (R 3.5.3)
 rprojroot     1.3-2   2018-01-03 [1] CRAN (R 3.5.2)
 rstudioapi    0.10    2019-03-19 [1] CRAN (R 3.5.3)
 sessioninfo   1.1.1   2018-11-05 [1] CRAN (R 3.5.2)
 testthat      2.2.1   2019-07-25 [1] CRAN (R 3.5.3)
 tibble        2.1.3   2019-06-06 [1] CRAN (R 3.5.3)
 tidync      * 0.2.1   2019-05-23 [1] CRAN (R 3.5.3)
 tidyselect    0.2.5   2018-10-11 [1] CRAN (R 3.5.2)
 usethis       1.5.1   2019-07-04 [1] CRAN (R 3.5.3)
 withr         2.1.2   2018-03-15 [1] CRAN (R 3.5.2)

[1] C:/R/R352/library
```
Hello,
It's hard to produce a reprex for this issue, as I think it might be a file size limit... I have a ~16 GB .nc file that opens fine with ncdf4::nc_open:
But when I use tidync::tidync(), I get the following error:
I have much smaller .nc files from the same source (it's ERA5 climate data from the Climate Data Store) that open successfully with both packages. Is there something about the rather large size of the .nc file that tidync isn't happy with?