r-spatial / stars

Spatiotemporal Arrays, Raster and Vector Data Cubes
https://r-spatial.github.io/stars/
Apache License 2.0

Handling of large files: RAM usage very high #169

Closed. adrfantini closed this issue 1 year ago.

adrfantini commented 5 years ago

I am opening a large-ish NetCDF file for testing purposes. It is a 25km E-OBS daily precipitation dataset; 201*464*25202 in size (332MB on disk, compressed).

It can be downloaded from here (direct link to the file here).

I am working on a machine with 64GB of RAM. I can load the dataset:

library(stars)
y <- read_ncdf('rr_ens_mean_0.25deg_reg_v19.0e.nc')
# read_stars takes FOREVER here, read_ncdf is MUCH faster

The file takes up 18803524336 bytes of RAM space (~17GB), which is consistent with its size (201*464*25202*8 = 18803514624). When trying to access its content, even by just show(y), stars requests a lot of memory, causing the system to hang and kill the R process. However, I can work around the problem by issuing gc() manually before the show(y): this works OK.

y
# stars object with 3 dimensions and 1 attribute
# attribute(s), summary of first 1e+05 cells:
# Killed
# ... restart R and re-read dataset
str(y)
# List of 1
#  $ rr:Object of class units:
# Killed
# ... restart R and re-read dataset
gc()
#              used    (Mb) gc trigger    (Mb)   max used    (Mb)
# Ncells     622702    33.3    1178674    63.0    1178674    63.0
# Vcells 2351529925 17940.8 8127149216 62005.3 8227628331 62771.9
y
# works now

So, a few points:

  1. why is stars requesting multiple GB of RAM just for show(y)? Is this only for the summary statistics on the array? The RAM usage in top goes from 18GB to 53GB just for the show(y)!
  2. I thought that issuing gc() manually was not usually required, but here R does not seem to garbage-collect automatically before the process gets killed. Is there anything that can be done at the stars level for this? Maybe calling gc() at the end of some functions?
  3. Unfortunately, any st_apply function will fail due to excessive RAM requirements, despite the dataset being only 18GB and the system having 64GB. There is a lot of overhead involved.

I understand this is probably an R limitation and not a stars limitation.
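
For reference, a minimal sketch of the gc() workaround reported above (same file; object.size is only there to confirm the in-memory footprint):

library(stars)
y <- read_ncdf('rr_ens_mean_0.25deg_reg_v19.0e.nc')
format(object.size(y), units = "Gb")  # ~17.5 Gb in RAM, matching the arithmetic above
gc()                                  # free the read-time temporaries no longer referenced
y                                     # the print/summary now has headroom and succeeds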

mdsumner commented 5 years ago

There's no proxy mode for read_ncdf; did you try proxy = TRUE with read_stars?

mdsumner commented 5 years ago

(I will get tidync on CRAN next week and show how I would go about this)

adrfantini commented 5 years ago

Can't test now since the cluster's 64GB queue is busy, but I will. I expect read_stars(..., proxy = TRUE) will work OK for accessing the file, but then any calculation (st_apply) will probably fail. Thinking about it, it would be awesome if st_apply were chunk-aware and could perform computations one chunk at a time on proxy objects, saving on RAM, but I understand this is a very specific use case which only happens for netCDF (although I run into this need - performing chunk-aware computations - very often).
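
A minimal sketch of that proxy-based workflow (hedged: whether GDAL's NetCDF driver reads this file directly, and the resulting dimension names, may differ from what is assumed here):

library(stars)
p <- read_stars('rr_ens_mean_0.25deg_reg_v19.0e.nc', proxy = TRUE)  # lazy; no array data read yet
p                                        # printing a proxy does not pull the cube into RAM
m <- st_apply(p, c("x", "y"), mean)      # deferred on a proxy object ...
m <- st_as_stars(m)                      # ... and only evaluated when the data are realised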

mdsumner commented 5 years ago

Made a start here, at least for reading via proxy for netcdf:

https://github.com/mdsumner/stars/commit/b19fe0dc84354974cae430204a50c54420a952e0

The crux for slicing is to pass the dimension expressions along to tidync::hyper_filter, and this new function defaults to proxy (but won't work in the other ways stars_proxy does yet). Use the proxy to find out the form of the source:

remotes::install_github("mdsumner/stars@mdsumner-dev")
f <- "pp_ens_mean_0.25deg_reg_v19.0e.nc"
stars::read_stars_tidync(f)
stars_proxy object with 1 attribute in file:
$names
[1] "pp_ens_mean_0.25deg_reg_v19.0e.nc"

dimension(s):
          from    to  offset delta refsys point values
longitude    1   464 -40.375  0.25     NA    NA   NULL
latitude     1   201  25.375  0.25     NA    NA   NULL
time         1 25202       0     1     NA    NA   NULL

Use the hyper_filter functionality to hone in, and set proxy to FALSE to read:

stars::read_stars_tidync(f, time = index < 5, proxy = FALSE) 
stars object with 3 dimensions and 1 attribute
attribute(s):
      pp         
 Min.   : 911.7  
 1st Qu.:1007.7  
 Median :1016.5  
 Mean   :1015.6  
 3rd Qu.:1024.7  
 Max.   :1046.7  
 NA's   :325714  
dimension(s):
          from  to  offset delta refsys point values
latitude     1 201  25.375  0.25     NA    NA   NULL
longitude    1 464 -40.375  0.25     NA    NA   NULL
time         1   4       0     1     NA    NA   NULL

I'm interested in how stars might use this kind of interface, and the problem is always about showing the user the available dimensions, their lengths/ranges and so forth.

(The stars/dims object creation is waay better now too so I'm also going to refactor a few things).

adrfantini commented 5 years ago

Manually processing different chunks is an option and it works well, but I was wondering whether some automated procedure could be applied. In general, however, I do not understand why an 18GB object manages to bring a 64GB machine to its knees even when just using show; that is a limitation that needs to be looked into. Maybe being lazy (proxy = TRUE) by default would be a good idea in the long term.
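
For concreteness, a hedged sketch of what such manual chunking can look like using read_ncdf()'s ncsub argument (start/count per dimension); the variable name "rr" and the (longitude, latitude, time) dimension order are assumptions based on the printouts earlier in this thread:

library(stars)
f  <- 'rr_ens_mean_0.25deg_reg_v19.0e.nc'
nt <- 25202; step <- 1000
total <- NULL; n <- 0
for (s in seq(1, nt, by = step)) {
  cnt  <- min(step, nt - s + 1)
  slab <- read_ncdf(f, var = "rr",
                    ncsub = cbind(start = c(1, 1, s), count = c(464, 201, cnt)))
  sums <- st_apply(slab, c("longitude", "latitude"), sum, na.rm = TRUE)
  total <- if (is.null(total)) sums else total + sums   # Ops.stars adds cell-wise
  n <- n + cnt
}
mean_rr <- total / n   # rough time mean; ignores that NA counts differ per cell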

mdsumner commented 5 years ago

Do you know how to determine the internal tiling scheme of NetCDF?

dblodgett-usgs commented 5 years ago

The ncdump -sh output of a NetCDF-4 file will include "secret" attributes like:

        lon:_Storage = "chunked" ;
        lon:_ChunkSizes = 609, 659 ;
        lon:_DeflateLevel = 5 ;
        lon:_Shuffle = "true" ;
        lon:_Endianness = "little" ;

The _ChunkSizes attribute tells you the chunking scheme for that variable.

I don't think RNetCDF hits those hidden attributes though...

ncdf4 does although maybe not in all installs or versions of NetCDF-4? See the code below for an example.

download.file("https://github.com/r-spatial/stars/files/3112497/wrfout_hist_prcp_2002-09-01.nc4.zip",
              "wrfout_hist_prcp_2002-09-01.nc4.zip")
unzip("wrfout_hist_prcp_2002-09-01.nc4.zip")
nc <- RNetCDF::open.nc("wrfout_hist_prcp_2002-09-01.nc4")
finq <- RNetCDF::file.inq.nc(nc)
dinq <- RNetCDF::dim.inq.nc(nc, 1)
vinq <- RNetCDF::var.inq.nc(nc, "lat")

lat_att_names <- lapply(c(0:(vinq$natts-1)), function(x) RNetCDF::att.inq.nc(nc, "lat", x))
lat_atts <- lapply(c(0:(vinq$natts-1)), function(x) RNetCDF::att.get.nc(nc, "lat", x))

ncd <- ncdf4::nc_open("wrfout_hist_prcp_2002-09-01.nc4")
ncd$format
ncd$var$prcp$chunksizes
ncdf4::ncatt_get(ncd, "prcp", "_ChunkSizes")


`ncdump -sh wrfout_hist_prcp_2002-09-01.nc4` output:

netcdf wrfout_hist_prcp_2002-09-01 {
dimensions:
    time = UNLIMITED ; // (1 currently)
    y = 609 ;
    x = 659 ;
variables:
    double time(time) ;
        time:long_name = "Time" ;
        time:standard_name = "time" ;
        time:units = "days since 1900-01-01 00:00:00" ;
        time:calendar = "standard" ;
        time:_Storage = "chunked" ;
        time:_ChunkSizes = 512 ;
        time:_DeflateLevel = 5 ;
        time:_Shuffle = "true" ;
        time:_Endianness = "little" ;
    double y(y) ;
        y:grid_spacing = 4000.f ;
        y:standard_name = "projection_y_coordinate" ;
        y:long_name = "y-coordinate of projection" ;
        y:units = "m" ;
        y:_CoordinateAxisType = "GeoY" ;
        y:resolution = 4000. ;
        y:_Storage = "chunked" ;
        y:_ChunkSizes = 609 ;
        y:_DeflateLevel = 5 ;
        y:_Shuffle = "true" ;
        y:_Endianness = "little" ;
    double x(x) ;
        x:grid_spacing = 4000.f ;
        x:standard_name = "projection_x_coordinate" ;
        x:long_name = "x-coordinate of projection" ;
        x:units = "m" ;
        x:_CoordinateAxisType = "GeoX" ;
        x:resolution = 4000. ;
        x:_Storage = "chunked" ;
        x:_ChunkSizes = 659 ;
        x:_DeflateLevel = 5 ;
        x:_Shuffle = "true" ;
        x:_Endianness = "little" ;
    float lat(y, x) ;
        lat:esri_pe_string = "PROJCS[\"Sphere_Stereographic\",GEOGCS[\"GCS_Sphere\",DATUM[\"D_Sphere\",SPHEROID[\"Sphere\",6370000.0,0.0]],PRIMEM[\"Greenwich\",0.0],UNIT[\"Degree\",0.0174532925199433]],PROJECTION[\"Stereographic\"],PARAMETER[\"False_Easting\",0.0],PARAMETER[\"False_Northing\",0.0],PARAMETER[\"Central_Meridian\",-150.0],PARAMETER[\"Scale_Factor\",0.94939702315],PARAMETER[\"Latitude_Of_Origin\",90.0],UNIT[\"Meter\",1.0]];-30925000 -30925000 145629737.344236;-100000 10000;-100000 10000;0.001;0.001;0.001;IsHighPrecision" ;
        lat:grid_mapping = "ProjectionCoordinateSystem" ;
        lat:coordinates = "y x" ;
        lat:units = "degrees_north" ;
        lat:standard_name = "latitude" ;
        lat:long_name = "Latitude" ;
        lat:_FillValue = -1.e+33f ;
        lat:_Storage = "chunked" ;
        lat:_ChunkSizes = 609, 659 ;
        lat:_DeflateLevel = 5 ;
        lat:_Shuffle = "true" ;
        lat:_Endianness = "little" ;
    float lon(y, x) ;
        lon:esri_pe_string = "PROJCS[\"Sphere_Stereographic\",GEOGCS[\"GCS_Sphere\",DATUM[\"D_Sphere\",SPHEROID[\"Sphere\",6370000.0,0.0]],PRIMEM[\"Greenwich\",0.0],UNIT[\"Degree\",0.0174532925199433]],PROJECTION[\"Stereographic\"],PARAMETER[\"False_Easting\",0.0],PARAMETER[\"False_Northing\",0.0],PARAMETER[\"Central_Meridian\",-150.0],PARAMETER[\"Scale_Factor\",0.94939702315],PARAMETER[\"Latitude_Of_Origin\",90.0],UNIT[\"Meter\",1.0]];-30925000 -30925000 145629737.344236;-100000 10000;-100000 10000;0.001;0.001;0.001;IsHighPrecision" ;
        lon:grid_mapping = "ProjectionCoordinateSystem" ;
        lon:coordinates = "y x" ;
        lon:units = "degrees_east" ;
        lon:standard_name = "longitude" ;
        lon:long_name = "Longitude" ;
        lon:_FillValue = -1.e+33f ;
        lon:_Storage = "chunked" ;
        lon:_ChunkSizes = 609, 659 ;
        lon:_DeflateLevel = 5 ;
        lon:_Shuffle = "true" ;
        lon:_Endianness = "little" ;
    char ProjectionCoordinateSystem ;
        ProjectionCoordinateSystem:false_northing = 0. ;
        ProjectionCoordinateSystem:false_easting = 0. ;
        ProjectionCoordinateSystem:scale_factor_at_projection_origin = 0.94939702315 ;
        ProjectionCoordinateSystem:latitude_of_projection_origin = 90. ;
        ProjectionCoordinateSystem:longitude_of_projection_origin = 0. ;
        ProjectionCoordinateSystem:GeoTransform = "-1317999.87049 4000.0 0 -1574425.14759 0 -4000.0 " ;
        ProjectionCoordinateSystem:spatial_ref = "PROJCS[\"Sphere_Stereographic\",GEOGCS[\"GCS_Sphere\",DATUM[\"D_Sphere\",SPHEROID[\"Sphere\",6370000.0,0.0]],PRIMEM[\"Greenwich\",0.0],UNIT[\"Degree\",0.0174532925199433]],PROJECTION[\"Stereographic\"],PARAMETER[\"False_Easting\",0.0],PARAMETER[\"False_Northing\",0.0],PARAMETER[\"Central_Meridian\",-150.0],PARAMETER[\"Scale_Factor\",0.94939702315],PARAMETER[\"Latitude_Of_Origin\",90.0],UNIT[\"Meter\",1.0]];-30925000 -30925000 145629737.344236;-100000 10000;-100000 10000;0.001;0.001;0.001;IsHighPrecision" ;
        ProjectionCoordinateSystem:esri_pe_string = "PROJCS[\"Sphere_Stereographic\",GEOGCS[\"GCS_Sphere\",DATUM[\"D_Sphere\",SPHEROID[\"Sphere\",6370000.0,0.0]],PRIMEM[\"Greenwich\",0.0],UNIT[\"Degree\",0.0174532925199433]],PROJECTION[\"Stereographic\"],PARAMETER[\"False_Easting\",0.0],PARAMETER[\"False_Northing\",0.0],PARAMETER[\"Central_Meridian\",-150.0],PARAMETER[\"Scale_Factor\",0.94939702315],PARAMETER[\"Latitude_Of_Origin\",90.0],UNIT[\"Meter\",1.0]];-30925000 -30925000 145629737.344236;-100000 10000;-100000 10000;0.001;0.001;0.001;IsHighPrecision" ;
        ProjectionCoordinateSystem:grid_mapping_name = "polar_stereographic" ;
        ProjectionCoordinateSystem:transform_name = "polar_stereographic" ;
        ProjectionCoordinateSystem:_CoordinateTransformType = "Projection" ;
        ProjectionCoordinateSystem:_CoordinateAxes = "y x" ;
    float prcp(time, y, x) ;
        prcp:_FillValue = 1.e+20f ;
        prcp:esri_pe_string = "PROJCS[\"Sphere_Stereographic\",GEOGCS[\"GCS_Sphere\",DATUM[\"D_Sphere\",SPHEROID[\"Sphere\",6370000.0,0.0]],PRIMEM[\"Greenwich\",0.0],UNIT[\"Degree\",0.0174532925199433]],PROJECTION[\"Stereographic\"],PARAMETER[\"False_Easting\",0.0],PARAMETER[\"False_Northing\",0.0],PARAMETER[\"Central_Meridian\",-150.0],PARAMETER[\"Scale_Factor\",0.94939702315],PARAMETER[\"Latitude_Of_Origin\",90.0],UNIT[\"Meter\",1.0]];-30925000 -30925000 145629737.344236;-100000 10000;-100000 10000;0.001;0.001;0.001;IsHighPrecision" ;
        prcp:grid_mapping = "ProjectionCoordinateSystem" ;
        prcp:coordinates = "time y x" ;
        prcp:units = "kg m-2 day-1" ;
        prcp:standard_name = "precipitation_amount" ;
        prcp:long_name = "Daily Precipitation" ;
        prcp:_Storage = "chunked" ;
        prcp:_ChunkSizes = 1, 609, 659 ;
        prcp:_DeflateLevel = 5 ;
        prcp:_Shuffle = "true" ;
        prcp:_Endianness = "little" ;

// global attributes:
        :institution = "National Center for Atmospheric Research" ;
        :created_by = "Andy Monaghan - monaghan@ucar.edu" ;
        :notes = "Created with NCL script: wrfout_to_cf.ncl v2.0" ;
        :source = "surface_d01_2002-09-01_RAINNCtot.nc" ;
        :creation_date = "Thu Jul 27 17:01:02 MDT 2017" ;
        :NCL_Version = "6.4.0" ;
        :system = "Linux geyser12 2.6.32-358.el6.x86_64 #1 SMP Wed Nov 2 11:00:18 MDT 2016 x86_64 x86_64 x86_64 GNU/Linux" ;
        :Conventions = "CF 1.6" ;
        :netcdf_source = "wrfout_hist_prcp_2002-09-01.nc" ;
        :title = "wrfout_hist_prcp_2002-09-01.nc" ;
        :_NCProperties = "version=1,netcdflibversion=4.4.1.1,hdf5libversion=1.8.14" ;
        :_SuperblockVersion = 0 ;
        :_IsNetcdf4 = 1 ;
        :_Format = "netCDF-4 classic model" ;
}
mdsumner commented 5 years ago

Excellent, thanks!

adrfantini commented 5 years ago

I usually do this with ncdf4 and have never encountered problems.

edzer commented 5 years ago

I'm seeing

> ncd <- ncdf4::nc_open("wrfout_hist_prcp_2002-09-01.nc4")
> ncd$format
[1] "NC_FORMAT_NETCDF4_CLASSIC"
> ncd$var$prcp$chunksizes
[1] NA
> ncdf4::ncatt_get(ncd, "prcp", "_ChunkSizes")
$hasatt
[1] FALSE

$value
[1] 0

was that expected?

> sessionInfo()
R version 3.5.3 (2019-03-11)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.2 LTS

Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

loaded via a namespace (and not attached):
[1] compiler_3.5.3 RNetCDF_1.9-1  ncdf4_1.16
dblodgett-usgs commented 5 years ago

Yeah... the _ChunkSizes attribute is a "special" attribute that isn't accessible that way I guess. See the -s flag here for more. I would expect it to be in the ncd$var$prcp$chunksizes element but it's NA for some reason on my computer.

adrfantini commented 5 years ago

Weird. I suspect that _ChunkSizes is NA in ncdf4 since the chunk size is equal to the variable size - that is, there is only one chunk in the file.

dblodgett-usgs commented 5 years ago

Ahh, that makes sense. But checking another file that is chunked for real -- too large to share the binary here -- I see the same thing. Documentation of the ncvar4 class on page 16 here doesn't include any description of the chunksizes element of the class. Ahh, here it is.

Looks like the format has to be NC_FORMAT_NETCDF4, not NC_FORMAT_NETCDF4_CLASSIC, but the CLASSIC data model CAN use chunking!

I just did: nccopy -4 wrfout_hist_prcp_2002-09-01.nc4 wrfout_hist_prcp_2002-09-01.nc

and got:

> ncd <- ncdf4::nc_open("wrfout_hist_prcp_2002-09-01.nc")
> ncd$format
[1] "NC_FORMAT_NETCDF4"
> ncd$var$prcp$chunksizes
[1] 659 609   1
> ncdf4::ncatt_get(ncd, "prcp", "_ChunkSizes")
$hasatt
[1] FALSE

$value
[1] 0
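
A small hedged helper wrapping the checks above (the function name is made up for illustration): it reports a variable's chunk sizes via ncdf4, treating NA as "contiguous / a single chunk", as suspected earlier in this thread.

chunking_info <- function(path, var) {
  nc <- ncdf4::nc_open(path)
  on.exit(ncdf4::nc_close(nc))
  cs <- nc$var[[var]]$chunksizes          # per-dimension chunk sizes, or NA
  list(format = nc$format, chunksizes = cs, chunked = !all(is.na(cs)))
}
# chunking_info("wrfout_hist_prcp_2002-09-01.nc", "prcp")
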
mdsumner commented 5 years ago

Another point on this is that the variable 'pp' is internally a short integer, but gets expanded into double by the "unpacking" (application of scale/offset). In tidync that is controlled by the "raw_datavals" argument, i.e.

f <- "pp_ens_mean_0.25deg_reg_v19.0e.nc"
a <- tidync(f) %>% hyper_array(raw_datavals = TRUE)

which halves the size compared to the unpacked version. Is that of interest? (I'm working on making read_ncdf always use tidync, and default to proxy = TRUE, so I'll be changing a few things and hopefully exposing this control over the raw read).

edzer commented 5 years ago

The decision to load everything into 8-byte doubles, rather than 4-byte integers when possible, was intentional. My reasoning was that a factor of 2 is not worth the trouble (of e.g. having to postpone the scale/offset). When chunking is done anyway, one can get down to a factor of anything at that point.
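
For scale, the arithmetic behind that factor on this particular grid (201 x 464 x 25202 cells):

cells <- 201 * 464 * 25202
cells * 8 / 2^30   # ~17.5 GiB unpacked to double, i.e. what stars holds in RAM
cells * 4 / 2^30   # ~8.8 GiB if kept as 4-byte R integers: the factor of 2 discussed
cells * 2 / 2^30   # ~4.4 GiB as packed 2-byte shorts on disk, before compression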

adrfantini commented 5 years ago

Another point on this is that the variable 'pp' is internally a short integer, but gets expanded into double by the "unpacking" (application of scale/offset). In tidync that is controlled by the "raw_datavals" argument, i.e.

f <- "pp_ens_mean_0.25deg_reg_v19.0e.nc"
a <- tidync(f) %>% hyper_array(raw_datavals = TRUE)

which halves the size compared to the unpacked version. Is that of interest?

Yeah, I sometimes use that same flag in ncdf4::ncvar_get; the way I see it, since it's just a flag that sits there, it might as well stay.

(I'm working on making read_ncdf always use tidync, and default to proxy = TRUE, so I'll be changing a few things and hopefully exposing this control over the raw read).

That is great. What I do not understand is whether there is any plan to make processing by chunk the default once proxy = TRUE. I do not even know if that's possible.

@mdsumner, which branch should I test right now for read_ncdf(proxy = TRUE)? Can I go with the netcdf-dev that you just defaulted to?

edzer commented 5 years ago

In an ideal world, I think, one would entirely get away from proxy=FALSE unless the object is created in-memory, but even then you could write it to disk first and continue with proxy=TRUE. Having said that, that would ask for ultimate flexibility in chunking approaches, and clever handling of cases that cannot be chunked at all (multidimensional segmentation? watersheds?) I'm not sure we can, or anyone could, ever get there, and still hide the chunking details from the user. See also the limitations a system like GEE has.

adrfantini commented 5 years ago

In an ideal world, I think, one would entirely get away from proxy=FALSE

You mean TRUE?

unless the object is created in-memory, but even then you could write it to disk first and continue with proxy=TRUE. Having said that, that would ask for ultimate flexibility in chunking approaches, and clever handling of cases that cannot be chunked at all (multidimensional segmentation? watersheds?) I'm not sure we can, or anyone could, ever get there, and still hide the chunking details from the user. See also the limitations a system like GEE has.

There will always be corner cases. In my personal experience, handling chunks gracefully and intelligently would enormously help with large datasets - but this of course might be biased towards what I do, which is climate analysis.

Some software assumes a given chunking scheme; CDO, for example, silently assumes that datasets have a chunking along time of length 1. This makes it extremely fast (much faster than anything else) for all those files that comply (>90% in climate), even for some complex functions. On the other hand:

While the limitations are great, the pluses are such that CDO is by far the most dominant tool for data analysis in climate science.

mdsumner commented 5 years ago

Yes, netcdf-dev; there is a new read_stars_tidync function. Units for dims and vars, and consolidation of the two funs, are coming - it was a good day :)

mdsumner commented 5 years ago

I think chunking is easy to do once the base tools are right, but auto-determining the best strategy is very hard (it's what DB systems do, by locking down the context very hard). I have some very big climate output use cases here and hoping some colleagues chip in soon too. Multiple sources is more important to abstract over first, IMO

edzer commented 5 years ago

What do you mean by multiple sources?

mdsumner commented 5 years ago

More than one file: NetCDF has this concept of an unlimited dimension, which just spreads a long series over multiple files. With tidync and raadtools I've learnt not to create a dataset, but rather to provide a curated series of files of known structure. There's a lot of lazy potential here!
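
A hedged sketch of that idea using stars' own multi-file reading (the file names are hypothetical; along = "time" concatenates the cubes along time, and whether proxy = TRUE combines with along may depend on the stars version):

files <- c("pp_1995.nc", "pp_1996.nc", "pp_1997.nc")
x <- stars::read_stars(files, along = "time", proxy = TRUE)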

dblodgett-usgs commented 5 years ago

@mdsumner -- note that an "unlimited" dimension actually has more to do with how the file is structured than spreading across multiple files. An unlimited dimension has the data and time coordinates interleaved -- so if you want to keep writing records with unique time stamps, you can. It just so happens that if you want to tack files together or split them apart it's super easy if the file is already unlimited along time.

The problem with an unlimited dimension is that, to scan a 1d variable along the unlimited dimension (e.g. time), you have to move the disk head over all the interleaved data! Faster with SSD, but still lots of atomic reads of data that would (in a file without an unlimited dimension) be a contiguous block of bits.

So, for virtual file aggregation, as would be the case with proxy=TRUE, a collection of unlimited dimensions with time as the outer (most rapidly varying) dimension would be preferable, as the initial scan to create the collection would be much faster.

@adrfantini -- the chunking of time being most common, IMHO, is a reflection of time being the most common default outer dimension (required in COARDS?). In NetCDF3, this just means each timestep worth of X/Y data is contiguous on disk so if you were to use that chunking pattern and not compress anything, it would function essentially the same.

adrfantini commented 5 years ago

@adrfantini -- the chunking of time being most common, IMHO, is a reflection of time being the most common default outer dimension (required in COARDS?). In NetCDF3, this just means each timestep worth of X/Y data is contiguous on disk so if you were to use that chunking pattern and not compress anything, it would function essentially the same.

Time is usually unlimited and its chunk is of length one for a very simple reason: climate models write out the results every timestep, one timestep at a time. So they just increment the time dimension by 1, and to avoid rewrites this implies that the time chunk must be 1. Not all climate model outputs are subsequently postprocessed and rechunked, and even when they are, this chunking pattern is usually retained, since it is somewhat of an unwritten standard.

dblodgett-usgs commented 5 years ago

Too true. Time being unlimited is also an artifact of NCO requiring unlimited dimensions for gluing multiple files together or splitting them apart.

adrfantini commented 5 years ago

Yes, netcdf-dev; there is a new read_stars_tidync function. Units for dims and vars, and consolidation of the two funs, are coming - it was a good day :)

@mdsumner mdsumner/stars@netcdf-dev? Because I did that and I do not have any read_stars_tidync function.

mdsumner commented 5 years ago

oh sorry, it's only internal - I didn't think of that. It's still too early unless you like suffering chasing development - devtools::load_all() is the fastest way to load everything and avoid tidync:::read_stars_tidync syntax.

mdsumner commented 5 years ago

And you also can't pull a lazy_stars from this function, you can only run it again with proxy=FALSE.

adrfantini commented 5 years ago

oh sorry, it's only internal - I didn't think of that. It's still too early unless you like ~suffering~ chasing development - devtools::load_all() is the fastest way to load everything and avoid tidync:::read_stars_tidync syntax.

I'm having issues with this:

> devtools::install_github('mdsumner/stars@netcdf-dev')
....
> devtools::load_all('~/R/x86_64-pc-linux-gnu-library/3.5/stars/R/stars')
Loading stars
Skipping missing files: init.R, stars.R, read.R, sf.R, dimensions.R, values.R, plot.R, tidyverse.R, transform.R, ops.R, write.R, raster.R, sp.R, spacetime.R, ncdf.R, proxy.R, factors.R, rasterize.R, subset.R, warp.R, aggregate.R, xts.R, intervals.R, geom.R
Warning messages:
1: S3 methods ‘$<-.stars’, ‘[.dimension’, ‘[.dimensions’, ‘[.stars’, ‘[.stars_proxy’, ‘[<-.stars’, ‘dimnames<-.stars’, ‘st_crs<-.stars’, ‘Math.stars’, ‘Math.stars_proxy’, ‘Ops.stars’, ‘Ops.stars_proxy’, ‘adrop.stars’, ‘adrop.stars_proxy’, ‘aggregate.stars’, ‘aggregate.stars_proxy’, ‘aperm.stars’, ‘aperm.stars_proxy’, ‘as.data.frame.stars’, ‘c.stars’, ‘c.stars_proxy’, ‘cut.array’, ‘cut.matrix’, ‘cut.stars’, ‘dim.dimensions’, ‘dim.stars’, ‘dim.stars_proxy’, ‘dimnames.stars’, ‘drop_units.stars’, ‘image.stars’, ‘is.na.stars’, ‘merge.stars’, ‘merge.stars_proxy’, ‘plot.stars’, ‘plot.stars_proxy’, ‘predict.stars’, ‘predict.stars_proxy’, ‘print.dimensions’, ‘print.stars’, ‘print.stars_proxy’, ‘print.stars_raster’, ‘seq.dimension’, ‘split.stars’, ‘split.stars_proxy’, ‘st_apply.stars’, ‘st_apply.stars_proxy’, ‘st_ar [... truncated]
2: In setup_ns_exports(path, export_all, export_imports) :
  Objects listed as exports, but not present in namespace: geom_stars, make_intervals, read_ncdf, read_stars, st_apply, st_as_stars, st_contour, st_dimensions, st_get_dimension_values, st_rasterize, st_redimension, st_set_dimensions, st_sfc2xy, st_warp, st_xy2sfc, write_stars

Would you please be able to provide an MRE?

EDIT: Also, I cannot find tidync:::read_stars_tidync after installing mdsumner/tidync.

mdsumner commented 5 years ago

I wouldn't expect load_all to work like that; I was expecting you to clone the repo and go from there.

also I made a mistake! it's stars:::read_stars_tidync
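
A sketch of what that clone-and-load workflow looks like (the local clone path is hypothetical):

# in a shell: git clone -b netcdf-dev https://github.com/mdsumner/stars.git ~/git/stars
devtools::load_all("~/git/stars")   # point load_all at the package source, not the installed library
f <- "rr_ens_mean_0.25deg_reg_v19.0e.nc"
stars:::read_stars_tidync(f)        # internal for now, hence ':::'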