Open ryan-williams opened 6 years ago
In #285 there was mention of n5, which has Java and Rust implementations, maybe more. n5 is similar in concept to zarr, apparently.
N5 is basically this. The specs differ a bit in minor ways. Convergence would be good to have. Some relevant discussion in issue ( https://github.com/zarr-developers/zarr/issues/231 ).
JVM implementation of Zarr would be very cool, particularly if it had the same flexibility as the Python implementation to plug in different storage back-ends including cloud object stores.
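To make the "plug in different storage back-ends" idea concrete on the JVM, here is a minimal sketch. Everything in it (`ChunkStore`, `MemoryStore`, `PathStore`) is a hypothetical name invented for illustration, not an API from Zarr, N5, or any library mentioned in this thread. It assumes chunks are opaque byte arrays addressed by string keys, and that a cloud store could be reached through a `java.nio.file.Path` if a suitable NIO filesystem provider is installed.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical minimal key-value store; names and methods are illustrative only. */
interface ChunkStore {
    Optional<byte[]> get(String key);
    void put(String key, byte[] value);
}

/** In-memory back-end, useful for tests. */
final class MemoryStore implements ChunkStore {
    private final Map<String, byte[]> data = new ConcurrentHashMap<>();

    public Optional<byte[]> get(String key) {
        byte[] value = data.get(key);
        return value == null ? Optional.empty() : Optional.of(value.clone());
    }

    public void put(String key, byte[] value) {
        data.put(key, value.clone());
    }
}

/** Back-end rooted at a java.nio.file.Path; the Path may belong to any NIO filesystem provider. */
final class PathStore implements ChunkStore {
    private final Path root;

    PathStore(Path root) {
        this.root = root;
    }

    public Optional<byte[]> get(String key) {
        try {
            return Optional.of(Files.readAllBytes(root.resolve(key)));
        } catch (NoSuchFileException e) {
            return Optional.empty(); // a missing key is not an error
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public void put(String key, byte[] value) {
        try {
            Path target = root.resolve(key);
            Path parent = target.getParent();
            if (parent != null) {
                Files.createDirectories(parent);
            }
            Files.write(target, value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Array-level code would only ever see `ChunkStore`, so swapping the back-end would not touch it.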
Thanks for all the pointers! I've looked a bit at n5 and z5; a couple questions:
gcsfs/s3fs) and Java (via NIO adapters) seems to work well
h5py
I'm not aware of a way to read HDF5 from cloud stores in python
gcsfs's FUSE module does allow this, and there are other FUSE solutions out there too. The implementation is not at all performant compared to zarr. In addition, https://github.com/ContinuumIO/intake-xarray will shortly allow streaming of any xarray dataset, including HDF, from a server; again, there are other solutions that do something similar.
n5, which has a java and rust implementations, maybe more
z5 acts as a C++ and a python implementation for both zarr and N5
can z5 read/write directly to cloud stores
No, it's purely targeted at the file system format for both zarr and n5 as far as I know.
@ryan-williams there is already a bit of an ecosystem (albeit one tightly constrained to one institute...) rapidly evolving around the java N5 implementation, including a high-performance 3D data viewer, some image registration tools, and a volumetric image annotation suite. The java N5 already supports a number of backends, including the N5 filesystem format, HDF5, google cloud, and AWS (take a look here). It might make sense for a JVM implementation of the zarr file system format to take the form of an N5 backend (initially, at least) - that would potentially give all of those other tools access to zarr datasets for free, as well as saving you writing some of the higher-level boilerplate. That's if you're happy with the API, of course.
My feeling is that zarr has more momentum behind it and will have more impact in the future. Convergence would be great, but if the N5 tool ecosystem could get access to zarr file system arrays for free, that could also solve the problem.
I'd be really happy if Zarr and N5 converged on the same spec. It would make it much easier for people in this problem domain to collaborate more effectively on many other common challenges.
checking in here after a long gap!
I'm far along with a Zarr implementation in Scala, which will address the "JVM implementation" request here.
Some notes:
- it's in a branch that I am aggressively cleaning up atm; I'll send a link by Monday, but wanted to just mention now since other relevant discussions are ongoing.
- as one concrete use: I can directly convert HDF5 files to Zarr in "the cloud"
  - currently: S3 or GCS (via Java NIO APIs; ABS doesn't have an NIO impl yet: https://github.com/Azure/azure-storage-java/issues/305#issuecomment-391806835)
  - AFAIK that's not otherwise possible today:
    - h5py can't do direct cloud IO (https://github.com/h5py/h5py/issues/925)
    - various FUSE-based workarounds are brittle (https://github.com/dask/gcsfs/issues/107) or missing features (https://github.com/GoogleCloudPlatform/gcsfuse/issues/286)
  - @tomwhite added an NIO read-path (https://github.com/tomwhite/hdf5-java-cloud/blob/master/src/main/java/com/tom_e_white/hdf5_java_cloud/NioReadOnlyRandomAccessFile.java) to the netCDF Java lib (https://github.com/Unidata/thredds), and that's what I use, along with my JVM Zarr impl, to do the conversion
- Incidentally, this Scala implementation will also provide a JavaScript implementation "for free", via scala.js (https://www.scala-js.org/)
  - I'm hoping to also compile it to native, via scala-native (https://github.com/scala-native/scala-native), but that's at least another 6 months out (other libraries need to support scala-native first: https://github.com/typelevel/cats/issues/1549)
Looking forward to sharing more info on this shortly!
Very exciting!
Excellent! I would love to build off of this work on the netCDF-Java side to provide an IOSP to Zarr (read Zarr into the Common Data Model). At that point, we could enable the THREDDS Data Server to serve data stored in Zarr :-)
Would you be open to that idea, and does the license permit such usage?
@lesserwhirls yea, it will be Apache-2.0 licensed, happy to have it feed into netCDF things!
It might be helpful/less painful for everyone if we get the changes to netCDF-Java made upstream. @tomwhite - would you be willing to contribute those changes?
@lesserwhirls, yes I'd be happy to. I'll open an issue/PR to discuss.
Hi @ryan-williams how is it going?
I'm far along with a Zarr implementation in Scala, which will address the "JVM implementation" request here.
Some notes:
- it's in a branch that I am aggressively cleaning up atm; I'll send a link by Monday, but wanted to just mention now since other relevant discussions are ongoing.
hello! I've been side-tracked, but what I have is here lasersonlab/ndarray.scala. it's pretty "alpha" still, and the issues reasonably capture the things I'm focused on next.
I'll be checking back in on this in the coming weeks, and will give some more updates here.
Just ran across https://github.com/bcdev/jzarr/blob/master/docs/tutorial.rst
cc: @SabineEmbacher
see https://jzarr.readthedocs.io/en/latest/
hugs Sabine
If you need array objects which behave almost like NumPy arrays, you can also wrap the data using ND4J INDArray from deeplearning4j.org. Examples are in the tutorial section on writing and reading data:
https://jzarr.readthedocs.io/en/latest/tutorial.html#writing-and-reading-data
Or directly in the code example https://github.com/bcdev/jzarr/blob/master/docs/examples/java/Tutorial_rtd.java#L41
Can any of you tell me how to register the jzarr Java library with the Maven Central repository? I've never done this before. Do any of you have the time to guide or support me?
Best Regards Sabine
Hi @SabineEmbacher. I don't remember what HOWTO we followed originally for our jars (cc: @sbesson) but https://stackoverflow.com/questions/28846802/how-to-manually-publish-jar-to-maven-central looks reasonable enough. The biggest hurdles I remember are (1) proving that you own your groupId (*.bc.com) and (2) making sure that all of your dependencies are accessible from maven central. I've created https://github.com/bcdev/jzarr/issues/4 since this may become protracted, but certainly happy to help. ~Josh
Following up on https://github.com/zarr-developers/community/issues/15#issuecomment-610345738, the process used by OME for releasing some of its Java components to Sonatype is documented here with the relevant links to OSSRH in case it's useful. If possible, big :+1: for having jzarr available from Maven Central.
alimanfoo commented on 1 Aug 2018
JVM implementation of Zarr would be very cool, particularly if it had the same flexibility as the Python implementation to plug in different storage back-ends including cloud object stores.
Did you see the example of how to read and write to Amazon AWS S3 cloud storage using JZarr? See: https://jzarr.readthedocs.io/en/latest/amazonS3.html and code example https://github.com/bcdev/jzarr/blob/master/docs/examples/java/S3Array_nio.java
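For readers unfamiliar with the NIO route that comes up repeatedly in this thread, the sketch below shows the underlying idea only. It is not JZarr code and does not use any S3 provider: `RangeReads` and `readRange` are hypothetical names. The assumption is that code written against `java.nio.file.Path` and `SeekableByteChannel` keeps working when the `Path` comes from a non-default filesystem provider (for example one backed by an object store), provided that provider supports positioned reads.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.SeekableByteChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical helper: random-access reads through the NIO channel API. */
final class RangeReads {
    private RangeReads() {}

    /** Reads up to {@code length} bytes starting at {@code offset}; returns fewer bytes at end of file. */
    static byte[] readRange(Path file, long offset, int length) {
        try (SeekableByteChannel channel = Files.newByteChannel(file, StandardOpenOption.READ)) {
            channel.position(offset);
            ByteBuffer buffer = ByteBuffer.allocate(length);
            while (buffer.hasRemaining()) {
                if (channel.read(buffer) < 0) {
                    break; // end of file reached before the buffer was full
                }
            }
            buffer.flip();
            byte[] out = new byte[buffer.remaining()];
            buffer.get(out);
            return out;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Reading a single chunk out of a large file then becomes `RangeReads.readRange(path, chunkOffset, chunkLength)`, regardless of where `path` lives.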
Completely missed this thread but wanted to mention that https://github.com/saalfeldlab/n5-zarr has implemented https://zarr.readthedocs.io/en/stable/spec/v2.html as an N5 backend since September 2019. This way it is available for array processing with ImgLib2 https://github.com/saalfeldlab/n5-imglib2, which has no size limits and built-in memory caching, and is also the native data library for BigDataViewer and a bunch of processing tools that we use and build. n5-zarr includes blosc compression and locking and is included in the standard distribution of https://fiji.sc/. With the N5-API, talking to Zarr, N5, HDF5 is all the same.
There is currently no official cloud backend (other than through FS wrappers) for N5-Zarr because we haven't yet separated the interfaces for store and translation layers, i.e. writing a backend for HDF5 or Zarr is entangled with writing a backend for another store (like the AWS and GoogleCloud stores for N5). I remember that there was a fork that copied the n5-aws-s3 logic into n5-zarr as a temporary solution. @joshmoore, wasn't that you who did this?
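As a rough illustration of what separating the translation layer from the store could look like, the sketch below isolates one small piece of translation: mapping a chunk's grid position to a storage key. `KeyLayout` and `KeyLayouts` are hypothetical names, not part of the N5 API. It assumes the Zarr v2 default of joining chunk indices with `.` and the N5 convention of nesting one directory per index, and it deliberately ignores other differences between the formats, such as axis order and metadata.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

/** Hypothetical translation-layer piece: grid position to storage key, independent of any store. */
interface KeyLayout {
    String keyFor(long... gridPosition);
}

final class KeyLayouts {
    private KeyLayouts() {}

    /** Zarr v2 default layout: indices joined with '.', e.g. "2.0.5". */
    static final KeyLayout ZARR_V2 = position -> join(position, ".");

    /** N5-style layout: one nested directory per index, e.g. "2/0/5". */
    static final KeyLayout N5 = position -> join(position, "/");

    private static String join(long[] position, String separator) {
        return Arrays.stream(position)
                .mapToObj(Long::toString)
                .collect(Collectors.joining(separator));
    }
}
```

With the layout factored out like this, the same key could be handed to a filesystem, S3, or GCS store without the store knowing which format produced it.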
I remember that there was a fork that copied the n5-aws-s3 logic into n5-zarr as a temporary solution @joshmoore wasn't that you who did this?
Yup, see https://github.com/saalfeldlab/n5-aws-s3/issues/10 and https://github.com/saalfeldlab/n5-zarr/pull/5
Yup. It then got copied into the bdv/mobie code base for @tischi's I2K work. Having a way to unblock all of that would be great. (Note: I only copied-n-pasted the reader side of things. Writing still needs work as far as I know.)
As with the Rust focus during the Feb. 10th meeting, there may be a Java leaning to the upcoming call this Wednesday if anyone is interested in joining to chat.
cc: @SabineEmbacher @axtimwalde @DennisHeimbigner @WardF
Thanks @joshmoore! I'll be there. Looking forward to seeing you all.
There isn't one, is there?
I've started making one, will post updates here.