zarr-developers / community

An open community with an interest in developing and using new technologies for tensor data storage.

JVM Zarr implementation? #15

Open · ryan-williams opened this issue 6 years ago

ryan-williams commented 6 years ago

There isn't one, is there?

I've started making one and will post updates here.

martindurant commented 6 years ago

In #285 there was mention of n5, which has Java and Rust implementations, maybe more. n5 is apparently similar in concept to zarr.

jakirkham commented 6 years ago

N5 is basically this; the specs differ only in minor ways. Convergence would be good to have. There is some relevant discussion in issue https://github.com/zarr-developers/zarr/issues/231.

ref: https://github.com/saalfeldlab/n5
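As a rough, hedged illustration of one of those "minor ways": the two formats turn a chunk's grid position into a storage key differently. The Java sketch below assumes Zarr v2's default "." dimension separator with indices in C (row-major) order, and the N5 convention of "/"-separated block paths with the axis order reversed (N5 lists the fastest-varying dimension first). `ChunkKeys` and its methods are hypothetical names, not part of any library; metadata (`.zarray` vs `attributes.json`) and chunk encoding also differ and are not covered here.

```java
import java.util.StringJoiner;

/** Hypothetical helper contrasting Zarr v2 chunk keys with N5 block paths. */
final class ChunkKeys {
    private ChunkKeys() {}

    /** Grid position of the chunk holding a given element: per-axis integer division. */
    static long[] chunkIndexOf(long[] elementIndex, int[] chunkShape) {
        long[] out = new long[elementIndex.length];
        for (int d = 0; d < elementIndex.length; d++) {
            out[d] = elementIndex[d] / chunkShape[d];
        }
        return out;
    }

    /** Zarr v2 (default separator): indices in C axis order, joined with ".". */
    static String zarrV2Key(long[] chunkIndex) {
        StringJoiner key = new StringJoiner(".");
        for (long i : chunkIndex) {
            key.add(Long.toString(i));
        }
        return key.toString();
    }

    /** N5 (assumed convention): same indices, axis order reversed, joined with "/". */
    static String n5Key(long[] chunkIndex) {
        StringJoiner key = new StringJoiner("/");
        for (int d = chunkIndex.length - 1; d >= 0; d--) {
            key.add(Long.toString(chunkIndex[d]));
        }
        return key.toString();
    }

    public static void main(String[] args) {
        long[] idx = chunkIndexOf(new long[] {150, 70, 5}, new int[] {100, 64, 64});
        System.out.println(zarrV2Key(idx)); // 1.1.0
        System.out.println(n5Key(idx));     // 0/1/1
    }
}
```

For an element at (150, 70, 5) with 100×64×64 chunks, the chunk grid index is (1, 1, 0), giving the key `1.1.0` in Zarr v2 and the block path `0/1/1` in N5.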

alimanfoo commented 6 years ago

A JVM implementation of Zarr would be very cool, particularly if it had the same flexibility as the Python implementation to plug in different storage back-ends, including cloud object stores.
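A rough illustration of what "plug in different storage back-ends" could look like on the JVM: the array layer depends only on a tiny key-value contract, similar in spirit to how the Python implementation treats a store as a mapping from string keys to bytes. `Store`, `MemoryStore`, `DirectoryStore` and `StoreDemo` are hypothetical names for this sketch, not part of jzarr, N5 or any other library; a cloud backend (S3, GCS) would simply be another implementation of the same interface.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical minimal store contract: bytes under "/"-separated string keys. */
interface Store {
    Optional<byte[]> get(String key);
    void put(String key, byte[] value);
    boolean contains(String key);
}

/** In-memory backend, handy for tests; copies arrays so callers can't mutate stored data. */
final class MemoryStore implements Store {
    private final Map<String, byte[]> data = new ConcurrentHashMap<>();

    @Override public Optional<byte[]> get(String key) {
        return Optional.ofNullable(data.get(key)).map(byte[]::clone);
    }
    @Override public void put(String key, byte[] value) { data.put(key, value.clone()); }
    @Override public boolean contains(String key) { return data.containsKey(key); }
}

/** Local-filesystem backend: each key maps to a file below a root directory. */
final class DirectoryStore implements Store {
    private final Path root;

    DirectoryStore(Path root) { this.root = root; }

    @Override public Optional<byte[]> get(String key) {
        Path p = root.resolve(key);
        if (!Files.isRegularFile(p)) return Optional.empty();
        try {
            return Optional.of(Files.readAllBytes(p));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override public void put(String key, byte[] value) {
        Path p = root.resolve(key);
        try {
            Files.createDirectories(p.getParent()); // keys like "a/0.0" need parent dirs
            Files.write(p, value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override public boolean contains(String key) { return Files.isRegularFile(root.resolve(key)); }
}

final class StoreDemo {
    public static void main(String[] args) throws IOException {
        Store[] backends = { new MemoryStore(), new DirectoryStore(Files.createTempDirectory("store-demo")) };
        for (Store s : backends) {
            // The array layer only sees the Store interface, never the backend type.
            s.put("example/0.0", new byte[] {1, 2, 3});
            System.out.println(s.getClass().getSimpleName() + ": " + s.contains("example/0.0"));
        }
    }
}
```

Keeping the contract this small is the design point: anything that can get and put bytes by key can back an array.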

ryan-williams commented 6 years ago

Thanks for all the pointers! I've looked a bit at n5 and z5; a couple of questions:

martindurant commented 6 years ago

> I'm not aware of a way to read HDF5 from cloud stores in python

gcsfs's FUSE module does allow this, and there are other FUSE solutions out there too, but their performance is poor compared to zarr. In addition, https://github.com/ContinuumIO/intake-xarray will shortly allow streaming any xarray dataset, including HDF, from a server; again, there are other solutions that do something similar.

clbarnes commented 6 years ago

> n5, which has Java and Rust implementations, maybe more

z5 provides C++ and Python implementations of both zarr and N5.

> can z5 read/write directly to cloud stores

No, it's purely targeted at the file system format for both zarr and n5 as far as I know.

@ryan-williams there is already a bit of an ecosystem (albeit one tightly constrained to one institute...) rapidly evolving around the Java N5 implementation, including a high-performance 3D data viewer, some image registration tools, and a volumetric image annotation suite. The Java N5 already supports a number of backends, including the N5 filesystem format, HDF5, Google Cloud, and AWS (take a look here). It might make sense for a JVM implementation of the zarr file system format to take the form of an N5 backend (initially, at least); that would potentially give all of those other tools access to zarr datasets for free, as well as saving you from writing some of the higher-level boilerplate. That's if you're happy with the API, of course.

My feeling is that zarr has more momentum behind it and will have more impact in the future. Convergence would be great, but if the N5 tool ecosystem could get access to zarr file system arrays for free, that could also solve the problem.

jakirkham commented 6 years ago

I'd be really happy if Zarr and N5 converged on the same spec. It would make it much easier for people in this problem domain to collaborate more effectively on many other common challenges.

ryan-williams commented 6 years ago

checking in here after a long gap!

I'm far along with a Zarr implementation in Scala, which will address the "JVM implementation" request here.

Some notes:

Looking forward to sharing more info on this shortly!

alimanfoo commented 6 years ago

Very exciting!

(Sent by email in reply to Ryan Williams's comment of Fri, 28 Sep 2018, 05:21.)

lesserwhirls commented 6 years ago

Excellent! I would love to build off of this work on the netCDF-Java side to provide an IOSP (I/O Service Provider) for Zarr, i.e. to read Zarr into the Common Data Model. At that point, we could enable the THREDDS Data Server to serve data stored in Zarr :-)

Would you be open to that idea, and does the license permit such usage?

ryan-williams commented 6 years ago

@lesserwhirls yea, it will be Apache-2.0 licensed, happy to have it feed into netCDF things!

lesserwhirls commented 6 years ago

It might be helpful/less painful for everyone if the changes made to netCDF-Java were contributed upstream. @tomwhite - would you be willing to contribute those changes?

tomwhite commented 6 years ago

@lesserwhirls, yes I'd be happy to. I'll open an issue/PR to discuss.

aluhamaa commented 5 years ago

Hi @ryan-williams how is it going?

> I'm far along with a Zarr implementation in Scala, which will address the "JVM implementation" request here.
>
> Some notes:
>
> • it's in a branch that I am aggressively cleaning up atm; I'll send a link by Monday, but wanted to just mention now since other relevant discussions are ongoing.

ryan-williams commented 5 years ago

Hello! I've been side-tracked, but what I have is here: lasersonlab/ndarray.scala. It's still pretty "alpha", and the issues reasonably capture the things I'm focusing on next.

I'll be checking back in on this in the coming weeks, and will give some more updates here.

joshmoore commented 4 years ago

Just ran across https://github.com/bcdev/jzarr/blob/master/docs/tutorial.rst

cc: @SabineEmbacher

SabineEmbacher commented 4 years ago

see https://jzarr.readthedocs.io/en/latest/

hugs Sabine


Sabine Embacher, Brockmann Consult GmbH, Hamburg, Germany

(Sent by email in reply to Josh Moore's comment of 30.03.2020, 14:37.)

SabineEmbacher commented 4 years ago

If you need array objects that behave almost like NumPy arrays, you can also wrap the data in an ND4J INDArray from deeplearning4j.org. The data writing and reading examples show how:

https://jzarr.readthedocs.io/en/latest/tutorial.html#writing-and-reading-data

Or directly in the code example https://github.com/bcdev/jzarr/blob/master/docs/examples/java/Tutorial_rtd.java#L41

SabineEmbacher commented 4 years ago

Can any of you tell me how to publish the jzarr Java library to the Maven Central repository? I've never done this before. Does anyone have time to guide or support me?

Best Regards Sabine

joshmoore commented 4 years ago

Hi @SabineEmbacher. I don't remember what HOWTO we followed originally for our jars (cc: @sbesson) but https://stackoverflow.com/questions/28846802/how-to-manually-publish-jar-to-maven-central looks reasonable enough. The biggest hurdles I remember are (1) proving that you own your groupId (*.bc.com) and (2) making sure that all of your dependencies are accessible from maven central. I've created https://github.com/bcdev/jzarr/issues/4 since this may become protracted, but certainly happy to help. ~Josh

sbesson commented 4 years ago

Following-up on https://github.com/zarr-developers/community/issues/15#issuecomment-610345738, the process used by OME for releasing some of its Java components to Sonatype is documented here with the relevant links to OSSRH in case it's useful. If possible, big :+1: for having jzarr available from Maven Central.

SabineEmbacher commented 4 years ago

> alimanfoo commented on 1 Aug 2018
>
> A JVM implementation of Zarr would be very cool, particularly if it had the same flexibility as the Python implementation to plug in different storage back-ends, including cloud object stores.

Did you see the example of how to read and write to Amazon AWS S3 cloud storage using JZarr? See: https://jzarr.readthedocs.io/en/latest/amazonS3.html and code example https://github.com/bcdev/jzarr/blob/master/docs/examples/java/S3Array_nio.java

axtimwalde commented 3 years ago

Completely missed this thread, but wanted to mention that https://github.com/saalfeldlab/n5-zarr has implemented https://zarr.readthedocs.io/en/stable/spec/v2.html as an N5 backend since September 2019. This makes Zarr data available for array processing with ImgLib2 (https://github.com/saalfeldlab/n5-imglib2), which has no size limits and built-in memory caching, and is also the native data library for BigDataViewer and a bunch of processing tools that we use and build. n5-zarr includes blosc compression and locking and is included in the standard distribution of https://fiji.sc/. With the N5 API, talking to Zarr, N5, or HDF5 is all the same.

There is currently no official cloud backend (other than through FS wrappers) for N5-Zarr because we haven't yet separated the interfaces for the store and translation layers, i.e. writing a backend for HDF5 or Zarr is entangled with writing a backend for another store (like the AWS and Google Cloud stores for N5). I remember that there was a fork that copied the n5-aws-s3 logic into n5-zarr as a temporary solution. @joshmoore, wasn't that you?
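One way to picture the separation described here, as a hedged sketch: a "layout" (the translation layer) decides what a format calls its metadata and chunk keys, while a plain key-value store decides where the bytes live, so any layout can be paired with any store. The names `KeyLayout`, `ZarrV2Layout`, `N5Layout` and `LayeredReader` are hypothetical and are not the N5 or n5-zarr API. The layouts assume Zarr v2 defaults (`.zarray`, "."-joined C-order indices) and N5 conventions (`attributes.json`, "/"-joined indices with axes reversed), and a `Map` stands in for a filesystem or cloud store. Chunks come back still encoded; decompression would be a separate codec step.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Translation layer: how a format names metadata and chunks (hypothetical interface). */
interface KeyLayout {
    String metadataKey(String dataset);
    String chunkKey(String dataset, long[] chunkIndex);
}

/** Zarr v2 defaults: ".zarray" metadata, "."-joined chunk indices in C order. */
final class ZarrV2Layout implements KeyLayout {
    @Override public String metadataKey(String dataset) { return dataset + "/.zarray"; }

    @Override public String chunkKey(String dataset, long[] chunkIndex) {
        StringBuilder key = new StringBuilder(dataset).append('/');
        for (int d = 0; d < chunkIndex.length; d++) {
            if (d > 0) key.append('.');
            key.append(chunkIndex[d]);
        }
        return key.toString();
    }
}

/** N5 conventions: "attributes.json" metadata, "/"-joined indices with axes reversed. */
final class N5Layout implements KeyLayout {
    @Override public String metadataKey(String dataset) { return dataset + "/attributes.json"; }

    @Override public String chunkKey(String dataset, long[] chunkIndex) {
        StringBuilder key = new StringBuilder(dataset);
        for (int d = chunkIndex.length - 1; d >= 0; d--) {
            key.append('/').append(chunkIndex[d]);
        }
        return key.toString();
    }
}

/** Format-agnostic reader: pairs any layout with any key-value store. */
final class LayeredReader {
    private final KeyLayout layout;
    private final Map<String, byte[]> store; // stand-in for a filesystem, S3 or GCS backend

    LayeredReader(KeyLayout layout, Map<String, byte[]> store) {
        this.layout = layout;
        this.store = store;
    }

    /** A dataset exists if its format-specific metadata key is present. */
    boolean isDataset(String dataset) {
        return store.containsKey(layout.metadataKey(dataset));
    }

    /** Returns the chunk's stored (still encoded) bytes, if the chunk was written. */
    Optional<byte[]> readRawChunk(String dataset, long[] chunkIndex) {
        return Optional.ofNullable(store.get(layout.chunkKey(dataset, chunkIndex)));
    }

    public static void main(String[] args) {
        Map<String, byte[]> store = new HashMap<>();
        store.put("raw/.zarray", new byte[0]);   // placeholder metadata bytes
        store.put("raw/1.0.2", new byte[] {42}); // placeholder chunk bytes
        LayeredReader zarr = new LayeredReader(new ZarrV2Layout(), store);
        System.out.println(zarr.isDataset("raw"));                                     // true
        System.out.println(zarr.readRawChunk("raw", new long[] {1, 0, 2}).isPresent()); // true
        System.out.println(new N5Layout().chunkKey("raw", new long[] {1, 0, 2}));      // raw/2/0/1
    }
}
```

With this split, adding a cloud store means implementing storage once rather than once per format.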

bogovicj commented 3 years ago

> I remember that there was a fork that copied the n5-aws-s3 logic into n5-zarr as a temporary solution @joshmoore wasn't that you who did this?

Yup, see https://github.com/saalfeldlab/n5-aws-s3/issues/10 and https://github.com/saalfeldlab/n5-zarr/pull/5

joshmoore commented 3 years ago

Yup. It then got copied into the bdv/mobie code base for @tischi's I2K work. Having a way to unblock all of that would be great. (Note: I only copied-n-pasted the reader side of things. Writing still needs work as far as I know.)

joshmoore commented 3 years ago

As with the Rust focus during the Feb. 10th meeting, the upcoming call this Wednesday may lean toward Java, if anyone is interested in joining to chat.

cc: @SabineEmbacher @axtimwalde @DennisHeimbigner @WardF

axtimwalde commented 3 years ago

Thanks @joshmoore! I'll be there. Looking forward to seeing you all.