zarr-developers / community

An open community with an interest in developing and using new technologies for tensor data storage.

MATLAB implementation of Zarr #16

Open jakirkham opened 6 years ago

jakirkham commented 6 years ago

Would be very useful to have a MATLAB implementation of Zarr. Opening this issue to connect with others interested in this problem. Also to determine the best approach for implementing this (e.g. pure MATLAB, C/C++ with MEX bindings, etc.).

martindurant commented 6 years ago

Would a low-level implementation (rust??) be a good step, to open zarr to multiple platforms?

jakirkham commented 6 years ago

Was thinking C/C++ as the low level implementation since it is pretty easy to build on many platforms (C in particular). Though we already have a C++ implementation in Z5. So that might be the easiest way to start.

That said, not everyone knows low level languages like this. So there are some downsides in merely binding to a low level implementation as not as many people can collaborate on that part.

aparamon commented 6 years ago

A cross-platform C implementation in the form of a (dynamic-link) library opens the door not only to C and C++ clients, but also grants access from virtually any native language. The C header file language of types, structs and function declarations is one of the most universal programming interfaces in use today (along with text streams and probably REST), and is definitely a must for an efficient data format library.

If it's implemented, I promise to provide Delphi language bindings ;-) Please note that it's recommended to avoid C-specific features such as #defines and varargs, as those are not always possible to translate.
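As a rough illustration of how a plain C interface can be reached from another language, here is a minimal sketch in Python using `ctypes`. No Zarr C library is assumed to exist; the C standard library's `strlen` stands in for a hypothetical exported function, and the lookup assumes a POSIX system (Linux or macOS).

```python
import ctypes
import ctypes.util

# Load the C standard library. find_library may return None on some systems;
# on POSIX, CDLL(None) then falls back to the symbols of the running process.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Declare the C signature: size_t strlen(const char *s);
# A binding for a real library would repeat this for each exported function.
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t

length = libc.strlen(b"zarr")
print(length)
```

The binding only needs the function name, argument types and return type, which is why macros and varargs are awkward: they have no equivalent declaration on the caller's side.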

jakirkham commented 6 years ago

Just so we are on the same page, I'm +1 on a C/C++ implementation. In fact there is already a C++ implementation, which we could pretty easily wrap in a C API layer. So this should be easily achieved. Would follow up with the z5 developers on getting C bindings to the C++ library if that is interesting.

The question is whether the MATLAB implementation should be using the C/C++ implementation with MEX bindings or whether it should be implemented natively in MATLAB. There are pros and cons to both approaches.
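For a sense of what a native (non-bound) reader has to handle: in the Zarr v2 format an array is a directory holding a `.zarray` JSON metadata file plus one file per chunk. The sketch below is in Python rather than MATLAB and uses only the standard library. It writes a tiny uncompressed array by hand and reads one chunk back. The `read_chunk` helper and the `example.zarr` name are illustrative, and compression, filters, other dtypes and missing chunks (fill values) are deliberately not handled.

```python
import json
import struct
import tempfile
from pathlib import Path

# Write a tiny uncompressed Zarr v2 array by hand: 2x3 int32, a single chunk.
root = Path(tempfile.mkdtemp()) / "example.zarr"
root.mkdir()
metadata = {
    "zarr_format": 2,
    "shape": [2, 3],
    "chunks": [2, 3],
    "dtype": "<i4",        # little-endian 32-bit signed integer
    "compressor": None,    # no compression in this sketch
    "fill_value": 0,
    "order": "C",          # row-major layout inside each chunk
    "filters": None,
}
(root / ".zarray").write_text(json.dumps(metadata))
(root / "0.0").write_bytes(struct.pack("<6i", 1, 2, 3, 4, 5, 6))


def read_chunk(array_dir, chunk_index):
    """Read one 2-D, uncompressed, C-ordered '<i4' chunk as nested lists."""
    meta = json.loads((array_dir / ".zarray").read_text())
    if meta["compressor"] is not None or meta["filters"]:
        raise NotImplementedError("compressed or filtered chunks")
    if meta["dtype"] != "<i4" or meta["order"] != "C":
        raise NotImplementedError("only C-ordered '<i4' data is handled")
    rows, cols = meta["chunks"]
    # Chunk files are named by their grid position, joined with '.'.
    key = ".".join(str(i) for i in chunk_index)
    flat = struct.unpack(f"<{rows * cols}i", (array_dir / key).read_bytes())
    return [list(flat[r * cols:(r + 1) * cols]) for r in range(rows)]


chunk = read_chunk(root, (0, 0))
print(chunk)
```

The metadata and file layout are simple to parse in any language; most of the real work in a native implementation would be supporting the compressors and the full range of dtypes.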

aparamon commented 6 years ago

There's no harm in eventually having both. Let a hundred flowers bloom! The most suited will survive.

constantinpape commented 6 years ago

Regarding a C wrapper for z5: This should not be too difficult.

For some context, the major part of the API is implemented in the Dataset/ DatasetTyped class: https://github.com/constantinpape/z5/blob/master/include/z5/dataset.hxx#L101

Wrapping this should be straightforward (it just requires quite a bit of code...). The chunk I/O of this function has a pointer API. In addition, there is a pure function API to provide access to multi-arrays, e.g. implemented for xtensor (which is used for the Python bindings) here: https://github.com/constantinpape/z5/blob/master/include/z5/multiarray/xtensor_access.hxx

For the C wrapper it would probably be a good idea to re-implement this for C arrays. If there is more interest in this, let me know and we can open a separate issue in z5.

jakirkham commented 6 years ago

Let's go ahead and open that issue. Would give people a place to discuss this other than the MATLAB support issue. 😉

constantinpape commented 6 years ago

Here is the z5 issue. https://github.com/constantinpape/z5/issues/68

Any input or contribution is welcome.

clbarnes commented 6 years ago

@martindurant on the theme of "zarr and N5 are basically the same thing", there is a Rust implementation of a subset of N5's features written by @aschampion here: https://github.com/aschampion/rust-n5. It could be useful as a reference if a Rust zarr implementation became a goal.

tbenst commented 4 years ago

This is one of the top results on google--is there a way to read Zarr in Matlab yet? Would love to switch from HDF5 to Zarr, but need interop with other languages. Thanks for your work on this!

DrKenHo-crick commented 3 years ago

Hi @constantinpape and @jkh1, great to meet up virtually last week at the ome-ngff meeting. I mentioned that I am interested in looking into a MATLAB ome-zarr implementation, as I have a project using ScanImage, so using MATLAB to write directly to ome-zarr seems ideal. From the discussion I understand that there are a few routes to achieve that: https://github.com/zarr-developers/zarr_implementations (C: netcdf, C++: xtensor-zarr, z5). I shall take a look into those. If I have missed anything, please let me know. Thanks

constantinpape commented 3 years ago

> https://github.com/zarr-developers/zarr_implementations (C: netcdf, C++: xtensor-zarr, z5).

Yes, I think these are all the available C/C++ implementations.

normanrz commented 2 years ago

I think the most Matlab-y implementation would be an adapter for the new blockedImage abstraction. The adapter could be built with one of the underlying C/C++ implementations.

DrKenHo-crick commented 2 years ago

Presumably the blockedImage will be treated as a chunk in Zarr.

normanrz commented 2 years ago

> Presumably the blockedImage will be treated as a chunk in Zarr.

Not sure if I understand correctly. The blockedImage will use the chunks internally. The neat thing about the blockedImage is that it can abstract away the notion of chunks. So, from a user's point of view, any range of the data can be read or written, and the abstraction will take care of mapping the reads/writes to chunks.
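The mapping from an arbitrary range to chunks can be sketched as follows. This is Python, not MATLAB, and it is not the blockedImage API; `chunks_for_range` and `read_range` are hypothetical helper names, and the chunks are plain in-memory lists standing in for chunk files. It covers one axis only; an N-dimensional adapter would presumably apply the same arithmetic per axis and combine the results.

```python
def chunks_for_range(start, stop, chunk_size):
    """Split the half-open range [start, stop) along one axis into per-chunk pieces.

    Returns a list of (chunk_index, slice_inside_chunk, slice_inside_output).
    """
    if chunk_size <= 0 or not 0 <= start <= stop:
        raise ValueError("need chunk_size > 0 and 0 <= start <= stop")
    pieces = []
    pos = start
    while pos < stop:
        index = pos // chunk_size               # which chunk holds position pos
        chunk_start = index * chunk_size        # first position covered by that chunk
        end = min(stop, chunk_start + chunk_size)
        pieces.append((
            index,
            slice(pos - chunk_start, end - chunk_start),  # where to read in the chunk
            slice(pos - start, end - start),              # where to write in the result
        ))
        pos = end
    return pieces


def read_range(chunks, start, stop, chunk_size):
    """Assemble [start, stop) from a dict mapping chunk index to chunk contents."""
    out = [None] * (stop - start)
    for index, inside_chunk, inside_out in chunks_for_range(start, stop, chunk_size):
        out[inside_out] = chunks[index][inside_chunk]
    return out


# Three chunks of ten elements each, holding the values 0..29.
stored = {i: list(range(i * 10, (i + 1) * 10)) for i in range(3)}
result = read_range(stored, 5, 23, 10)
print(result)
```

The caller asks only for positions 5 to 22; the fact that this touches three chunks, two of them partially, stays hidden inside the helper.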

MSanKeys963 commented 1 year ago

Hi everyone. Dropping this here if anyone would like to participate.

From: Joelle Perez Howlett <usabilityrecruiting@mathworks.com>
Sent: Thursday, January 19, 2023 7:09 PM
Subject: Participate in MathWorks Feedback Session - Zarr File Format

We are looking for people who want to use MATLAB to read and/or write scientific data and attributes from/to Zarr files to participate in a usability research session. Are you, or do you know anyone who might be, interested in participating?

The session will be on a MathWorks computer remotely accessed using Microsoft Teams and will take approximately two hours. The time is flexible, and we can schedule it for a time that works best for you. You will be offered our standard $100 stipend for participation.

If you’re interested in participating or have questions, please respond to this e-mail with answers to the following brief questions in-line with the dashes. Please note: you do not need to answer yes to all the questions to qualify.

  1. Do you use the Zarr file format?
     A. Yes, I read data from Zarr files
     B. Yes, I write data into Zarr files
     C. Yes, I read and write Zarr files
     D. No

  2. What is your MATLAB programming skill level?
     A. Beginner
     B. Intermediate
     C. Advanced
     D. Expert

  3. Would you like to read and/or write Zarr files using MATLAB?
     A. Yes
     B. No

  4. Do you have experience working with any of the following scientific data formats?
     A. NetCDF
     B. HDF5
     C. HDF4/HDF-EOS
     D. CDF
     E. FITS
     F. Other, please specify

What is usability research? Usability research enables MathWorks developers to talk to users of MathWorks products to gather feedback on what works well and what could work better. During the session we will ask you to work through typical tasks and give us your feedback.

Thank you,

Joelle Perez Howlett
Associate UX Researcher
MathWorks
1 Apple Hill Drive
Natick, MA 01760
508-433-5555
usabilityrecruiting@mathworks.com

You can sign up for this session by sending an email to usabilityrecruiting@mathworks.com. Let me know if you need help with this. Thanks!