Maintainership roundtable and discussion

rust-ndarray / ndarray

ndarray: an N-dimensional array with array views, multidimensional slicing, and efficient operations

https://docs.rs/ndarray/

Apache License 2.0

3.57k stars, 304 forks

Maintainership roundtable and discussion #1272

Open DeliciousHair opened 1 year ago

DeliciousHair commented 1 year ago

I'm just looking at the activity level in terms of PRs being merged, wondering if this project is still a thing?

fre-hu commented 1 month ago

Ndarray’s layouts are fully strided. The rank can be either static or generic. Wouldn’t it be possible to add such a general (but less efficient) layout to mdarray, while maintaining the other more static layouts?

Then we would have a library that could accept any array from Numpy say, but algorithms could be still implemented in Rust for specific layouts. Fallible conversions would be proposed between the different layouts.

Not sure how cumbersome the resulting library would have to be. Hopefully one could profit from the strengths of Rust’s packaging by limiting the content of the basic library to infrastructure, and keep actual algorithms in separate exchangeable crates.

I think the simplest way is to add dynamic rank as a new shape type and keep the existing layout types. The shape types for static rank are tuples of dimensions (each static or dynamic), and the new type will instead consist of a Box/Vec. The resulting layout mapping will then use Box/Vec for both shape and strides.

There can be limitations and for some operations like creating array views and permuting dimensions the rank must be static. But yes you can always convert to static rank for calculations.
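To make the shape-type idea above concrete, here is a minimal sketch in plain Rust. It is not mdarray's actual API: the names `Dim`, `Const`, `Dyn`, `Shape`, `DynRank` and `to_rank2` are hypothetical and chosen only for illustration. Static-rank shapes are tuples whose dimensions are each static or dynamic, the dynamic-rank shape wraps a boxed slice, and a fallible conversion recovers a static rank.

```rust
/// One dimension whose size is either a compile-time constant or a run-time value.
pub trait Dim: Copy {
    fn size(self) -> usize;
}

/// Hypothetical compile-time dimension of size `N`.
#[derive(Clone, Copy)]
pub struct Const<const N: usize>;

/// Hypothetical run-time dimension.
#[derive(Clone, Copy)]
pub struct Dyn(pub usize);

impl<const N: usize> Dim for Const<N> {
    fn size(self) -> usize {
        N
    }
}

impl Dim for Dyn {
    fn size(self) -> usize {
        self.0
    }
}

/// A shape reports its rank and its per-axis sizes.
pub trait Shape {
    /// `Some(rank)` when the rank is part of the type, `None` when it is dynamic.
    const RANK: Option<usize>;
    fn rank(&self) -> usize;
    fn dims(&self) -> Vec<usize>;
}

/// Static rank 2: a tuple of dimensions, each static or dynamic.
impl<A: Dim, B: Dim> Shape for (A, B) {
    const RANK: Option<usize> = Some(2);
    fn rank(&self) -> usize {
        2
    }
    fn dims(&self) -> Vec<usize> {
        vec![self.0.size(), self.1.size()]
    }
}

/// Dynamic rank: the sizes live in a boxed slice, so the rank is only known at run time.
pub struct DynRank(pub Box<[usize]>);

impl Shape for DynRank {
    const RANK: Option<usize> = None;
    fn rank(&self) -> usize {
        self.0.len()
    }
    fn dims(&self) -> Vec<usize> {
        self.0.to_vec()
    }
}

/// Fallible conversion to static rank 2; returns `None` for any other rank.
pub fn to_rank2(shape: &DynRank) -> Option<(Dyn, Dyn)> {
    match &*shape.0 {
        [a, b] => Some((Dyn(*a), Dyn(*b))),
        _ => None,
    }
}
```

With these definitions, `(Const::<3>, Dyn(4)).dims()` gives `[3, 4]`, while `to_rank2` succeeds only for a `DynRank` holding exactly two sizes.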

Is your choice motivated by BLAS/LAPACK being (marginally) more efficient for column-major data?

Do I understand correctly that mdarray is column major in the sense that the restricted layouts are column major? But the fully strided layout can accept any (fixed rank) strided array, right? Right now in Rust we cannot have a fully generic ndspan like in C++, but it should be possible to have a set of useful layouts for both column-major and row-major within a single library, or do you see a problem with this?

The choice is only to have a convention, and then column major is common for linear algebra. It is used both for memory layout and to give the order of dimensions in iteration.

Using strided layout with row major data will work, but operations that depend on iteration order will have a worse access pattern. It works fine for interfacing though, and internally one could make a copy or reverse the indices.

To have full support for both row and column major would require one more generic parameter for the order. I had it in an earlier version, but removed it as it made both the library and the interface more complex. C++ mdspan gets around this since it is quite thin.
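The remark about reversing indices can be illustrated with a small stride calculation. This is a generic sketch, not code from mdarray; the function names are made up for the example. A row-major array of shape `[2, 3]` occupies memory exactly like a column-major array of shape `[3, 2]` addressed with the indices swapped, so no copy is needed to reinterpret it.

```rust
/// Column-major (Fortran-order) strides: the first index varies fastest.
pub fn col_major_strides(shape: &[usize]) -> Vec<usize> {
    let mut strides = Vec::with_capacity(shape.len());
    let mut step = 1;
    for &n in shape {
        strides.push(step);
        step *= n;
    }
    strides
}

/// Row-major (C-order) strides: the last index varies fastest.
/// Computed as the column-major strides of the reversed shape, reversed again.
pub fn row_major_strides(shape: &[usize]) -> Vec<usize> {
    let reversed: Vec<usize> = shape.iter().rev().copied().collect();
    let mut strides = col_major_strides(&reversed);
    strides.reverse();
    strides
}

/// Linear offset of a multi-index for the given strides.
pub fn offset(index: &[usize], strides: &[usize]) -> usize {
    index.iter().zip(strides).map(|(i, s)| i * s).sum()
}
```

For shape `[2, 3]`, `row_major_strides` returns `[3, 1]` and `col_major_strides(&[3, 2])` returns `[1, 3]`, so element `(i, j)` of the row-major array and element `(j, i)` of the reinterpreted column-major array share the same offset.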

bluss commented 1 month ago

ndarray-linalg maintainership discussed in issue rust-ndarray/ndarray-linalg#381

strasdat commented 1 month ago

@akern40

I'd also consider "competitive advantage": what can ndarray do that others can't, and what are we leaving to others? For example, nalgebra seems to have statically-shaped and stack-allocated matrices down pretty pat. Seems like we shouldn't focus on that use case?

100%

The second reason is that it opens the door to potentially-powerful optimizations, like things that you can do if you absolutely know that your array is just a 2D matrix, ...

Possibly folks consider that not ergonomic or at least usual, but you basically can do that already by just building dynamic (ndarray) tensors of static tensors. I prototyped that a bit here:

https://github.com/sophus-vision/sophus-rs/blob/2cb11381710c2cdc5cadf34b5212eef9cd554586/crates/sophus_core/src/tensor/arc_tensor.rs#L326

You can even have the scalar type be a std::simd::Simd type, or an nalgebra matrix of std::simd::Simd's.
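As a rough illustration of "dynamic tensors of static tensors", independent of the linked prototype and of ndarray itself, the sketch below stores fixed-size `[f32; N]` elements under a run-time shape. The type `BatchTensor` and its methods are hypothetical; the point is only that the inner loop runs over a length known at compile time, while the outer shape stays dynamic.

```rust
/// Hypothetical tensor with a run-time shape whose elements are fixed-size vectors.
pub struct BatchTensor<const N: usize> {
    shape: Vec<usize>,
    data: Vec<[f32; N]>,
}

impl<const N: usize> BatchTensor<N> {
    /// Builds a tensor by calling `f` with each linear element index.
    pub fn from_fn(shape: &[usize], f: impl FnMut(usize) -> [f32; N]) -> Self {
        let len: usize = shape.iter().product();
        Self {
            shape: shape.to_vec(),
            data: (0..len).map(f).collect(),
        }
    }

    /// The run-time (outer) shape.
    pub fn shape(&self) -> &[usize] {
        &self.shape
    }

    /// Squared Euclidean norm of every element, in linear order.
    /// The inner iteration has the compile-time length `N`.
    pub fn squared_norms(&self) -> Vec<f32> {
        self.data
            .iter()
            .map(|v| v.iter().map(|x| x * x).sum::<f32>())
            .collect()
    }
}
```

For example, `BatchTensor::<3>::from_fn(&[2, 2], |i| [i as f32, 0.0, 1.0])` creates a 2x2 tensor of 3-vectors, and `squared_norms` reduces each 3-vector separately.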

akern40 commented 1 month ago

Ok, money where my mouth is! There is now[^1] code in the new-impl branch of my ndarray fork which starts the implementation of the new core design that I mentioned 3 weeks ago. That code:

- In the src/core folder, defines the very bare bones structure of types, deref implementations, and traits that would constitute a common format for just about any multidimensional array in Rust. I think we'd eventually want this to be its own crate, once it's significantly more mature. This design is very slim and, as a result, should hopefully be able to fit all of the above requests.
- In the core.rs file, uses those types to redefine ndarray's core types. Significant work will be needed to endow those types with the behavior that ndarray currently has, but hopefully the outline of the sketch is visible.

Feedback is, as always, greatly appreciated!

[^1]: highly experimental, literally hot off the presses, take it with a grain of salt, etc

grothesque commented 1 month ago

Ok, money where my mouth is! There is now code in the new-impl branch of my ndarray fork which starts the implementation of the new core design that I mentioned 3 weeks ago.

Great, thanks! I cloned and also started looking at it. I see that it compiles and the tests run - but as far as I can see the tests do not touch the new code. Is it already possible to run something (even rudimentary), other than instantiating the new structures?

I finally managed to do a first read of your design document. (I wanted to first experiment with and understand the inner workings of @fre-hu's mdarray crate, which I believe I now finally do. Do not hesitate to look at the issues I opened there, numbers 1 to 4.) It seems to me that the design you propose is in many ways similar to mdarray, which I think is a good thing, notably the bits relating to array references (called Span there) and array views (called Expr there).

In the design document you write about ndarray and constant dimensions:

That last case can already be handled by the dimensionality generic

Can you point me to the relevant part of the code, because from what I have seen so far ndarray's arrays always have dynamic shapes, i.e. individual elements of the shape are not part of the type.

@akern40:

I'd also consider "competitive advantage": what can ndarray do that others can't, and what are we leaving to others? For example, nalgebra seems to have statically-shaped and stack-allocated matrices down pretty pat. Seems like we shouldn't focus on that use case?

My favorite aspect of mdarray's design is how it allows to mix dynamic and compile-time shapes. See for example this comment. I think that this design allows to combine the strengths of ndarray and nalgebra in a single crate, and I do not see a reason why this approach could not be adopted by a redesign of ndarray. Any thoughts on this?

edgarriba commented 3 weeks ago

Feedback is, as always, greatly appreciated!

@akern40 in the proposed design, it would be great to include the ability to adopt straight away different backends, such as recent popular crates like Apache arrow-rs, which the community is adopting quite fast and which has a slim API for custom allocators. That can open the door to holding not only CPU storage, but also CUDA, wgpu, etc. For this reason, I recently dropped ndarray from main core of a crate I maintain for computer vision and deep learning moving towards my own much simpler Tensor struct based on arrow::Buffer. See: https://github.com/kornia/kornia-rs/blob/main/crates/kornia-core/src/tensor.rs One extra reason for me was to have a lightweighted crate without all the bells and whistles of ops, axis iterators, etc in order to keep it very minimal.

akern40 commented 3 weeks ago

Is it already possible to run something (even rudimentary), other than instantiating the new structures?

Not yet, this is very preliminary, just lays out what the fundamental data structures would look like. My next step is working on an implementation path forward that is as backwards-compatible as possible.

It seems to me that the design you propose is in many ways similar to mdarray, which I think is a good thing, notably the bits relating to array references (called Span there) and array views (called Expr there).

I think that's good as well! I didn't look too closely at mdarray's design, but I believe we both took heavy influence from C++ mdspan, so that makes sense.

In the design document you write about ndarray and constant dimensions:

That last case can already be handled by the dimensionality generic

Can you point me to the relevant part of the code, because from what I have seen so far ndarray's arrays always have dynamic shapes, i.e. individual elements of the shape are not part of the type.

Ah sorry that line is supposed to indicate that you don't need another / different generic to handle constant dimensions, you could build that into the Dimension or broader Layout generic; not to say that ndarray already handles this.

My favorite aspect of mdarray's design is how it allows to mix dynamic and compile-time shapes. See for example this comment. I think that this design allows to combine the strengths of ndarray and nalgebra in a single crate, and I do not see a reason why this approach could not be adopted by a redesign of ndarray. Any thoughts on this?

Absolutely agreed here - nalgebra actually has this capability as well. Coming from my very first introduction to the Rust numeric computing space, it's kind of essential for performance, because you can specialize for state matrices that have a dynamic number of entries, each of a fixed size (e.g., 3/6/9/12 for a 3-dimensional state).

akern40 commented 3 weeks ago

@akern40 in the proposed design, it would be great to include the ability to adopt straight away different backends

Backend flexibility is a major goal of the current design! I'm curious - when you say "adopt straight away", what do you mean by that? As in, you'd like to see an Arrow-based backend included as a first-class supported ndarray type?

I recently dropped ndarray from main core of a crate I maintain for computer vision and deep learning moving towards ... a lightweighted crate without all the bells and whistles of ops, axis iterators, etc in order to keep it very minimal.

Is the idea here to just have a type that you can use for storage? Say there existed an ndarray-core which you could depend on, and which gave you a generic type that you could use for storage. Would the advantage be easy conversion to arrays, for when people need them? In other words, what is ndarray without the bells and whistles? Or am I totally off base here?
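One way to picture a storage-only core with pluggable backends is sketched below. It is an assumption-laden illustration and not the design in the new-impl branch: the `Storage` trait, its `elements` method and the `Tensor` type are hypothetical, and `Vec<T>` and `Arc<[T]>` merely stand in for backends such as an Arrow buffer.

```rust
use std::sync::Arc;

/// Hypothetical backend trait: anything that can expose its elements as a contiguous slice.
pub trait Storage {
    type Elem;
    fn elements(&self) -> &[Self::Elem];
}

/// Owned heap storage.
impl<T> Storage for Vec<T> {
    type Elem = T;
    fn elements(&self) -> &[T] {
        self
    }
}

/// Shared, reference-counted storage.
impl<T> Storage for Arc<[T]> {
    type Elem = T;
    fn elements(&self) -> &[T] {
        self
    }
}

/// Minimal tensor: a shape plus a backend, with no arithmetic or iterators attached.
pub struct Tensor<S: Storage> {
    shape: Vec<usize>,
    storage: S,
}

impl<S: Storage> Tensor<S> {
    /// Returns `None` when the element count does not match the shape.
    pub fn new(shape: Vec<usize>, storage: S) -> Option<Self> {
        let len: usize = shape.iter().product();
        (len == storage.elements().len()).then(|| Self { shape, storage })
    }

    /// Row-major lookup; returns `None` for a wrong rank or an out-of-bounds index.
    pub fn get(&self, index: &[usize]) -> Option<&S::Elem> {
        if index.len() != self.shape.len() {
            return None;
        }
        let mut offset = 0;
        for (&i, &n) in index.iter().zip(&self.shape) {
            if i >= n {
                return None;
            }
            offset = offset * n + i;
        }
        self.storage.elements().get(offset)
    }
}
```

Both `Tensor::new(vec![2, 3], vec![0, 1, 2, 3, 4, 5])` and a tensor built over an `Arc<[i32]>` go through the same `get` code path. Operations, axis iterators and the like could then live in separate crates layered on top of such a type.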