openjournals / joss-reviews

Reviews for the Journal of Open Source Software

[REVIEW]: faer: A linear algebra library for the Rust programming language #6099

Open editorialbot opened 7 months ago

editorialbot commented 7 months ago

Submitting author: @sarah-ek (Sarah El Kazdadi)
Repository: https://github.com/sarah-ek/faer-rs
Branch with paper.md (empty if default branch):
Version: v0.15.0
Editor: @jedbrown
Reviewers: @fclc, @mhoemmen
Archive: Pending

Status


Status badge code:

HTML: <a href="https://joss.theoj.org/papers/c88fc8e2226265a25705d80c8b42313f"><img src="https://joss.theoj.org/papers/c88fc8e2226265a25705d80c8b42313f/status.svg"></a>
Markdown: [![status](https://joss.theoj.org/papers/c88fc8e2226265a25705d80c8b42313f/status.svg)](https://joss.theoj.org/papers/c88fc8e2226265a25705d80c8b42313f)

Reviewers and authors:

Please avoid lengthy details of difficulties in the review thread. Instead, please create a new issue in the target repository and link to those issues (especially acceptance-blockers) by leaving comments in the review thread below. (For completists: if the target issue tracker is also on GitHub, linking the review thread in the issue or vice versa will create corresponding breadcrumb trails in the link target.)

Reviewer instructions & questions

@fclc & @mhoemmen, your review will be checklist based. Each of you will have a separate checklist that you should update when carrying out your review. First of all you need to run this command in a separate comment to create the checklist:

@editorialbot generate my checklist

The reviewer guidelines are available here: https://joss.readthedocs.io/en/latest/reviewer_guidelines.html. Any questions/concerns please let @jedbrown know.

Please start on your review when you are able, and be sure to complete your review in the next six weeks, at the very latest.

Checklists

📝 Checklist for @mhoemmen

📝 Checklist for @FCLC

editorialbot commented 7 months ago

Hello humans, I'm @editorialbot, a robot that can help you with some common editorial tasks.

For a list of things I can do to help you, just type:

@editorialbot commands

For example, to regenerate the paper pdf after making changes in the paper's md or bib files, type:

@editorialbot generate pdf
editorialbot commented 7 months ago
Software report:

github.com/AlDanial/cloc v 1.88  T=0.25 s (475.2 files/s, 457039.5 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
Rust                            97           9275           5579          99433
TOML                            12            100              1            572
Markdown                         5            118              0            506
C++                              1             34              8            388
YAML                             3             21              1            115
TeX                              1              0              0            114
Python                           1             21              0             75
HTML                             1              0              0             15
-------------------------------------------------------------------------------
SUM:                           121           9569           5589         101218
-------------------------------------------------------------------------------

gitinspector failed to run statistical information for the repository
editorialbot commented 7 months ago

Wordcount for paper.md is 746

editorialbot commented 7 months ago
Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

OK DOIs

- 10.1145/2503210.2503219 is OK
- 10.1145/2382585.2382587 is OK
- 10.1145/1391989.1391995 is OK
- 10.1016/b978-0-12-803761-4.00011-3 is OK

MISSING DOIs

- 10.31003/uspnf_r2491_01_01 may be a valid DOI for title: Rayon

INVALID DOIs

- None
editorialbot commented 7 months ago

Download article proof · View article proof on GitHub

jedbrown commented 7 months ago

Hi, @FCLC @mhoemmen 👋 Welcome to JOSS and thanks for agreeing to review! The comments from @editorialbot above outline the review process, which takes place in this thread (possibly with issues filed in the faer repository). I'll be watching this thread if you have any questions.

The JOSS review is different from most other journals. Our goal is to work with the authors to help them meet our criteria instead of merely passing judgment on the submission. As such, the reviewers are encouraged to submit issues and pull requests on the software repository. When doing so, please mention this issue so that a link is created to this thread (and I can keep an eye on what is happening). Please also feel free to comment and ask questions on this thread. In my experience, it is better to post comments/questions/suggestions as you come across them instead of waiting until you've reviewed the entire package.

We aim for reviews to be completed within a month or two. Please let me know if you require some more time. We can also use editorialbot to set automatic reminders if you know you'll be away for a known period of time.

Please feel free to ping me (@jedbrown) if you have any questions/concerns.

mhoemmen commented 6 months ago

Review checklist for @mhoemmen

Conflict of interest

Code of Conduct

General checks

Functionality

Documentation

Software paper

mhoemmen commented 6 months ago

Notes on the above Checklist

Software paper

Summary

The summary is concise and explains the library well to me. For example, "dense linear algebra library" is explained as "matrix decompositions and solving linear systems." ("Linear algebra library" means different things to different people, so having the explanation is helpful.)

It would be interesting to know what inspired the name "faer."

A statement of need

The statement of need is compelling. The following quote conveys the motivation well.

Aside from faer, the Rust ecosystem lacks high performance matrix factorization libraries that aren’t C library wrappers, which presents a distribution challenge and can impede generic programming.

The "impede generic programming" phrase is enough justification for me. I've encountered C++ libraries that exist only for this reason, to make it possible to factor sparse matrices that aren't supported by popular C libraries.

State of the field

Do the authors describe how this software compares to other commonly-used packages?

The author briefly mentions and cites popular C, Fortran, and C++ (Eigen) software packages. However, it could be helpful to learn more about the library's interface design and how that compares to other packages. This is another way in which "linear algebra library" can mean different things to different people. For example, is it a "Matlab in Rust" (like the "Matlab in C++" packages Eigen and MatX), with overloaded operators and short functions, that doesn't worry so much about the cost of allocating temporary matrices or vectors? Is it a collection of stateless algorithms, like P1673 a.k.a. std::linalg (voted into the current C++26 draft)?

FCLC commented 6 months ago

Review checklist for @FCLC

Conflict of interest

Code of Conduct

General checks

Functionality

Documentation

Software paper

FCLC commented 5 months ago

Apologies for the delay in reviewing; I only received permission from my employer to review as of this morning! I'll be looking to provide initial comments in the coming days and weeks.

@jedbrown did you have a publication date target?

jedbrown commented 5 months ago

Thanks @FCLC. We try to complete reviews within two months if no major changes are needed by the authors. It sounds like that'll be workable for you?

jedbrown commented 5 months ago

@mhoemmen :wave: How is your review going? I see you had a question about allocation tucked in at the end of your comment. Faer supports no_std, which is an environment without allocation or a standard library. In that sense, it is low-level. I think you'll find the ergonomics of the high-level interfaces are intermediate, albeit safe.

mhoemmen commented 5 months ago

@jedbrown Work has been a lot lately, so I've been running late on this one. Thanks for the clarifying comment! I'll finish this review as soon as I can.

jedbrown commented 3 months ago

@FCLC @mhoemmen :wave: Hi, could you let us know your timeline for finishing your review? Are there any hold-ups that Sarah or I could help with?

mhoemmen commented 3 months ago

@jedbrown just purely what I've discussed offline.... I will get to this as soon as I can.

FCLC commented 3 months ago

Reviewing the paper itself, I have a few comments, organized by line number (L{number})

L11: Worth clarifying that faer can operate on non-symmetric matrices, as well as which other categories it can and cannot natively handle? (Triangular? Hermitian?)

L14: Which standard do these floats adhere to? (Presumably that's IEEE 754, in either the 1985 or the 2008 flavour?) Complex types are mentioned; are they equivalent to the complex types defined in the reference BLAS? (They are discussed later at L28-31.)

L15: The idea of universal type support is interesting; what can this be extended to? Integer maths? Fixed point? Posits? Fundamentally, could a user map any type that Rust/LLVM can use and have it be decently performant? Why or why not?

L18: SIMD is decently well documented online but is not very well known outside of low-level optimization circles; it may be helpful for higher-level devs to spell out the acronym, if not provide a short description?

L22-26: I'd agree that a Rust-native matrix library is useful, but can this Rust library be built and used from C-native projects? If so, does it share a common API/ABI (LAPACK, BLAS, etc.) to be usable from those other languages?

L24-26: Does the lack of a Rust ABI present portability challenges to faer? Is there an expectation that any users of faer will always have to rebuild faer from source between versions? If so, given that linear algebra libraries tend to be heavily leveraged in HPC contexts/environments, does the use of Rust mandate a change in approach to (sub)versioning?

L29: To help those not intimately familiar with performance engineering, is it worth adding a brief sentence explaining why contiguous memory is valuable? Even something as simple as a mention of strided/aligned memory accesses?

L38: How is CPU feature detection handled? AOT-compiled multi-dispatch for a list of targets, local-target-only compilation, etc.? Is this detection reliant on Cargo, or does it make use of other/homegrown methods?

L41-45: It might be worth having a glossary, or spelling out what these terms mean?

L47: Brief explanation of what is meant by "Optimized fused kernels"?

L57-61: Since most performance metrics for BLAS and BLAS-adjacent libraries tend to be expressed in terms of floating-point operations per second, is it worth providing a table comparing faer vs. various other implementations in terms of (giga) floating-point operations per second?

L61: {Graphs} f128 is mentioned, but which f128 is this? Is this IEEE FP128? IBM/PPC FP128? 2xFP64 packed FP128 (where the leading FP64 is IEEE 754 binary64 with an extended 64 bits treated as a uint softfloat)?

Beyond that: is it worth having a statement/line indicating what platforms are supported? For example, is faer a Linux x86-64-only library? What about Arm64? macOS? Windows? FreeBSD? Does Rust/Cargo being limited in HPC environments cause concerns for long-term support of versions, or are users expected to use cargo/Spack/EasyBuild to stay on top of releases?

sarah-ek commented 3 months ago

i'll reply to the comments here for now, but let me know if i should also add these clarifications to the paper itself and rebuild

L11: could you explain what you mean by natively handle? there's algorithms for general, triangular, symmetric, hermitian matrices. (in the past year sparse matrix algorithms have also advanced quite a bit) the main ones that are missing afaik are banded matrices, and packed triangular/symmetric/hermitian storage, but that's not an inherent limitation of faer

L14: rust uses IEEE 754-2008 floating types for f32 and f64, complex types c32 and c64 are the same as float _Complex, double _Complex in C, but there's also a generic implementation for Complex<T> that can be used with different float types

L15: integer maths is unlikely, but the most interesting scenarios i can think of would be stuff like extended/multiprecision floating point types, similar to what mpfr implements. or things like interval arithmetic, forward autodiff types (dual/hyperdual numbers), and any possible combinations of such types (Complex<Dual<BigFloat>> should also work in most situations)

L18: i can add in a short description

L22-26: rust allows for interoperability with C by using the C ABI on functions that are marked as extern "C" (commonly also used with #[no_mangle] so the symbol can be found by the linker as you'd expect). i've used faer from C and C++ in the past and it works quite well.
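
for illustration, a minimal sketch of that pattern (the routine and its signature here are invented for the example; it's not something faer exposes):

#[no_mangle]
pub extern "C" fn solve_lower_triangular(n: usize, l: *const f64, rhs: *mut f64) {
    // hypothetical example routine: forward substitution with a column-major,
    // n-by-n lower triangular matrix `l`, overwriting `rhs` with the solution
    let l = unsafe { core::slice::from_raw_parts(l, n * n) };
    let x = unsafe { core::slice::from_raw_parts_mut(rhs, n) };
    for i in 0..n {
        let mut acc = x[i];
        for j in 0..i {
            acc -= l[i + j * n] * x[j];
        }
        x[i] = acc / l[i + i * n];
    }
}

a C or fortran project can then declare that symbol and link against the compiled static/dynamic library.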

L24-26: users are expected to rebuild when upgrading versions, yes. but there's nothing preventing us or other people from distributing precompiled static/dynamic library files exposing a C api.

L29: i can add in an explanation

L38: cpu detection is done for predetermined targets, but each type can specify which targets it cares about. for example if a user wants to work with f16 (half) matrices, they can specify that they only want to try to detect something like avx512fp16 on x86
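
as a rough sketch of what per-target runtime dispatch can look like in rust (this uses the standard library's detection macro for illustration; it's not faer's internal machinery):

fn sum(v: &[f64]) -> f64 {
    // on x86_64, check at runtime whether the cpu supports avx2+fma and
    // dispatch to the specialized version if it does
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") && is_x86_feature_detected!("fma") {
            // safety: the required features were just detected
            return unsafe { sum_avx2_fma(v) };
        }
    }
    sum_scalar(v)
}

#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2,fma")]
unsafe fn sum_avx2_fma(v: &[f64]) -> f64 {
    // the attribute lets the compiler emit avx2/fma instructions in this function
    v.iter().sum()
}

fn sum_scalar(v: &[f64]) -> f64 {
    v.iter().sum()
}

the dispatch happens once per kernel call rather than per element, so the runtime check is essentially free.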

L41-L45: sure thing. i'll add that to the paper

L47: in this context it refers to computing things like A x and A.T y simultaneously, only traversing A once, which can significantly speed up some memory-bound algorithms
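
as a simplified scalar illustration of the idea (the real kernels are simd and generic over the element type, but the memory-traffic argument is the same):

// compute y1 = A * x1 and y2 = Aᵀ * x2 in a single pass over the
// column-major data of `a`, instead of reading `a` twice
fn fused_matvec(nrows: usize, ncols: usize, a: &[f64], x1: &[f64], x2: &[f64], y1: &mut [f64], y2: &mut [f64]) {
    y1.iter_mut().for_each(|y| *y = 0.0);
    for j in 0..ncols {
        let col = &a[j * nrows..(j + 1) * nrows];
        let mut dot = 0.0;
        for i in 0..nrows {
            y1[i] += col[i] * x1[j]; // accumulates A * x1
            dot += col[i] * x2[i];   // accumulates Aᵀ * x2
        }
        y2[j] = dot;
    }
}

reading A once instead of twice roughly halves the memory traffic, which is what matters when the operation is memory-bound.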

L57-61: gflop/s works for matrix multiplication. but for more complex iterative algorithms like the eigenvalue decomposition, the flop count can depend on several factors and there's no closed-form formula for it, so i'm not sure how to communicate that

L61: that's my bad, i should have clarified that it's f64x2 (commonly referred to as double-double)

faer supports any platforms that rust itself supports. so it can run on the common operating systems, and the target architectures that rust supports

x86
x86_64
mips
powerpc
powerpc64
arm
aarch64

although vectorization is currently only supported on x86/x86_64 and aarch64 (no sve support at the moment), due to the other platforms' intrinsics still being unstable in rust

other than that, users are expected to use cargo to stay up to date, but they can build and distribute faer as a C api (still not currently exposed by faer itself, but users can easily expose the parts they care about) + static/dynamic library if the environment doesn't support building rust projects from source

FCLC commented 1 month ago

L11: If it can handle them all, it might be worth a quick mention in the paper, as a means of differentiating from the many toy implementations that don't bother implementing anything beyond GEMM, GEMV, DOT, and AXPY. faer is very rich in its feature set, but I don't know that it comes across? It might just be my reading.

L14: It might be worth mentioning that the library/Rust uses 754-2008 in that case, especially for users who might want to use Rust as their math library from another language? The 2008 revision in particular explicitly allows floating-point contraction, which some languages may not expect but which can affect things like bit patterns. (I'll leave the for-better-or-worse discussion on FP contraction aside!)

L15: Interesting!

L18: sounds good!

L22-26 && L24-26: That's great news! Perhaps a line to that effect would be useful? It might be out of scope for this paper, but does using the extern "C" interface mean a non-Rust expert could extern-"C" their way all the way to a Fortran wrapper?

L29: Ack

L38: FP16 is interesting in that it's supported in hardware on only a very small subset of x86 targets (ignoring the F16C ISA extension). In the case of something not being supported in hardware, does that fall back to an LLVM softfloat? (Sort of related to L15.)

L41-45: ack

L47: ack, maybe worth a quick line to that effect?

L57-61: Fair enough!

L61: Ack

L{end}: Does the lack of intrinsics support mean that faer/Rust will not be competitive on those platforms in the short/medium term?

mhoemmen commented 3 weeks ago

Jed met with me today to go through the library and help me get started downloading it. I've had some "editor's block" about starting the review. My apologies to the author, who has waited far too long on it. I didn't feel ready because I didn't understand Rust well enough, but Jed helped me understand a tiny bit so that I wouldn't be completely lost. He also emphasized that I should focus on the library's functionality independent of whether the library is idiomatic Rust code.

mhoemmen commented 3 weeks ago

I'll add some remarks here from my conversation with Jed today.

Jed explained that Sarah wrote a clean-room reimplementation from the algorithms, rather than porting existing code to Rust or wrapping existing non-Rust libraries. This is an extraordinary effort. From an algorithm development perspective, it was comparable to that of developing EISPACK and LINPACK from the Handbook. From a software library development perspective, it was actually more effort, because the idioms for managing views of subsets of multidimensional arrays in languages like Fortran or C++ do not carry over naturally to Rust. This means that Sarah couldn't just port (say) LAPACK, because Rust doesn't come with a way to say things like "create a view of rows 10-15 and columns 20-29 of this matrix." As a result, Sarah had to develop her own data structures, such as a matrix container (a C++ word meaning something like "data structure owning an allocation") and mutable and immutable "reference" (C++ might say "subview") types. Along with this, Sarah had to come up with new idioms, like what an "in-place factorization" means. (Instead of "operate in place on a view of the matrix" as in LAPACK, it means "consume this matrix and produce a new matrix that possibly reuses the storage of the consumed matrix.")
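
To check my own understanding, here is roughly how I picture the container/view split in Rust (illustrative types only, not faer's actual API):

// an owning, column-major matrix "container" (holds its own allocation)
struct Matrix {
    data: Vec<f64>,
    nrows: usize,
    ncols: usize,
}

// a non-owning, immutable "reference"/subview into some Matrix
struct MatrixRef<'a> {
    data: &'a [f64],
    parent_nrows: usize, // column stride of the parent
    row_start: usize,
    col_start: usize,
    nrows: usize,
    ncols: usize,
}

impl Matrix {
    // e.g. "a view of rows 10..16 and columns 20..30 of this matrix",
    // borrowed from the container rather than copied
    fn submatrix(&self, rows: std::ops::Range<usize>, cols: std::ops::Range<usize>) -> MatrixRef<'_> {
        MatrixRef {
            data: self.data.as_slice(),
            parent_nrows: self.nrows,
            row_start: rows.start,
            col_start: cols.start,
            nrows: rows.end - rows.start,
            ncols: cols.end - cols.start,
        }
    }
}

impl<'a> MatrixRef<'a> {
    // element (i, j) of the view, in the parent's column-major layout
    fn get(&self, i: usize, j: usize) -> f64 {
        self.data[(self.col_start + j) * self.parent_nrows + (self.row_start + i)]
    }
}

The lifetime parameter on the view is what lets the borrow checker guarantee that a view cannot outlive the container it borrows from, which has no direct analogue in Fortran or classical C++.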

Thanks to Jed's help, I was able to build (is that even the right word?) and run the library's tests. The experience was completely without hassle. The hardest part by far was convincing my Linux system to install Rust, and that wasn't even so hard. The test output was extensive and gave a tour of the library's features. It also showed that one test (the matrix multiply test) didn't run, because it would have taken too long on my system. Presumably it's possible to enable that test by hand.

mhoemmen commented 3 weeks ago

Regarding

Functionality: Have the functional claims of the software been confirmed?

I have to learn how to navigate the code repository in order to do this, because I don't know a priori how to tell where the tests live. There's no "test" directory, for example. git grep test has led me to src/lib.rs, which appears to be the main entry point to the library. I see extensive human-readable documentation comments and some code constructs like #[cfg(test)] and #[test]. I'm guessing that the latter are embedded tests.

jedbrown commented 3 weeks ago

Yeah, this is the pattern used for unit tests. It causes this code to be compiled out for regular use (in which case cfg(test) is false), but automatically included in the test harness when running tests.

#[cfg(test)]
mod tests {
    #[test]
    fn test_name() {
        // ...
    }
}

You can also see how things are tested in CI (which includes running the test suite with code coverage) in .github/workflows/run-tests.yml. Note that there is also some static analysis/linting in CI.

mhoemmen commented 3 weeks ago

src/lib.rs documentation begins with matrix creation, slicing, and overloaded operators. The functionality looks familiar both to Matlab users and to users of comparable C++ libraries like Eigen.

I do wish the documentation had explained what this mat! thing is, though perhaps "foo! creates a Foo" is a Rust idiom with which I am unfamiliar. Jed went into some detail about how this is a macro that avoids creating a temporary to hold the literal input. I've written something just a bit comparable in C++ for the mdarray proposal (P1684), namely a nested initializer_list<initializer_list<value_type>> constructor.
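
For reference, my reading of the documented usage is roughly the following (adapted from the examples in src/lib.rs; I have not checked that this exact snippet compiles against the reviewed version):

use faer::mat;

fn main() {
    // mat! builds the owning matrix directly from the literal, without first
    // materializing a temporary to hold the nested rows (per Jed's explanation)
    let a = mat![
        [1.0, 2.0],
        [3.0, 4.0_f64],
    ];
    let a00 = a[(0, 0)]; // tuple-argument indexing, discussed further below
    assert_eq!(a00, 1.0);
}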

Expressions like the following in the documentation appear to act like C++ expression templates, in that they are nonowning views of existing matrices that represent matrix arithmetic expressions. Rust doesn't have the dangling references problem that classical C++ expression templates have. Rust's syntax also makes it more clear (contrasted with the "auto return type problem" that e.g., Eigen has) that the type of something like add below is a reference, not a new allocation.

let add = &a + &b;
let sub = &a - &b;
let scale = scale(3.0) * &a;
let mul = &a * b.transpose();

I'm curious about evaluation of non-elementwise expressions like &a * b.transpose(). Is that an elementwise multiplication or the usual matrix multiply? The documentation says "* for either scalar or matrix multiplication depending on the types of the operands." If * can represent matrix-matrix multiplication, then what happens with expressions like &a * &b * &c? Does the implementation attempt to solve the matrix chain multiplication problem? Are these expressions always lazily evaluated? Do they need to allocate intermediate temporary storage, and if so, do they attempt to reuse such storage for the various products in the chain?

Regarding the indexing expression let a00 = a[(0, 0)]; in the documentation, Jed explained that Rust does not permit array indexing expressions to have more than one argument. I was happy to be involved in C++23's removal of that restriction, so that mdspan and mdarray can say e.g., a[0, 0]. This is a language restriction, not a library restriction, though.

mhoemmen commented 3 weeks ago

For the factorizations listed in src/lib.rs, I'll separate my discussion into "SVD or eigendecomposition" and "everything else" (the "one-sided" factorizations).

Regarding "everything else," I'm pleased to see some harder-to-implement but more interesting factorizations, like Bunch-Kaufman decomposition, QR with column pivoting (useful for applications like hierarchical matrices), and LU with full pivoting.

Regarding "SVD or eigendecomposition," it's a bold choice for a library to include those. These are much trickier to implement and test. Even LAPACK's eigendecompositions have evolved quite a bit in the last 15-20 years, including entirely new algorithms (e.g., MRRR) in the symmetric / Hermitian case. It's possible that LAPACK is the only source for well-maintained, well-tested implementations of the newer algorithms (though I haven't checked Eigen). It looks like faer has functionality comparable to LAPACK's "standard" drivers, either _geev, _syev, or _heev (all of which do QR iteration). Considering how difficult these algorithms are to implement and test, it's reasonable for a single-developer effort not to cover the full space of problems (e.g., generalized eigenproblem) and algorithms.

mhoemmen commented 2 weeks ago

I've been going through the tests. It's great to see both the usual random matrix tests and special cases, including matrices of all NaNs.

Something that I miss, particularly for the SVD and eigendecompositions, is a discussion of test provenance. The experience of the QR algorithm is "we had to do these funny things to avoid known bad cases," so specific test cases matter a lot. For increasing test coverage, it may pay to go through the LAPACK test suite and the LAPACK Working Notes (LAWNs).