[REVIEW]: PyArmadillo: an alternative approach to linear algebra in Python

whedon commented 3 years ago

Submitting author: @jason-rumengan (Jason Rumengan) Repository: https://gitlab.com/jason-rumengan/pyarma Version: v0.500.2 Editor: @mjsottile Reviewer: @JaroslavHron, @uellue Archive: 10.5281/zenodo.5564389

:warning: JOSS reduced service mode :warning:

Due to the challenges of the COVID-19 pandemic, JOSS is currently operating in a "reduced service mode". You can read more about what that means in our blog post.

Status

Status badge code:

HTML: <a href="https://joss.theoj.org/papers/9e0ec1df49e009869b5a308f6d319c71"><img src="https://joss.theoj.org/papers/9e0ec1df49e009869b5a308f6d319c71/status.svg"></a>
Markdown: [![status](https://joss.theoj.org/papers/9e0ec1df49e009869b5a308f6d319c71/status.svg)](https://joss.theoj.org/papers/9e0ec1df49e009869b5a308f6d319c71)

Reviewers and authors:

Please avoid lengthy details of difficulties in the review thread. Instead, please create a new issue in the target repository and link to those issues (especially acceptance-blockers) by leaving comments in the review thread below. (For completists: if the target issue tracker is also on GitHub, linking the review thread in the issue or vice versa will create corresponding breadcrumb trails in the link target.)

Reviewer instructions & questions

@JaroslavHron & @uellue, please carry out your review in this issue by updating the checklist below. If you cannot edit the checklist please:

Make sure you're logged in to your GitHub account
Be sure to accept the invite at this URL: https://github.com/openjournals/joss-reviews/invitations

The reviewer guidelines are available here: https://joss.readthedocs.io/en/latest/reviewer_guidelines.html. Any questions/concerns please let @mjsottile know.

✨ Please start on your review when you are able, and be sure to complete your review in the next six weeks, at the very latest ✨

Review checklist for @JaroslavHron

Conflict of interest

[x] I confirm that I have read the JOSS conflict of interest (COI) policy and that: I have no COIs with reviewing this work or that any perceived COIs have been waived by JOSS for the purpose of this review.

Code of Conduct

[x] I confirm that I read and will adhere to the JOSS code of conduct.

General checks

[x] Repository: Is the source code for this software available at the repository url?
[x] License: Does the repository contain a plain-text LICENSE file with the contents of an OSI approved software license?
[x] Contribution and authorship: Has the submitting author (@jason-rumengan) made major contributions to the software? Does the full list of paper authors seem appropriate and complete?
[x] Substantial scholarly effort: Does this submission meet the scope eligibility described in the JOSS guidelines

Functionality

[x] Installation: Does installation proceed as outlined in the documentation?
[x] Functionality: Have the functional claims of the software been confirmed?
[x] Performance: If there are any performance claims of the software, have they been confirmed? (If there are no claims, please check off this item.)

Documentation

[x] A statement of need: Does the paper have a section titled 'Statement of Need' that clearly states what problems the software is designed to solve and who the target audience is?
[x] Installation instructions: Is there a clearly-stated list of dependencies? Ideally these should be handled with an automated package management solution.
[x] Example usage: Do the authors include examples of how to use the software (ideally to solve real-world analysis problems).
[x] Functionality documentation: Is the core functionality of the software documented to a satisfactory level (e.g., API method documentation)?
[x] Automated tests: Are there automated tests or manual steps described so that the functionality of the software can be verified?
[x] Community guidelines: Are there clear guidelines for third parties wishing to 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support

Software paper

[x] Summary: Has a clear description of the high-level functionality and purpose of the software for a diverse, non-specialist audience been provided?
[x] A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?
[x] State of the field: Do the authors describe how this software compares to other commonly-used packages?
[x] Quality of writing: Is the paper well written (i.e., it does not require editing for structure, language, or writing quality)?
[x] References: Is the list of references complete, and is everything cited appropriately that should be cited (e.g., papers, datasets, software)? Do references in the text use the proper citation syntax?

Review checklist for @uellue

Conflict of interest

[x] I confirm that I have read the JOSS conflict of interest (COI) policy and that: I have no COIs with reviewing this work or that any perceived COIs have been waived by JOSS for the purpose of this review.

Code of Conduct

[x] I confirm that I read and will adhere to the JOSS code of conduct.

General checks

[x] Repository: Is the source code for this software available at the repository url?
[x] License: Does the repository contain a plain-text LICENSE file with the contents of an OSI approved software license?
[x] Contribution and authorship: Has the submitting author (@jason-rumengan) made major contributions to the software? Does the full list of paper authors seem appropriate and complete?
[x] Substantial scholarly effort: Does this submission meet the scope eligibility described in the JOSS guidelines

Functionality

[x] Installation: Does installation proceed as outlined in the documentation?
[x] Functionality: Have the functional claims of the software been confirmed?
[x] Performance: If there are any performance claims of the software, have they been confirmed? (If there are no claims, please check off this item.)

Documentation

[ ] A statement of need: Does the paper have a section titled 'Statement of Need' that clearly states what problems the software is designed to solve and who the target audience is?
[x] Installation instructions: Is there a clearly-stated list of dependencies? Ideally these should be handled with an automated package management solution.
[x] Example usage: Do the authors include examples of how to use the software (ideally to solve real-world analysis problems).
[x] Functionality documentation: Is the core functionality of the software documented to a satisfactory level (e.g., API method documentation)?
[x] Automated tests: Are there automated tests or manual steps described so that the functionality of the software can be verified?
[x] Community guidelines: Are there clear guidelines for third parties wishing to 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support

Software paper

[x] Summary: Has a clear description of the high-level functionality and purpose of the software for a diverse, non-specialist audience been provided?
[x] A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?
[ ] State of the field: Do the authors describe how this software compares to other commonly-used packages?
[x] Quality of writing: Is the paper well written (i.e., it does not require editing for structure, language, or writing quality)?
[x] References: Is the list of references complete, and is everything cited appropriately that should be cited (e.g., papers, datasets, software)? Do references in the text use the proper citation syntax?

whedon commented 3 years ago

Hello human, I'm @whedon, a robot that can help you with some common editorial tasks. @JaroslavHron, @uellue it looks like you're currently assigned to review this paper :tada:.

:warning: JOSS reduced service mode :warning:

Due to the challenges of the COVID-19 pandemic, JOSS is currently operating in a "reduced service mode". You can read more about what that means in our blog post.

:star: Important :star:

If you haven't already, you should seriously consider unsubscribing from GitHub notifications for this (https://github.com/openjournals/joss-reviews) repository. As a reviewer, you're probably currently watching this repository which means for GitHub's default behaviour you will receive notifications (emails) for all reviews 😿

To fix this do the following two things:

Set yourself as 'Not watching' https://github.com/openjournals/joss-reviews:

watching

You may also like to change your default settings for this watching repositories in your GitHub profile here: https://github.com/settings/notifications

notifications

For a list of things I can do to help you, just type:

@whedon commands

For example, to regenerate the paper pdf after making changes in the paper's md or bib files, type:

@whedon generate pdf

whedon commented 3 years ago

Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

OK DOIs

- 10.1137/1.9780898719604 is OK
- 10.1016/j.csda.2013.02.005 is OK
- 10.1038/s41586-020-2649-2 is OK
- 10.1007/978-3-319-32452-4 is OK
- 10.21105/joss.00026 is OK
- 10.1007/978-3-319-96418-8_50 is OK
- 10.1038/s41592-019-0686-2 is OK

MISSING DOIs

- None

INVALID DOIs

- None

whedon commented 3 years ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

mjsottile commented 3 years ago

@jason-rumengan I did a quick review of the paper and I would like you to expand it slightly - it currently lacks some detail I'd usually expect in a JOSS paper. I would recommend including a bit more detail about what the software provides and perhaps a short example illustrating its use. For example, one of the core goals of your submission is a more Matlab/Octave like experience, so simply including a small example showing the PyArmadillo code and the equivalent Numpy would help a reader quickly get an idea of the gist of the software.

If you look at the recently published JOSS papers (https://joss.theoj.org/papers/published) you can get a sense for the scale and scope of a typical accepted submission.

whedon commented 3 years ago

Software report (experimental):

github.com/AlDanial/cloc v 1.88  T=11.23 s (116.8 files/s, 31259.6 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
C/C++ Header                   783          56263          20980         147037
HTML                             4           1466           1065          31626
C++                            287           7995           6951          29992
Python                         133           7204           5234          17253
reStructuredText                32           2578           2846           4413
CMake                           32            738            764           2812
YAML                            16            245            172           1034
Markdown                         9            334              0            926
SVG                              2              0              0            854
TeX                              1             19              0            120
XML                              1              0              0             69
Bourne Shell                     4             21             20             64
INI                              1              0              0             19
CSS                              1              0              0             11
make                             1             10              6             11
TOML                             3              1              0              9
MATLAB                           1              4              3              4
PowerShell                       1              3              5              4
-------------------------------------------------------------------------------
SUM:                          1312          76881          38046         236258
-------------------------------------------------------------------------------

Statistical information for the repository 'a7250d2ce6d3f42dee6ff94b' was
gathered on 2021/02/22.
The following historical commit information, by author, was found:

Author                     Commits    Insertions      Deletions    % of changes
Jason Rumengan                   3            91             54            0.01
Terry Zhuo                      74        221763          11413            8.58
conrad                           2            46             53            0.00
jason-rumengan                 447       1384413        1096872           91.30
terryyz                          2          1097             73            0.04
terryzhuo110                     1           191           1556            0.06

Below are the number of rows from each author that have survived and are still
intact in the current revision:

Author                     Rows      Stability          Age       % in comments
Terry Zhuo                  196            0.1          1.0               43.37
conrad                        1            2.2          0.3                0.00
jason-rumengan           298735           21.6          1.0               11.04

uellue commented 3 years ago

Regarding "Statement of need":

NumPy and SciPy are the de-facto standard for numerical processing in Python and already implement all functionality in PyArmadillo, as far as I could see. Furthermore, the NumPy ndarray interface is the "lingua franca" for interoperability within a large ecosystem of Python packages. PyArmadillo seems to deliberately separate itself from that ecosystem. This ecosystem is a key benefit in data science and research software engineering and contributes to Python being so poular in this area: https://www.kdnuggets.com/2020/01/python-preferred-languages-data-science.html For that reason it should be made much clearer for which target audience in the Python world PyArmadillo is supposed to bring a benefit.

Being similar to Matlab/Octave is presented as a major advantage of PyArmadillo over NumPy/SciPy. However, when looking at https://pyarma.sourceforge.io/docs.html#syntax, the syntax and conventions of PyArmadillo seem closer to NumPy than to Matlab: square brackets for indexing, indices starting at 0, more object oriented than Matlab, judging from this example. As far as I know, Matlab and Octave already implement all capabilities of PyArmadillo. The paper and documentation should make clear for which users PyArmadillo would bring an advantage over Octave or Matlab.

The claim that NumPy and SciPy are overly verbose and cumbersome compared to Matlab should be substantiated, for example with code examples. I have already ported Matlab code to NumPy and also trained previous Matlab users in use of Python/NumPy. In my experience, the transition was very easy since most concepts and methods in Matlab have a 1:1 equivalent in the NumPy/SciPy ecosystem.

The supposed disadvantages of NumPy and SciPy can actually be perceived as advantages in many use cases. For that reason ist should be made clearer under which circumstances PyArmadillo may be favored and for which applications PyArmadillo is actually at a disadvantage because of these properties:

Being "more verbose" could actually be seen as an advantage for code readability, in particular when building larger applications.
Supporting n-dimensional arrays, as opposed to being limited to 2D and 3D, is of course crucial when dealing with data that is not 2D or 3D by nature. Examples from my personal experience are 4D scanning transmission electron microscopy. Being limited to 2D is seen as a limitation for scipy.sparse: https://docs.scipy.org/doc/scipy/reference/roadmap-detailed.html#sparse

Compared to both Matlab, and matplotlib as part of the NumPy ecosystem, PyArmadillo seems to be missing any plotting capabilities. Since plotting is an important feature for many users, it should be adressed in the PyArmadillo documentation.

In summary, the use cases where PyArmadillo would bring a benefit over Matlab, Octave and NumPy/SciPy should be substantiated. From the current information, the claims made in the paper are not supported.

uellue commented 3 years ago

@mjsottile @jason-rumengan this concludes my first round of review. :-)

conradsnicta commented 3 years ago

@mjsottile and @uellue Thank you for the constructive feedback and suggestions. We will revise the paper accordingly and respond in due course. (I'm one of the co-authors).

uellue commented 3 years ago

As a specific feedback, the fact that PyArmadillo matrices can be easily created from NumPy arrays and vice-versa could be highlighted in the paper and documentation. I only found that out by trying it. That could position PyArmadillo not so much as an alternative to NumPy and SciPy, but as an extension. It also means that existing tools like matplotlib can easily be used together with PyArmadillo. Still, it should become clear what specific benefits would arise from using PyArmadillo over the alternatives. Did you consider benchmarking some non-trivial linear algebra operations? In http://arma.sourceforge.net/speed.html, speed seems to be pitched as one of the advantages of the underlying Armadillo library.

JaroslavHron commented 3 years ago

I fully agree with previous comment by @mjsottile and the points from the review by @uellue.

The title "... an alternative approach to linear algebra in python" might be a bit misleading - alternative in what sense?. Something like "... Matlab/Octave-like interface for linear algebra in python" would sound more descriptive to me.
How essential is the Armadillo back end? If the interface style is the main selling point, could the same interface be implemented on top of eigen or even numpy?
Definitely a short example of usage should be included, some short part of the example.py from the repository.
I had an installation problem linking with openblas in nonstandard location. However that is probably more a problem of the armadillo library itself. I had to set USE_BLAS/LAPACK options in ext/armadillo/CMakeLists.txt, this could be mentioned in the installation instructions more explicitly in the section on MKL and other BLAS/LAPACK libs.
```
-set(ARMA_USE_LAPACK     false)
-set(ARMA_USE_BLAS       false)
+set(ARMA_USE_LAPACK     true)
+set(ARMA_USE_BLAS       true)
```

conradsnicta commented 3 years ago

@mjsottile For the revision of the paper, would it be okay to alter the title?
In order to address the 'misleading title' issue raised by @JaroslavHron, we are exploring changing the title to "PyArmadillo: a streamlined linear algebra library for Python".

mjsottile commented 3 years ago

@conradsnicta I am fine with modifying the title to more accurately reflect the content of the paper/package. I do not know what the process is to alter the title for the GitHub issues though.

@arfon Is there a way to change the title for the submission once the authors settle on a more accurate title? I didn't see anything obvious in the JOSS documentation for title changes.

arfon commented 3 years ago

@mjsottile - whatever the paper.md title is becomes the final title for the accepted and published submission.

whedon commented 3 years ago

:wave: @uellue, please update us on how your review is going (this is an automated reminder).

whedon commented 3 years ago

:wave: @JaroslavHron, please update us on how your review is going (this is an automated reminder).

JaroslavHron commented 3 years ago

In order to address the 'misleading title' issue raised by @JaroslavHron, we are exploring changing the title to "PyArmadillo: a streamlined linear algebra library for Python".

Yes, that title gives much better idea what to expect from the library.

mjsottile commented 3 years ago

:wave: @conradsnicta This is partly a ping just to keep this thread from going stale. Be sure to post a note when you've made revisions and are ready for people to take a further look. No rush if you're busy with the day job (I'm in the same place lately...).

conradsnicta commented 3 years ago

@mjsottile Thanks for the ping. We're working on it, but hampered due to day jobs / studies and sudden covid-related lockdowns.

conradsnicta commented 3 years ago

@JaroslavHron Thank you for the considered feedback. We have revised the article to address the comments you have raised.

The title "... an alternative approach to linear algebra in python" might be a bit misleading - alternative in what sense?. Something like "... Matlab/Octave-like interface for linear algebra in python" would sound more descriptive to me.

We agree the title is misleading. We have changed the title to: "PyArmadillo: a streamlined linear algebra library for Python".

How essential is the Armadillo back end? If the interface style is the main selling point, could the same interface be implemented on top of eigen or even numpy?

There is a very close correspondence between the API provided by PyArmadillo (ie. exposed at the Python level) and the API provided by Armadillo at the C++ level. Armadillo by default provides a Matlab-like interface (which is the aim of PyArmadillo), while the APIs of other frameworks are notably further away from Matlab. The use of Armadillo hence reduces the amount of the "translation" code (between Python and C++), which in turn considerably helps with maintainability (ie. debugging, future expansions, etc).

While it is certainly possible to use other backends, it would involve considerably more work. Furthermore, we as the authors are very familiar with how Armadillo works (including internals), its constraints, etc, which is essential for creating a robust interface between C++ and Python. We are less familiar with other C++ frameworks.

Definitely a short example of usage should be included, some short part of the example.py from the repository.

We have included a short example (several lines) in the revised version of the article. See figure 1.

I had an installation problem linking with openblas in nonstandard location. However that is probably more a problem of the armadillo library itself. I had to set USE_BLAS/LAPACK options in ext/armadillo/CMakeLists.txt, this could be mentioned in the installation instructions more explicitly in the section on MKL and other BLAS/LAPACK libs.

Handling non-standard locations is a difficult problem. While the Cmake-based installer for Armadillo searches many paths in order to find OpenBLAS, this is by no means completely exhaustive. Manually changing the variables such as ARMA_USE_LAPACK and ARMA_USE_BLAS within CMakeLists.txt doesn't directly address to root cause of the problem, as the path for OpenBLAS is required. As we don't know the details of your specific setup (ie. the custom location of OpenBLAS), any changes would be done blindly without knowing whether this would fix the problem. We are happy to look at this Armadillo-specific issue at the Armadillo GitLab repo: https://gitlab.com/conradsnicta/armadillo-code/.

Nevertheless, we have partially addressed this issue by providing a pre-built version of PyArmadillo via the Python Package Index (pip). The pre-built version has been linked with OpenBLAS and has been tested to work on many Linux systems (as well as macOS and Windows). Please see https://pypi.org/project/pyarma/

(State of the field - relation to other commonly-used packages)

We have considerably extended and revised the Statement of Need, which better addresses the State of the field and provides reasoning on why PyArmadillo is useful. In short, the revised Statement now covers: (i) how Matlab is considered as a standard for prototyping involving numerical linear algebra, (ii) the problems with Matlab (licensing costs, restrictions on use, not suitable for general programs), (iii) how Python can address the problems with Matlab, (iv) how the different programming interface for NumPy/SciPy can hamper scientists/engineers transitioning from Matlab to the Python ecosystem, and (v) how PyArmadillo aims to facilitate the transition by providing an interface close to Matlab, wherever possible.
Please see the revised article for the full wording, which has details and nuances not covered by the summary here.

conradsnicta commented 3 years ago

@uellue Thank you for the considered feedback. We have revised the article to address the comments you have raised.

the fact that PyArmadillo matrices can be easily created from NumPy arrays and vice-versa could be highlighted in the paper and documentation. I only found that out by trying it. That could position PyArmadillo not so much as an alternative to NumPy and SciPy, but as an extension. It also means that existing tools like matplotlib can easily be used together with PyArmadillo.

We agree with this assessment. We have added the following statement to the paper: PyArmadillo matrices and cubes are convertible to/from NumPy arrays, allowing users to tap into the wider Python data science ecosystem, including plotting tools such as Matplotlib (Hunter 2007).

Furthermore, we added the following line to the documentation (doc/docs.html within the PyArmadillo repo): Plotting of PyArmadillo matrices is possible using matplotlib through conversion to a NumPy array

Still, it should become clear what specific benefits would arise from using PyArmadillo over the alternatives. (Statement of need - what problems the software is designed to solve) (State of the field - relation to other commonly-used packages)

We have considerably revised the Statement of Need, which now better covers the benefits provided by PyArmadillo within the context of other packages. Please see our response above to JaroslavHron for a summary of the changes (https://github.com/openjournals/joss-reviews/issues/3051#issuecomment-825427773) and the revised paper for details.

Did you consider benchmarking some non-trivial linear algebra operations?

We do not make performance claims within the paper. However, we note that in general, the speed of linear algebra operations is often highly dependent on matrix multiplication, and hence dependent on the quality of the implementation of BLAS routines (which are considered de-facto industry standard). To that end, PyArmadillo (indirectly) uses OpenBLAS where possible, which is known to have a performant (SIMD + multi-threading) implementation of matrix multiply. Furthermore, matrix decompositions such as SVD are also "farmed out" to the low-level routines present in OpenBLAS, which also provides a performant implementation of LAPACK.

uellue commented 3 years ago

@conradsnicta thank you for the updated version and the detailed feedback.

The paper is now a bit self-contradictory: You write "Recently the Python language has gained traction for scientific computing through add-on frameworks such as NumPy . In contrast to Matlab, Python is suitable for general programming..."

In the example you provide, one criticism seems to be the extra () in order to supply the shape to np.ones((2, 3)). In the context of the above statement, Python being a general purpose programming language, that is "not a bug, but a feature": In real-world applications, the shape is usually a free parameter and the number of dimensions is not fixed. Furthermore, the ones() function accepts additional parameters such as dtype and order: https://numpy.org/doc/stable/reference/generated/numpy.ones.html

Python would allow ones() to accept a list of integers as arguments and have keyword-only arguments. The definition would be

def ones(*args, dtype=None, order='C'): 
    ...

and it could be called as ones(2, 3, 4, 5, 6, ..., dtype=...). However, if the array shape is a free parameter, one would have to call it as np.ones(*shape) instead of np.ones(shape). All in all, the solution with variable agruments is more complex from a "general purpose programming" perspective since it uses an advanced language feature. Finally, this way to handle function arguments completely breaks down if several parameters would have to be variable. In essence, the perceived simplicity in Matlab and pyarma comes at the cost of "general purpose programming" capability and API design. "One can't have it both ways", so to speak. Solving things in Matlab the Matlab way, and solving things in Python the Python way have evolved to serve the needs of their user bases and applications. Solving things in Python the Matlab way is certainly possible, but why would one want to do that?

From the information that you provide, pyarma could be implemented as a very thin wrapper around NumPy, as @JaroslavHron already pointed out:

import numpy as np

class Matrix:
    def __init__(self, a):
        self._a = np.array(a)

    def __mul__(self, b):
        return Matrix(self._a @ np.array(b))

    def __array__(self):
        return self._a

    def print(self):
        print(self._a)

    def t(self):
        return Matrix(self._a.T)

    # Other operators etc go here...
    ...

def ones(*args):
    return Matrix(np.ones(shape=args))

def randu(*args):
    return Matrix(np.random.random_sample(size=args))

A = ones(4, 5)
B = randu(4, 5)

C = A * B.t()

C.print()

The justification that you provide in your comment to @JaroslavHron does not convince me.

Regarding "The use of Armadillo hence reduces the amount of the "translation" code (between Python and C++), which in turn considerably helps with maintainability (ie. debugging, future expansions, etc).": This already takes for granted that any extra translation between Python and C++ is necessary at all. The above example doesn't even need any translation, so by your line of argumentation it is superior. In both cases, the underlying back-ends (Armadillo resp. NumPy) require their own maintenance, so that is a draw.

Your statement "While it is certainly possible to use other backends, it would involve considerably more work." is simply not true, given the above example that trivially uses NumPy as a back-end.

Does pyarma currently have any other tangible advantages for users over such a simplistic approach? From everything I can see, it is currently just implemented in a more complicated way.

In conclusion, the provided information in the paper and in the documentation do not establish a need for this project in the current state.

mjsottile commented 3 years ago

Sorry for the delay: I've been thinking about the reviews and waiting to see if the authors have anything further to add. Regarding some of the comments above:

"Solving things in Python the Matlab way is certainly possible, but why would one want to do that?"

Speaking for myself as a long time NumPy (and previously Numeric and Numarray before NumPy existed) user, I grudgingly use NumPy but do not personally care for some of the decisions the community has converged upon with respect to API design. There are some things I like in Matlab, and would certainly experiment with a Python library that gives me Matlab-like idioms. So, to answer the "why would one want to do that" - personal preference is certainly valid.

It is useful to be reminded of this guidance from the JOSS editorial guidelines:

"Submissions that implement solutions already solved in other software packages are accepted into JOSS provided that they meet the criteria listed above and cite prior similar work". (https://joss.readthedocs.io/en/latest/review_criteria.html)

The fact that PyArmadillo overlaps significantly with other linear algebra and array libraries is not a limiting factor in accepting it into JOSS assuming that it does meet our primary criteria - open source licensing, substantial scholarly effort, documentation, functionality, and tests (all detailed again on https://joss.readthedocs.io/en/latest/review_criteria.html)

@conradsnicta : do you have any additional comments/responses to add? I'm also curious if the other reviewer @JaroslavHron has any additional thoughts based on the authors response to them a few days ago?

uellue commented 3 years ago

@mjsottile,

"Submissions that implement solutions already solved in other software packages are accepted into JOSS provided that they meet the criteria listed above and cite prior similar work". (https://joss.readthedocs.io/en/latest/review_criteria.html)

The submission here is pretty close to being a clone of NumPy/Numeric/Numarray from 15 years ago from a user perspective. If NumPy didn't exist, it would be no question that bringing Matlab-like array math to Python would be a very useful and valid project while being a solution that is already solved in other software packages, i.e. Matlab. However, NumPy exists, so this is already solved in Python. With the statement above, pyarma still has to demonstrate that it "meet[s] the criteria listed above", which includes a statement of need. A statement of need is indeed included, I currently just can't follow the line of its argumentation and it seems inconsistent. In particular, the stated goals and the path taken to reach them are at odds. Furthermore, it seems rather one-sided and oblivious to the potential drawbacks of the chosen approach that have to be balanced with their benefits. That didn't improve with the latest update, but shows the inconsistencies even more clearly.

To summarize my previous two comments:

The paper and documentation should not only highlight the advantages of their package, but also cover disadvantages, i.e. reasons NOT to use it. As I wrote before, the stated drawbacks of NumPy can actually be advantages from a different perspective. The biggest limitation of pyarma compared to my current use of Numpy seems to be a fundamental limitation to 2D and 3D arrays -- which the paper actually pitches as an advantage of pyarma. It should be obvious that this limitation makes pyarma plain unsuitable for a wide range of NumPy applications, which should be stated in the paper and documentation as a disadvantage of pyarma.
The paper should be more clear about its intended use. Is it building larger applications using the "general purpose programming" capability of Python, or is it designed for interactive use in Jupyter notebooks etc.? From my understanding, the stated advantages point towards interactive use. I do understand that an API that is great for building larger applications might be not so suitable for interactive use. However, this point is not made, instead NumPy is just portrayed as a "bad design" altogether, despite general purpose programming being named as an advantage of Python. This is self-contradictory. Since it came up, the nested structure of NumPy and SciPy is another advantage over pyarma in some settings since it allows to selectively load only the required parts, which saves memory and reduces loading times in "general purpose programming". Again, this should be discussed in a balanced fashion.
As you write, the API is also a matter of taste. The paper currently writes that a majority of Matlab users are unhappy with NumPy. Such a general statement should be supported with a source. In particular, from the given information I am not sure if this is just a transition phase while getting used to the new environment, or if it persists over longer time. I would accept without a source that some Matlab users are unhappy, since that is the case at least for the authors. From the code example I'd judge that this is just a transition issue since they are so similar. If there is more to it than just writing A.T instead of A.t(), A @ B instead of A * B and a few additional parentheses, the paper should show that much more clearly.
The stated goals of pyarma -- apparently some API "sugar" for people who are more comfortable with Matlab -- is completely at odds with the technical approach taken to solve it. As I wrote before, if the goal is to add "sugar" for some users, one can just write a thin wrapper around NumPy and SciPy. Instead, the authors chose to implement a wrapper around a different C++ library. Currently there is zero justification for that approach in the paper or in the documentation, and no comparison of Armadillo with the NumPy back-end. From the feedback from the authors I conclude that there is no significant difference for users. There is simply no discernable need for the additional technical effort, complexity and maintenance that mirrors existing efforts in NumPy, so it fails the "statement of need" criterion.

uellue commented 3 years ago

@mjsottile if I may add, pyarma is also on the lower end of the "scholarly effort" criterion. The first commit is merely 5 months old and the first public release on PyPI is from February this year. Its user base seems to be small as well, apparently mostly C++ code.

If it had a large user base I'd see that as a proof for some need for it. If it demonstrated significant advantages over existing software, by the criterion "Whether the software is sufficiently useful that it is likely to be cited by other researchers working in this domain." it would count for it as well. But since it barely meets the "scholarly effort" criterion and doesn't seem to meet the "statement of need" (edit: and the "state of the field") criterion, this simply doesn't pass the bar at this time from my understanding of the review criteria.

uellue commented 3 years ago

@mjsottile if I understand the review criteria correctly, they don't put any particular burden on the impact or usefulness of the software other than the "scholarly effort" section, right? Just to make it clear, my criticism above is about inconsistencies in the "statement of need" and a lack of balance and depth in the "state of the field", not per se about pyarma itself. If one dials back some of the claims to a level that can be sustained by facts, includes a fair and competent comparison with NumPy where pyarma doesn't need to come out ahead to meet acceptance criteria, and finds any application where pyarma makes sense, even if it is niche, that would be OK by my understanding of the criteria. From the top of my head, I could imagine that it is useful to compare the underlying Armadillo library to NumPy in terms of performance, numerical stability and correctness. @conradsnicta could it perhaps also be useful to create a Python interface to software that uses Armadillo? To the very least, it is a non-trivial example for using pybind11 and an opportunity to discuss API design for linear algebra in Python.

conradsnicta commented 3 years ago

@mjsottile

do you have any additional comments/responses to add? I'm also curious if the other reviewer @JaroslavHron has any additional thoughts based on the authors response to them a few days ago?

Before to stating our additional responses below, a summary of @uellue 's central positions can be paraphrased as follows: (i) it's not necessary to provide a Matlab-like API in Python because NumPy already exists and its API is good enough; and (ii) the Matlab-like API can be done on top of NumPy, so the proposed implementation is not necessary.

We respectfully disagree with both points (i) and (ii).

In response to point (i), it is our position is that providing a Matlab-like API for Python is indeed useful, as we have personally experienced many annoyances and frustrations with using NumPy/SciPy. Furthermore, after discussions with many colleagues and students who have experienced similar frustrations with NumPy/SciPy (especially those used to Matlab), we have come to the conclusion that providing a Matlab-like API has the potential to benefit many people.

While the above observations are anecdotal evidence for the need for a Matlab-like API in Python, we'd prefer to spend our time creating open-source software and putting it out there for other people to use, rather than setting up a survey and gathering responses over months/years before doing anything. The "scratch personal itch" phenomenon is one of the main drivers in the creation of open source software (eg. the Linux kernel, Apache web server, git revision control, Python itself, etc; each of these projects was "re-implementing" something already done before, but with a new angle/approach/benefits).

We note that PyArmadillo is being used internally at our workplace, and as such it is proving useful. We hope it can be proved useful in other workplaces.

In response to point (ii), it is our position that this is an implementation detail. Any choice here is likely to be valid, as long as it is effective in accomplishing the main goal. While it may be entirely valid to implement a Matlab-like API by writing code on top of NumPy (as suggested by the reviewer), it is also perfectly valid to use pybind11 and interface with C++ code, as is done in PyArmadillo. We have chosen the latter route based on personal preferences, and to reduce writing the interface directly in Python wherever possible for perceived performance reasons. Furthermore, we believe that using Armadillo as the C++ backend has considerably reduced the amount of required "translation" code. This has benefits in terms of reduced maintenance effort and technical debt.

PyArmadillo aims to provide a choice for users, beyond the default NumPy/SciPy. It also allows people who are used to Matlab an easier transition to the Python ecosystem. As such, we believe it is a useful piece of open source software.

uellue commented 3 years ago

@conradsnicta thank you for the clarifying comments. That confirms my concerns with pyarma that I voiced in the review.

@mjsottile the authors confrm that there are only "perceived performance reasons" for the chosen approach. After a quick test, the example based on NumPy is actually faster for larger matrices, and plain NumPy is fastest for both large and small:

Given this result, I am unchecking the "significant scholarly effort" criterion since pyarma only passes the three person-months criterion because it is implemented in an unnecessarily complex way that only has disadvantages from the information the authors provide and from my own tests.

uellue commented 3 years ago

Since there has been so much discussion, I'd like to point out the most important concern in my first review comment that the authors didn't address at all so far: pyarma is, in fact, much closer to NumPy than to Matlab given their comparison table https://pyarma.sourceforge.io/docs.html#syntax. That puts any claims of an easier transition for Matlab users compared to NumPy in question and would require more than anecdotal evidence to support. The stark contrast between the claims made by the authors and any factual, objective information points towards a personal bias being at play. The whole submission is consistent with the well-known not invented here bias.

mjsottile commented 3 years ago

@jason-rumengan I had a short discussion with @arfon (one of the JOSS editors-in-chief), and we believe the package is acceptable for JOSS with respect to effort and scale. The writeup just needs a little work related to motivating the need and design.

In your statement of need you might point out specific instances of users that are using the tool to justify that there was clearly a need since you wrote it and someone is using it. I'd rather see that emphasized than hard-to-defend arguments about APIs being better or easier, since it's very hard to objectively measure "ease". I think it's just fine to say that a large part of the design rationale was to give Python users an API that is similar to what they would get using Armadillo in C++ or via the R binding to the C++. That on its own would be a reasonable justification, given that Armadillo was published (in JOSS) in 2016 and has a pretty good citation count - so it's clearly used, and a Python binding would increase its reach to more people.

I think some of the existing discussion around Matlab, numpy, and syntax needs some work. I think a lot of what you're trying to say is that you're aiming for 1) an API that matches the existing Armadillo API well, and 2) retains some of the ergonomics that some users will find familiar from Matlab. Some specific comments I had on reading it:

the licensing issue is known and I don't think qualifies as justifying the need for the package. Entire ecosystems exist (Python, Julia, R, etc...) that provide open source alternatives to commercial tools, so there's nothing really noteworthy about that aspect of this package. I'd drop that part.
The discussion of matrices vs multidimensional arrays reads awkwardly as stated. Matlab and NumPy both are based on multidimensional arrays (they both use an array model that looks very similar to F90 and other older array languages). I believe you really are getting at the specific use case of matrices and vectors and the ergonomics of the syntax when people just want to work with arrays of those specific shapes. You sort of allude to that, but I'd rather see it more explicit as the motivation - as I understand it, your goal is to make doing linear algebra more concise at the syntax level, which Matlab achieves reasonably well syntactically.

Two big issues are present that would not go over well with people familiar with the design of programming languages:

The issue around namespace nesting is a pretty weak motivation. You can alias module namespaces in Python just fine ("import this.that.another.thing as shortname"). Many languages with nested namespaces provide the same kind of aliasing facility. Furthermore, Matlab even provides package folders that let you have nested namespaces in Matlab too since working in a flat global namespace is not a good idea for complex programs.
The "double braces" issue can be explained better. I think the gist isn't that there are two parentheses, but that a user who isn't very knowledgeable about Python may not realize that those "double braces" argument functions are taking a single tuple argument, versus variadic arguments or using a fixed number of scalar arguments. Again, I think I get that you're trying to speak to the ergonomics of working just with matrices and vectors, so eliminating the tuple and generality of n-dimensional matrices makes the API a bit simpler by not passing shapes around as objects.

Based on my chat with @arfon, I think if you do a pass over the paper to address those issues above, it should be good.

uellue commented 3 years ago

@mjsottile @arfon thank you for the detailed considerations. This sounds like a good approach, I fully agree with your assessment and proposed improvements.

JaroslavHron commented 3 years ago

I understand PyArma to be useful as python interface for Armadillo library - similar to other library interfaces "4py" (petsc4py, ga4py, etc.) and the performance will be mostly up to the C++ Armadillo library.

I fully agree with assessment by @mjsottile

arfon commented 2 years ago

@jason-rumengan – just checking in to see when you think you might be able to make the changes proposed by @mjsottile?

conradsnicta commented 2 years ago

@arfon We're working on the revision, aiming to address the changes proposed by @mjsottile. I'm currently stuck in quarantine with kids, due to a Covid outbreak at their school. In addition to my usual Day Job responsibilities (while working from home), this makes progress on the article annoyingly slow.

arfon commented 2 years ago

@arfon We're working on the revision, aiming to address the changes proposed by @mjsottile. I'm currently stuck in quarantine with kids, due to a Covid outbreak at their school. In addition to my usual Day Job responsibilities (while working from home), this makes progress on the article annoyingly slow.

:+1: got it. No problem at all. Wishing you and the family well and please prioritize life over JOSS :-)

jason-rumengan commented 2 years ago

@mjsottile we have revised the paper, updated the repo, and have uploaded the software to Zenodo. The DOI for the archived software is https://doi.org/10.5281/zenodo.5564389

As the archived software version is v0.500.2, could the version in this review thread be updated? Thank you.

uellue commented 2 years ago

@jason-rumengan Thank you for the revised version! The thoroughly revised statement of need sounds convincing to me, so I can check this item for the software paper. That also settles the "scholarly effort", since it provides convincing arguments for creating a Python binding for Armadillo rather than just wrapping NumPy. Remaining are two items:

[ ] With the revised positioning of PyArmadillo, the "state of the field" should at least include ALGLIB and Eigen, which are both linear algebra libraries for C++ that include Python bindings: https://www.alglib.net/translator/man/manual.cpython.html, https://pypi.org/project/minieigen/ and https://pythonhosted.org/sppy/.
[ ] The updated "Statement of Need" should also be reflected in the PyArmadillo documentation.

mjsottile commented 2 years ago

@whedon generate pdf

whedon commented 2 years ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

mjsottile commented 2 years ago

Hi @uellue - I had already communicated with the authors that their revisions were sufficient a few days ago (and then I got distracted with work for the last 3 days). Given how long this review has taken, I believe the writeup and revisions that they have made meet what I had asked for previously. I'm going to consider the review complete.

mjsottile commented 2 years ago

@whedon check references

whedon commented 2 years ago

Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

OK DOIs

- 10.1137/1.9780898719604 is OK
- 10.1016/j.csda.2013.02.005 is OK
- 10.1038/s41586-020-2649-2 is OK
- 10.1109/MCSE.2007.55 is OK
- 10.1007/978-3-319-32452-4 is OK
- 10.21105/joss.00026 is OK
- 10.1007/978-3-319-96418-8_50 is OK
- 10.1038/s41592-019-0686-2 is OK

MISSING DOIs

- None

INVALID DOIs

- None

mjsottile commented 2 years ago

@whedon set v0.500.2 as version

whedon commented 2 years ago

OK. v0.500.2 is the version.

mjsottile commented 2 years ago

@whedon set 10.5281/zenodo.5564389 as archive

whedon commented 2 years ago

OK. 10.5281/zenodo.5564389 is the archive.

mjsottile commented 2 years ago

@whedon recommend-accept

whedon commented 2 years ago

Attempting dry run of processing paper acceptance...

whedon commented 2 years ago

Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

OK DOIs

- 10.1137/1.9780898719604 is OK
- 10.1016/j.csda.2013.02.005 is OK
- 10.1038/s41586-020-2649-2 is OK
- 10.1109/MCSE.2007.55 is OK
- 10.1007/978-3-319-32452-4 is OK
- 10.21105/joss.00026 is OK
- 10.1007/978-3-319-96418-8_50 is OK
- 10.1038/s41592-019-0686-2 is OK

MISSING DOIs

- None

INVALID DOIs

- None

whedon commented 2 years ago

:wave: @openjournals/joss-eics, this paper is ready to be accepted and published.

Check final proof :point_right: https://github.com/openjournals/joss-papers/pull/2675

If the paper PDF and Crossref deposit XML look good in https://github.com/openjournals/joss-papers/pull/2675, then you can now move forward with accepting the submission by compiling again with the flag deposit=true e.g.

@whedon accept deposit=true

arfon commented 2 years ago

@whedon accept deposit=true

openjournals / joss-reviews