ropensci / software-review

rOpenSci Software Peer Review.
286 stars 104 forks source link

`fastMatMR`: Fast Matrix Market I/O #606

Closed HaoZeke closed 8 months ago

HaoZeke commented 10 months ago

Date accepted: 2023-10-30 Submitting Author Name: Rohit Goswami Submitting Author Github Handle: !--author1-->@HaoZeke<!--end-author1-- Repository: https://github.com/HaoZeke/fastMatMR Version submitted: Submission type: Standard Editor: !--editor-->@maelle<!--end-editor-- Reviewers: @osorensen, @czeildi

Archive: TBD Version accepted: TBD Language: en


Package: fastMatMR
Title: "fastMatMR: High-Performance Matrix Market File Operations in R"
Version: 1.0.0.0
Authors@R:
    person("Rohit", "Goswami", , "rgoswami@ieee.org", role = c("aut", "cre"),
           comment = c(ORCID = "0000-0002-2393-8056"))
Description: "fastMatMR is an R package offering high-performance read and write operations for Matrix Market files. It acts as a thin wrapper around the 'fast_matrix_market' C++ library, offering speed and extended support for Matrix Market formats. Unlike other R packages, fastMatMR supports not just sparse matrices but also dense vectors and matrices. This makes it a versatile choice for dealing with .mtx files in R."
License: MIT + file LICENSE
SystemRequirements: C++17
Encoding: UTF-8
Roxygen: list(markdown = TRUE)
RoxygenNote: 7.2.3
LinkingTo:
    cpp11
Suggests:
    ggplot2,
    knitr,
    Matrix,
    microbenchmark,
    rmarkdown,
    testthat (>= 3.0.0)
URL: https://github.com/HaoZeke/fastMatMR
BugReports: https://github.com/HaoZeke/fastMatMR/issues
Config/testthat/edition: 3
VignetteBuilder: knitr

Scope

The matrix market exchange formats are crucial to much of the ecosystem. The fastMatMR package focuses on high-performance read and write operations for Matrix Market files, serving as a key tool for data extraction in computational and data science pipelines.

Data scientists who might want to load and test the NIST matrices which include:

comparative studies of algorithms for numerical linear algebra, featuring nearly 500 sparse matrices from a variety of applications, as well as matrix generation tools and services.

Additionally, this makes its simpler to interface to scipy and the rest of the data science ecosystem. This also includes working with the Tensor Algebra Compiler (TACO).

The Matrix package in R can perform similar operations but only for sparse matrices. The fastMatMR package not only provides enhanced performance but also extends support to dense matrices and vectors in base R, thus offering a more versatile solution.

We have both read and write performance vignettes backing up the claims made.

The package passes pkcheck: https://github.com/HaoZeke/fastMatMR/issues/18, though the review bot disagrees :)

Technical checks

Confirm each of the following by checking the box.

This package:

Publication options

MEE Options - [ ] The package is novel and will be of interest to the broad readership of the journal. - [ ] The manuscript describing the package is no longer than 3000 words. - [ ] You intend to archive the code for the package in a long-term repository which meets the requirements of the journal (see [MEE's Policy on Publishing Code](http://besjournals.onlinelibrary.wiley.com/hub/journal/10.1111/(ISSN)2041-210X/journal-resources/policy-on-publishing-code.html)) - (*Scope: Do consider MEE's [Aims and Scope](http://besjournals.onlinelibrary.wiley.com/hub/journal/10.1111/(ISSN)2041-210X/aims-and-scope/read-full-aims-and-scope.html) for your manuscript. We make no guarantee that your manuscript will be within MEE scope.*) - (*Although not required, we strongly recommend having a full manuscript prepared when you submit here.*) - (*Please do not submit your package separately to Methods in Ecology and Evolution*)

Code of conduct

ropensci-review-bot commented 10 months ago

Thanks for submitting to rOpenSci, our editors and @ropensci-review-bot will reply soon. Type @ropensci-review-bot help for help.

ropensci-review-bot commented 10 months ago

:rocket:

Editor check started

:wave:

HaoZeke commented 10 months ago

Also, not part of the issue template, but the package passes pkcheck: https://github.com/HaoZeke/fastMatMR/issues/18

ropensci-review-bot commented 10 months ago

Checks for fastMatMR (v1.0.0.0)

git hash: a910f6be

Important: All failing checks above must be addressed prior to proceeding

Package License: MIT + file LICENSE


1. Package Dependencies

Details of Package Dependency Usage (click to open)

The table below tallies all function calls to all packages ('ncalls'), both internal (r-base + recommended, along with the package itself), and external (imported and suggested packages). 'NA' values indicate packages to which no identified calls to R functions could be found. Note that these results are generated by an automated code-tagging system which may not be entirely accurate. |type |package | ncalls| |:----------|:--------------|------:| |internal |fastMatMR | 6| |imports |NA | NA| |suggests |ggplot2 | NA| |suggests |knitr | NA| |suggests |Matrix | NA| |suggests |microbenchmark | NA| |suggests |rmarkdown | NA| |suggests |testthat | NA| |linking_to |cpp11 | NA| Click below for tallies of functions used in each package. Locations of each call within this package may be generated locally by running 's <- pkgstats::pkgstats()', and examining the 'external_calls' table.

fastMatMR

fmm_to_mat (1), fmm_to_vec (1), mat_to_fmm (1), release_questions (1), sparse_to_fmm (1), vec_to_fmm (1)

**NOTE:** No imported packages appear to have associated function calls; please ensure with author that these 'Imports' are listed appropriately.


2. Statistical Properties

This package features some noteworthy statistical properties which may need to be clarified by a handling editor prior to progressing.

Details of statistical properties (click to open)

The package has: - code in C++ (7% in 3 files), C/C++ Header (91% in 15 files) and R (2% in 3 files) - 1 authors - 3 vignettes - no internal data file - 1 imported package - 7 exported functions (median 3 lines of code) - 8 non-exported functions in R (median 3 lines of code) - 217 C/C++ functions (median 5 lines of code) --- Statistical properties of package structure as distributional percentiles in relation to all current CRAN packages The following terminology is used: - `loc` = "Lines of Code" - `fn` = "function" - `exp`/`not_exp` = exported / not exported All parameters are explained as tooltips in the locally-rendered HTML version of this report generated by [the `checks_to_markdown()` function](https://docs.ropensci.org/pkgcheck/reference/checks_to_markdown.html) The final measure (`fn_call_network_size`) is the total number of calls between functions (in R), or more abstract relationships between code objects in other languages. Values are flagged as "noteworthy" when they lie in the upper or lower 5th percentile. |measure | value| percentile|noteworthy | |:------------------------|-----:|----------:|:----------| |files_R | 3| 21.5| | |files_src | 3| 84.3| | |files_inst | 15| 99.6| | |files_vignettes | 3| 92.4| | |files_tests | 3| 75.2| | |loc_R | 45| 4.3|TRUE | |loc_src | 202| 27.6| | |loc_inst | 2545| 81.7| | |loc_vignettes | 276| 60.9| | |loc_tests | 68| 31.5| | |num_vignettes | 3| 94.2| | |n_fns_r | 15| 21.2| | |n_fns_r_exported | 7| 34.0| | |n_fns_r_not_exported | 8| 18.0| | |n_fns_src | 217| 89.5| | |n_fns_per_file_r | 3| 46.2| | |n_fns_per_file_src | 8| 68.9| | |num_params_per_fn | 2| 11.9| | |loc_per_fn_r | 3| 1.1|TRUE | |loc_per_fn_r_exp | 3| 1.5|TRUE | |loc_per_fn_r_not_exp | 3| 1.5|TRUE | |loc_per_fn_src | 5| 5.0|TRUE | |rel_whitespace_R | 29| 11.6| | |rel_whitespace_src | 20| 31.5| | |rel_whitespace_inst | 22| 82.8| | |rel_whitespace_vignettes | 41| 68.6| | |rel_whitespace_tests | 24| 30.6| | |doclines_per_fn_exp | 18| 10.4| | |doclines_per_fn_not_exp | 0| 0.0|TRUE | |fn_call_network_size | 480| 95.1|TRUE | ---

2a. Network visualisation

Click to see the interactive network visualisation of calls between objects in package


3. goodpractice and other checks

Details of goodpractice checks (click to open)

#### 3a. Continuous Integration Badges [![R-CMD-check.yaml](https://github.com/HaoZeke/fastMatMR/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/HaoZeke/fastMatMR/actions) [![pkgcheck](https://github.com/HaoZeke/fastMatMR/workflows/pkgcheck/badge.svg)](https://github.com/HaoZeke/fastMatMR/actions) **GitHub Workflow Results** | id|name |conclusion |sha | run_number|date | |----------:|:--------------------------|:----------|:------|----------:|:----------| | 6056018810|pages build and deployment |success |c3523b | 8|2023-09-02 | | 6056006940|pkgcheck |success |fe999d | 21|2023-09-02 | | 6056006937|pkgdown |success |fe999d | 23|2023-09-02 | | 6056006936|pre-commit |success |fe999d | 49|2023-09-02 | | 6056006938|R-CMD-check |success |fe999d | 39|2023-09-02 | --- #### 3b. `goodpractice` results #### `R CMD check` with [rcmdcheck](https://r-lib.github.io/rcmdcheck/) R CMD check generated the following note: 1. checking installed package size ... NOTE installed size is 11.1Mb sub-directories of 1Mb or more: libs 10.2Mb R CMD check generated the following check_fail: 1. rcmdcheck_reasonable_installed_size #### Test coverage with [covr](https://covr.r-lib.org/) Package coverage: 68.1 The following files are not completely covered by tests: file | coverage --- | --- inst/include/fast_matrix_market/chunking.hpp | 63.83% inst/include/fast_matrix_market/fast_matrix_market.hpp | 18.92% inst/include/fast_matrix_market/header.hpp | 59.69% inst/include/fast_matrix_market/read_body_threads.hpp | 67.44% inst/include/fast_matrix_market/read_body.hpp | 42.29% inst/include/fast_matrix_market/write_body.hpp | 50% R/misc.R | 0% #### Cyclocomplexity with [cyclocomp](https://github.com/MangoTheCat/cyclocomp) No functions have cyclocomplexity >= 15 #### Static code analyses with [lintr](https://github.com/jimhester/lintr) [lintr](https://github.com/jimhester/lintr) found the following 1 potential issues: message | number of times --- | --- Avoid library() and require() calls in packages | 1


Package Versions

|package |version | |:--------|:-------| |pkgstats |0.1.3.8 | |pkgcheck |0.1.2.2 |


Editor-in-Chief Instructions:

Processing may not proceed until the items marked with :heavy_multiplication_x: have been resolved.

HaoZeke commented 10 months ago

FWIW there's .covignore which is not being picked up by the bot here (works on the repo, https://github.com/HaoZeke/fastMatMR/issues/18). Package coverage is:

> covr::package_coverage()
fastMatMR Coverage: 94.69%
src/from_file.cpp: 93.88%
src/to_file.cpp: 94.23%
R/fastMatMR-package.R: 100.00%
mpadge commented 9 months ago

@HaoZeke is indeed right there - the failing coverage check is a bug on our side, and not this submission.

noamross commented 9 months ago

@ropensci-review-bot assign @maelle as editor

ropensci-review-bot commented 9 months ago

Assigned! @maelle is now the editor

maelle commented 9 months ago

:wave: @HaoZeke! Nice to see you as an author this time around, thank you for your submission! :grinning:

A few comments/questions before we proceed to the reviewer search:

Thank you!

HaoZeke commented 9 months ago

👋 @HaoZeke! Nice to see you as an author this time around, thank you for your submission! 😀

^_^

A few comments/questions before we proceed to the reviewer search:

* Why add the bundled files to .covrignore, if they contribute to the functionality of the package?

This is because there is no need to expose bindings of the inner workings of the fast_matrix_market C++ library to R users. This approach is also taken by other libraries which bundle C++ libraries, like RCppEigen.

The user-facing C++ functions are in src/ and covered by tests.

* I see you use conventional commits, nice. How is the changelog generated? I see [DOC: Fix news HaoZeke/fastMatMR#17](https://github.com/HaoZeke/fastMatMR/issues/17) Disclaimer: I contribute to https://github.com/cynkra/fledge/

That's a very cool library, I am currently using towncrier and tbump which are fairly generic tools.

* In the example where you use `->` I'd suggest using `<-` instead since the former is more rare.

* In the README you write "writing and reading from other R objects". Could you please elaborate, adding a few sentences about this to the README?

* To me the second part of the README, "Development", belongs to the contributing guide, so you could move the content and instead point contributors to the contributing guide.

* In the contributing guide could you please add a link to pixi? I for instance do not know what it is.

Yup these are great suggestions, updated the package with them.

* In the issue tracker there are a few enhancement issues. What's the timeline for them, are they "nice to have" or do you intend to tackle them soon? If so should the review wait?

These are more version 2 issues and shouldn't block the review I think. It would take a while for me to get to them, the current functionality covers R matrices, vectors and sparse matrices from Matrix which cover most of the use-cases. Other sparse matrix formats are on the issue list but they're very optional "nice to have" issues since users can always convert to Matrix formats / dense types before using the library.

Thank you!

You're welcome :)

maelle commented 9 months ago

This is because there is no need to expose bindings of the inner workings of the fast_matrix_market C++ library to R users. This approach is also taken by other libraries which bundle C++ libraries, like RCppEigen.

To clarify, the question was about tests specifically. Part of the code in inst/ is covered by tests anyway, so why not include them (and remove the unused code)?

We do not have strict requirements on testing coverage for bundled code yet, but we can improve our guidance based on cases like this submission, so this discussion is important.

HaoZeke commented 9 months ago

This is because there is no need to expose bindings of the inner workings of the fast_matrix_market C++ library to R users. This approach is also taken by other libraries which bundle C++ libraries, like RCppEigen.

To clarify, the question was about tests specifically. Part of the code in inst/ is covered by tests anyway, so why not include them (and remove the unused code)?

Mostly maintainence. I have actually stripped the (already small) library of header files which are not used at all (Eigen / Armadillo / other C++ library compatibility files).

In general, for bundled libraries it is a good idea to not tamper with the upstream source as much as possible (it makes it easier to update to new versions).

We do not have strict requirements on testing coverage for bundled code yet, but we can improve our guidance based on cases like this submission, so this discussion is important.

Yup, I understand. Policy wise I'd be even stricter, and say that bundled libraries must not be touched at all (other than whole-file removals if they are not relevant and self contained).

This is standard practice in bundling libraries in C++ too, where projects might have different linting rules and external files are ignored instead of refactored in-place within the project. The idea is that this way upstream developers can weigh in more easily on the code.

It also allows people to write bindings as long as they understand the public API of the code. Often times users who want bindings are not fully aware / have the expertise to make decisions about code removal from (possibly) complicated upstream C++ code.

maelle commented 9 months ago

Thank you for your thorough answer! I'll now look for reviewers.

maelle commented 9 months ago

@ropensci-review-bot seeking reviewers

ropensci-review-bot commented 9 months ago

Please add this badge to the README of your package repository:

[![Status at rOpenSci Software Peer Review](https://badges.ropensci.org/606_status.svg)](https://github.com/ropensci/software-review/issues/606)

Furthermore, if your package does not have a NEWS.md file yet, please create one to capture the changes made during the review process. See https://devguide.ropensci.org/releasing.html#news

maelle commented 9 months ago

@ropensci-review-bot add @osorensen to reviewers

ropensci-review-bot commented 9 months ago

@osorensen added to the reviewers list. Review due date is 2023-10-06. Thanks @osorensen for accepting to review! Please refer to our reviewer guide.

rOpenSci’s community is our best asset. We aim for reviews to be open, non-adversarial, and focused on improving software quality. Be respectful and kind! See our reviewers guide and code of conduct for more.

ropensci-review-bot commented 9 months ago

@osorensen: If you haven't done so, please fill this form for us to update our reviewers records.

maelle commented 9 months ago

@ropensci-review-bot add @czeildi to reviewers

ropensci-review-bot commented 9 months ago

@czeildi added to the reviewers list. Review due date is 2023-10-07. Thanks @czeildi for accepting to review! Please refer to our reviewer guide.

rOpenSci’s community is our best asset. We aim for reviews to be open, non-adversarial, and focused on improving software quality. Be respectful and kind! See our reviewers guide and code of conduct for more.

ropensci-review-bot commented 9 months ago

@czeildi: If you haven't done so, please fill this form for us to update our reviewers records.

osorensen commented 9 months ago

@HaoZeke and @maelle, here is my initial review. My comments at the bottom explain a bit more about the boxes I haven't checked.

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

Documentation

The package includes all the following forms of documentation:

Functionality

Estimated hours spent reviewing: 2 hours

Review comments

Dense matrices with the matrix package

In README, you state that the Matrix package only handles sparse objects. This is not correct.

library(Matrix)

The command below lists all dense matrix types supported by the Matrix package.

showClass("denseMatrix")

## Virtual Class "denseMatrix" [package "Matrix"]
## 
## Slots:
##                         
## Name:       Dim Dimnames
## Class:  integer     list
## 
## Extends: 
## Class "Matrix", directly
## Class "mMatrix", by class "Matrix", distance 2
## Class "replValueSp", by class "Matrix", distance 2
## 
## Known Subclasses: 
## Class "unpackedMatrix", directly
## Class "packedMatrix", directly
## Class "ndenseMatrix", directly
## Class "ldenseMatrix", directly
## Class "ddenseMatrix", directly
## Class "ngeMatrix", by class "unpackedMatrix", distance 2
## Class "ntrMatrix", by class "unpackedMatrix", distance 2
## Class "nsyMatrix", by class "unpackedMatrix", distance 2
## Class "ntpMatrix", by class "packedMatrix", distance 2
## Class "nspMatrix", by class "packedMatrix", distance 2
## Class "lgeMatrix", by class "unpackedMatrix", distance 2
## Class "ltrMatrix", by class "unpackedMatrix", distance 2
## Class "lsyMatrix", by class "unpackedMatrix", distance 2
## Class "ltpMatrix", by class "packedMatrix", distance 2
## Class "lspMatrix", by class "packedMatrix", distance 2
## Class "dgeMatrix", by class "unpackedMatrix", distance 2
## Class "dtrMatrix", by class "unpackedMatrix", distance 2
## Class "dsyMatrix", by class "unpackedMatrix", distance 2
## Class "dpoMatrix", by class "dsyMatrix", distance 3
## Class "corMatrix", by class "dpoMatrix", distance 4
## Class "dtpMatrix", by class "packedMatrix", distance 2
## Class "dspMatrix", by class "packedMatrix", distance 2
## Class "dppMatrix", by class "dspMatrix", distance 3
## Class "pcorMatrix", by class "dppMatrix", distance 4

Below is an example of a dense matrix from the Matrix package:

(dense_matrix <- as(matrix(1:10, ncol = 2), "dMatrix"))

## 5 x 2 Matrix of class "dgeMatrix"
##      [,1] [,2]
## [1,]    1    6
## [2,]    2    7
## [3,]    3    8
## [4,]    4    9
## [5,]    5   10

str(dense_matrix)

## Formal class 'dgeMatrix' [package "Matrix"] with 4 slots
##   ..@ Dim     : int [1:2] 5 2
##   ..@ Dimnames:List of 2
##   .. ..$ : NULL
##   .. ..$ : NULL
##   ..@ x       : num [1:10] 1 2 3 4 5 6 7 8 9 10
##   ..@ factors : list()

Statement of need

The README file contains a section titled “Why?”, but it does not state explicitly what problems the software is designed to solved nor what the target audience is.

Vignettes

There are vignettes, but they are very brief. Please provide some more extended examples, with more text explaining what is being done.

Examples

I understand that the fmm_to_* functions need an mtx file to work, and I guess it’s therefore the examples with these functions are marked with \dontrun{}. However, wouldn’t it possible to define a character vector in R that contains the required input, and then feed this to fmm_to_* in order to have an example that can actually be run?

HaoZeke commented 9 months ago

Just a quick clarification, @osorensen, the readme wording is a bit unclear but it states in the previous section that the goal is the reading and writing of Matrix Market files. For that, it is true that Matrix does not support dense matrices:

> library("Matrix")
> (dense_matrix <- as(matrix(1:10, ncol = 2), "dMatrix"))
> writeMM(dense_matrix, "mat.mtx")
Error in (function (classes, fdef, mtable)  : 
  unable to find an inherited method for function ‘writeMM’ for signature ‘"dgeMatrix"’
> ?writeMM
...
     Read matrices stored in the Harwell-Boeing or MatrixMarket formats
     or write ‘sparseMatrix’ objects to one of these formats.
...

The longer explanation is part of the ROpenSci issue (intended audience, use case) but I will add it to the readme soon.

P.S. Thanks for accepting the review and getting started with this :)

maelle commented 9 months ago

Thank you @osorensen!!

maelle commented 9 months ago

@ropensci-review-bot submit review https://github.com/ropensci/software-review/issues/606#issuecomment-1731654678 time 2

ropensci-review-bot commented 9 months ago

Logged review for osorensen (hours: 2)

ropensci-review-bot commented 8 months ago

:calendar: @czeildi you have 2 days left before the due date for your review (2023-10-07).

czeildi commented 8 months ago

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

Documentation

The package includes all the following forms of documentation:

Functionality

Estimated hours spent reviewing: 5


Review Comments

All in all, I could easily install the package, understand its functionality and use it in some of my (simple) own examples without having previous knowledge of matrix market. I could navigate the code and mostly understand it (I am not experienced with cpp11 code).

I have a couple of comments, detailed below. Most of them are non-blocking and might be somewhat opinionated so feel free to apply them as you see fit (besides the ones where I did not tick the corresponding box)

Functionality

Documentation

Code

maelle commented 8 months ago

Thanks a lot @czeildi!!

maelle commented 8 months ago

@ropensci-review-bot submit review https://github.com/ropensci/software-review/issues/606#issuecomment-1751660212 time 5

ropensci-review-bot commented 8 months ago

Logged review for czeildi (hours: 5)

maelle commented 8 months ago

@HaoZeke Now both reviews are in, as a reminder see https://devdevguide.netlify.app/softwarereview_author#the-review-process

ropensci-review-bot commented 8 months ago

@HaoZeke: please post your response with @ropensci-review-bot submit response <url to issue comment> if you haven't done so already (this is an automatic reminder).

Here's the author guide for response. https://devguide.ropensci.org/authors-guide.html

HaoZeke commented 8 months ago

Sorry for the delay. Thank you reviewers and editors. I've made changes, in accordance to the suggestions and comments, and made a new release.

Response to @osorensen

Response in https://github.com/ropensci/software-review/issues/606#issuecomment-1731692299.

I have rewritten the readme to explain the context.

There are vignettes, but they are very brief. Please provide some more extended examples, with more text explaining what is being done.

The functionality of the package is to read and write a particular file format, commonly used in data-science across many languages including R. Apart from the dense matrix and vector support, the vignettes show performance improvements over the implementation in Matrix.

The write_fmm functions can be run locally, for saving R objects as shown, and then those can be used subsequently for loading. The speed increases are largely by using C++ to load data in an optimal manner so this isn't feasible. https://github.com/HaoZeke/fastMatMR/issues/42

Response to @czeildi

All in all, I could easily install the package, understand its functionality and use it in some of my (simple) own examples without having previous knowledge of matrix market. I could navigate the code and mostly understand it (I am not experienced with cpp11 code).

Glad to hear it :)

I have a couple of comments, detailed below. Most of them are non-blocking and might be somewhat opinionated so feel free to apply them as you see fit (besides the ones where I did not tick the corresponding box)

These were all excellent, and I addressed each of them in the linked issues, I will briefly link to them again.

Performance: I did not re-run the benchmark vignettes myself but tried the package out with some moderately large matrices and it felt reasonably fast. 👍🏼

Glad to hear that. The benchmark vignettes are very large and not really meant to be run often (they can take upto 30-40 minutes).

it seems the lib cannot handle paths with ~ (?) If this cannot be made to work easily, some error message would be very helpful, I think it would be needed to be added here: https://github.com/HaoZeke/fastMatMR/blob/main/src/to_file.cpp#L26 The corresponding read file returns an error message: 'cannot open file' which is more helpful. (I am guessing the issue is related to ~ in path because write_fmm(vec, "~/vector.mtx") returned false for me while write_fmm(vec, "/home/ildi_home/vector.mtx") worked)

This was a pretty involved feature, but I managed to get it in https://github.com/HaoZeke/fastMatMR/issues/24.

I really appreciated the precise error message in

vec_to_fmm(1L, "vector.mtx")
#> Error: Invalid input type, expected 'double' actual 'integer'

However, I was wondering if it could actually accept integers as well?

Implemented in https://github.com/HaoZeke/fastMatMR/issues/25.

improve error message? It is a matrix, just sparse, not plain

mat_to_fmm(sparse_mat, "sparse_matrix.mtx")
#> Error in mat_to_fmm(sparse_mat, "sparse_matrix.mtx") : object is not a matrix

As noted in https://github.com/HaoZeke/fastMatMR/issues/26, users should be calling write_fmm instead of the lower level functions.

improve error message?

mat <- matrix(c(1, 2, 3, 4), nrow = 2)
sparse_to_fmm(mat, "matrix.mtx")
#> Error: Invalid input type, expected 'double' actual 'NULL'

Same as above (https://github.com/HaoZeke/fastMatMR/issues/27), write_fmm has the right user-facing semantics.

future suggestion: support reading .mtx.gz files as I can download from for example https://math.nist.gov/MatrixMarket/data/NEP/quebec/qh1484.html ?

This one is tracked, but not for this release. It would need quite a bit more design and I'm not sure it will be in scope. Tracked here: https://github.com/HaoZeke/fastMatMR/issues/28

consider increased consistency of function names? e.g fmm_to_sparseMatrix vs sparse_to_fmm. sparse_to_fmm was a bit confusing to me in that if I understand correctly, the package is 'fast', not the target format itself (?). matrix and vector is abbreviated to spare a few characters, while sparse_matrix is not. A vague suggestion: possibly follow the pattern of fmm_write_vector, fmm_read_to_vector for all formats?

As noted in https://github.com/HaoZeke/fastMatMR/issues/29, sparse_to_fmm has been converted to sparse_Matrix_to_fmm for consistency.

Documentation

I did not tick function documentation because although they exist, argument names (fname vs filename) are not always consistent between the documentation and the code itself

I have fixed this in https://github.com/HaoZeke/fastMatMR/issues/30.

I did not tick Examples, because although the examples run, but only in the right order, i.e. if I first run the examples of the write functions then the examples of the read functions run correctly but they are not self contained

Completed in https://github.com/HaoZeke/fastMatMR/issues/42 and https://github.com/HaoZeke/fastMatMR/issues/31. Essentially, while the write_fmm can be self-contained, fmm_to_* need to have matrix market files present.

readme: "Unlike the Matrix package" : maybe rephrase? it is a bit off-putting for me to start the whole readme with a negative comparison, although I understand that a core part of the motivation for this package is that Matrix does not support everything and not fast enough. Even changing the order of the two parts of the sentence would feel better for me but this is really personal In the readme: "we support / the package supports". Consider unifying these two wording styles

Completed in https://github.com/HaoZeke/fastMatMR/issues/32 and https://github.com/HaoZeke/fastMatMR/issues/33. Along with the suggestions from the other reviewer, the readme has been more or less completely re-written, and I hope it is clearer now.

rename basic_usage to getting started? The vignette title is getting started but you can see basic_usage in the url

Completed in https://github.com/HaoZeke/fastMatMR/issues/34.

clarify what the addendum is about in the getting started readme? If I understand correctly, it demonstrates the incorrect handling of NA by Matrix package?

Completed in https://github.com/HaoZeke/fastMatMR/issues/35.

Maybe add a bit more context on why would I want to read back in Python? Possibly rather focus on being a language agnostic format?

Completed in https://github.com/HaoZeke/fastMatMR/issues/36.

consider moving the vignette summary to the beginning/remove it? It felt to me like repeating basically the same information as the beginning of the vignette

Completed in https://github.com/HaoZeke/fastMatMR/issues/37.

Did you try out logarithmic scale for the benchmark plots? Would they possibly be more readable?

Yes. All scales were considered, the plots use a mix of log-log and semi-log to show the trends, and all of them are interpreted in the text as well. Completed in https://github.com/HaoZeke/fastMatMR/issues/38.

In several function documentation you write "This function has no return value." This is technically not true, all functions return NULL if they have no other return value but rather these functions are not called for their return value, but for their side effect? Maybe see how other packages handle write functions?

Completed in https://github.com/HaoZeke/fastMatMR/issues/39.

write_fmm example: sparse_mat variable is overwritten before it is written to disk so it is not easy to actually run both of the two variants included

Completed in https://github.com/HaoZeke/fastMatMR/issues/40.

readme sentence: "Similarly, we support writing and reading from other R objects (e.g. standard R vectors and matrices), as seen in the basic use vignette." Consider rephrasing a bit, e.g. refer to 'getting started' vignette instead of basic use, possibly as described in the vignette / as you can see in the vignette instead of as seen?

Rewritten slightly for the new readme.

I am not sure how, but I think there is a relatively easy way to have the getting started vignette show up as a separate menu item in the packagedown website and not just among articles, that could improve its discoverability

Completed in https://github.com/HaoZeke/fastMatMR/issues/41.

Code

I see that R CMD check seem to give a note about pixi files, it seems pixi.* might not be the correct syntax in .Rbuildignore? (I do not know what the correct syntax would be, but the check gives a note both in github actions and locally it seems. If all else fails, adding the files one by one instead of a wildcard syntax definitely works) package version in DESCRIPTION does not match package version in news, is this intended?

Completed in https://github.com/HaoZeke/fastMatMR/issues/43.

Ideally, tests would not write to the project root in my opinion even if those files are added to .gitignore, but rather write to a temp directory. https://testthat.r-lib.org/articles/test-fixtures.html might help find a more robust solution. the readability of read unit tests could be improved if all tests are self contained and independent of each other: e.g. you can run any test_that block in itself and it is a working test (no setup code before test_that blocks ideally)

Completed in https://github.com/HaoZeke/fastMatMR/issues/44.

I think there is no need to add library(testthat) to test files

Completed in https://github.com/HaoZeke/fastMatMR/issues/45.

consistency of indentation styles could be a bit improved, consider running styler::style_pkg, see for example https://github.com/HaoZeke/fastMatMR/blob/main/tests/testthat/test-write_fmm.R#L34

These are run already, but the tests were refactored so I think it should be OK now. Completed in https://github.com/HaoZeke/fastMatMR/issues/45.

this is really overly pedantic, but you could consider specifying which rule to ignore in the #nolint comments instead of ignoring every rule (it is considered generally a best practice, but does not have real impact here). This is not needed if you would need to ignore several rules

Since these are C++ functions, the linting is basically for everything. Completed in https://github.com/HaoZeke/fastMatMR/issues/51.

pre commit github workflow name is a bit misleading to me, I understand you could add the same functionality as a pre commit hook, but a github action is a post commit/push thing, maybe title it after what it does, e.g. style checking?

Completed in https://github.com/HaoZeke/fastMatMR/issues/47.

where is this included from? I only see cpp11.cpp https://github.com/HaoZeke/fastMatMR/blob/main/src/from_file.cpp#L10 My IDE underlined this with red, but it might be fine, I am not familiar with cpp bindings in R packages

This is loaded in as part of the cpp11 package. It is found by R by using the following directive in the DESCRIPTION (as suggested by the documentation):

LinkingTo:
    cpp11

There really isn't a good way to get an IDE to be aware of the R installed package headers. cpp11.cpp is actually auto-generated :)

Completed in https://github.com/HaoZeke/fastMatMR/issues/46.

commented out code should be removed (https://github.com/HaoZeke/fastMatMR/blob/main/src/to_file.cpp#L20)

Done in https://github.com/HaoZeke/fastMatMR/issues/48.

what happens if cpp lib call fails for some reason? Will the file handle be closed properly? See https://github.com/HaoZeke/fastMatMR/blob/main/src/to_file.cpp#L30

If os.open() fails the function will return FALSE, and nothing else needs to be done, completed in https://github.com/HaoZeke/fastMatMR/issues/49.

possibly move rebuild-benchmarks to a tools sub directory or similar? https://r-pkgs.org/misc.html#sec-misc-tools.

Completed in https://github.com/HaoZeke/fastMatMR/issues/50.

============================

Thank you all for the great comments!

HaoZeke commented 8 months ago

@ropensci-review-bot submit response https://github.com/ropensci/software-review/issues/606#issuecomment-1776399166

ropensci-review-bot commented 8 months ago

Logged author response!

HaoZeke commented 8 months ago

@maelle sorry for the delay, I have responded and logged it via the bot :)

czeildi commented 8 months ago

@HaoZeke great work and thank you for your detailed response and making your changes easy to track. I am now happy to recommend approving this package :)

I still have two small comments (neither blocking approval):

As noted in https://github.com/HaoZeke/fastMatMR/issues/26, users should be calling write_fmm instead of the lower level functions.

If the users are not supposed to be calling the lower level functions, do they need to be exported, and thus being part of the public user interface of the package? It could be less confusing possibly if only those functions are exported which are meant to be called directly by users.

HaoZeke commented 8 months ago

@HaoZeke great work and thank you for your detailed response and making your changes easy to track. I am now happy to recommend approving this package :)

Wonderful, thanks so much ^_^

I still have two small comments (neither blocking approval):

Merging soon :)

As noted in HaoZeke/fastMatMR#26, users should be calling write_fmm instead of the lower level functions.

If the users are not supposed to be calling the lower level functions, do they need to be exported, and thus being part of the public user interface of the package? It could be less confusing possibly if only those functions are exported which are meant to be called directly by users.

I thought about this, but decided not to mainly because downstream package users (including me for other dependent packages) would want to call these functions where it makes sense (e.g. if my downstream package will only write matrix data or sparse matrix data) and CRAN disallows / warns calling unexported functions via :::

czeildi commented 8 months ago

I thought about this, but decided not to mainly because downstream package users (including me for other dependent packages) would want to call these functions where it makes sense (e.g. if my downstream package will only write matrix data or sparse matrix data) and CRAN disallows / warns calling unexported functions via :::

Makes sense, thank you for the response!

osorensen commented 8 months ago

Thanks @HaoZeke, I have completed my checklist.

HaoZeke commented 8 months ago

@maelle since the reviews are satisfactorily completed (thanks all) I was wondering whta the next steps are :)

maelle commented 8 months ago

:wave: @HaoZeke! Thanks for your work!

Thanks @czeildi @osorensen! Could please use the reviewer approval template for approval? Thanks!

osorensen commented 8 months ago

Reviewer Response

Final approval (post-review)

Estimated hours spent reviewing: 3

czeildi commented 8 months ago

Reviewer Response

Final approval (post-review)

Estimated hours spent reviewing: 6

maelle commented 8 months ago

@ropensci-review-bot approve fastMatMR

ropensci-review-bot commented 8 months ago

Approved! Thanks @HaoZeke for submitting and @osorensen, @czeildi for your reviews! :grin:

To-dos:

Should you want to acknowledge your reviewers in your package DESCRIPTION, you can do so by making them "rev"-type contributors in the Authors@R field (with their consent).

Welcome aboard! We'd love to host a post about your package - either a short introduction to it with an example for a technical audience or a longer post with some narrative about its development or something you learned, and an example of its use for a broader readership. If you are interested, consult the blog guide, and tag @ropensci/blog-editors in your reply. They will get in touch about timing and can answer any questions.

We maintain an online book with our best practice and tips, this chapter starts the 3d section that's about guidance for after onboarding (with advice on releases, package marketing, GitHub grooming); the guide also feature CRAN gotchas. Please tell us what could be improved.

Last but not least, you can volunteer as a reviewer via filling a short form.

maelle commented 8 months ago

@ropensci-review-bot submit review https://github.com/ropensci/software-review/issues/606#issuecomment-1731654678 time 3

ropensci-review-bot commented 8 months ago

Logged review for osorensen (hours: 3)

maelle commented 8 months ago

@ropensci-review-bot submit review https://github.com/ropensci/software-review/issues/606#issuecomment-1751660212 time 6