Closed HaoZeke closed 8 months ago
Thanks for submitting to rOpenSci, our editors and @ropensci-review-bot will reply soon. Type @ropensci-review-bot help
for help.
:rocket:
Editor check started
:wave:
Also, not part of the issue template, but the package passes pkcheck
: https://github.com/HaoZeke/fastMatMR/issues/18
git hash: a910f6be
Important: All failing checks above must be addressed prior to proceeding
Package License: MIT + file LICENSE
The table below tallies all function calls to all packages ('ncalls'), both internal (r-base + recommended, along with the package itself), and external (imported and suggested packages). 'NA' values indicate packages to which no identified calls to R functions could be found. Note that these results are generated by an automated code-tagging system which may not be entirely accurate.
|type |package | ncalls|
|:----------|:--------------|------:|
|internal |fastMatMR | 6|
|imports |NA | NA|
|suggests |ggplot2 | NA|
|suggests |knitr | NA|
|suggests |Matrix | NA|
|suggests |microbenchmark | NA|
|suggests |rmarkdown | NA|
|suggests |testthat | NA|
|linking_to |cpp11 | NA|
Click below for tallies of functions used in each package. Locations of each call within this package may be generated locally by running 's <- pkgstats::pkgstats(
fmm_to_mat (1), fmm_to_vec (1), mat_to_fmm (1), release_questions (1), sparse_to_fmm (1), vec_to_fmm (1)
fastMatMR
This package features some noteworthy statistical properties which may need to be clarified by a handling editor prior to progressing.
The package has: - code in C++ (7% in 3 files), C/C++ Header (91% in 15 files) and R (2% in 3 files) - 1 authors - 3 vignettes - no internal data file - 1 imported package - 7 exported functions (median 3 lines of code) - 8 non-exported functions in R (median 3 lines of code) - 217 C/C++ functions (median 5 lines of code) --- Statistical properties of package structure as distributional percentiles in relation to all current CRAN packages The following terminology is used: - `loc` = "Lines of Code" - `fn` = "function" - `exp`/`not_exp` = exported / not exported All parameters are explained as tooltips in the locally-rendered HTML version of this report generated by [the `checks_to_markdown()` function](https://docs.ropensci.org/pkgcheck/reference/checks_to_markdown.html) The final measure (`fn_call_network_size`) is the total number of calls between functions (in R), or more abstract relationships between code objects in other languages. Values are flagged as "noteworthy" when they lie in the upper or lower 5th percentile. |measure | value| percentile|noteworthy | |:------------------------|-----:|----------:|:----------| |files_R | 3| 21.5| | |files_src | 3| 84.3| | |files_inst | 15| 99.6| | |files_vignettes | 3| 92.4| | |files_tests | 3| 75.2| | |loc_R | 45| 4.3|TRUE | |loc_src | 202| 27.6| | |loc_inst | 2545| 81.7| | |loc_vignettes | 276| 60.9| | |loc_tests | 68| 31.5| | |num_vignettes | 3| 94.2| | |n_fns_r | 15| 21.2| | |n_fns_r_exported | 7| 34.0| | |n_fns_r_not_exported | 8| 18.0| | |n_fns_src | 217| 89.5| | |n_fns_per_file_r | 3| 46.2| | |n_fns_per_file_src | 8| 68.9| | |num_params_per_fn | 2| 11.9| | |loc_per_fn_r | 3| 1.1|TRUE | |loc_per_fn_r_exp | 3| 1.5|TRUE | |loc_per_fn_r_not_exp | 3| 1.5|TRUE | |loc_per_fn_src | 5| 5.0|TRUE | |rel_whitespace_R | 29| 11.6| | |rel_whitespace_src | 20| 31.5| | |rel_whitespace_inst | 22| 82.8| | |rel_whitespace_vignettes | 41| 68.6| | |rel_whitespace_tests | 24| 30.6| | |doclines_per_fn_exp | 18| 10.4| | |doclines_per_fn_not_exp | 0| 0.0|TRUE | |fn_call_network_size | 480| 95.1|TRUE | ---
Click to see the interactive network visualisation of calls between objects in package
goodpractice
and other checks#### 3a. Continuous Integration Badges [![R-CMD-check.yaml](https://github.com/HaoZeke/fastMatMR/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/HaoZeke/fastMatMR/actions) [![pkgcheck](https://github.com/HaoZeke/fastMatMR/workflows/pkgcheck/badge.svg)](https://github.com/HaoZeke/fastMatMR/actions) **GitHub Workflow Results** | id|name |conclusion |sha | run_number|date | |----------:|:--------------------------|:----------|:------|----------:|:----------| | 6056018810|pages build and deployment |success |c3523b | 8|2023-09-02 | | 6056006940|pkgcheck |success |fe999d | 21|2023-09-02 | | 6056006937|pkgdown |success |fe999d | 23|2023-09-02 | | 6056006936|pre-commit |success |fe999d | 49|2023-09-02 | | 6056006938|R-CMD-check |success |fe999d | 39|2023-09-02 | --- #### 3b. `goodpractice` results #### `R CMD check` with [rcmdcheck](https://r-lib.github.io/rcmdcheck/) R CMD check generated the following note: 1. checking installed package size ... NOTE installed size is 11.1Mb sub-directories of 1Mb or more: libs 10.2Mb R CMD check generated the following check_fail: 1. rcmdcheck_reasonable_installed_size #### Test coverage with [covr](https://covr.r-lib.org/) Package coverage: 68.1 The following files are not completely covered by tests: file | coverage --- | --- inst/include/fast_matrix_market/chunking.hpp | 63.83% inst/include/fast_matrix_market/fast_matrix_market.hpp | 18.92% inst/include/fast_matrix_market/header.hpp | 59.69% inst/include/fast_matrix_market/read_body_threads.hpp | 67.44% inst/include/fast_matrix_market/read_body.hpp | 42.29% inst/include/fast_matrix_market/write_body.hpp | 50% R/misc.R | 0% #### Cyclocomplexity with [cyclocomp](https://github.com/MangoTheCat/cyclocomp) No functions have cyclocomplexity >= 15 #### Static code analyses with [lintr](https://github.com/jimhester/lintr) [lintr](https://github.com/jimhester/lintr) found the following 1 potential issues: message | number of times --- | --- Avoid library() and require() calls in packages | 1
|package |version | |:--------|:-------| |pkgstats |0.1.3.8 | |pkgcheck |0.1.2.2 |
Processing may not proceed until the items marked with :heavy_multiplication_x: have been resolved.
FWIW there's .covignore
which is not being picked up by the bot here (works on the repo, https://github.com/HaoZeke/fastMatMR/issues/18). Package coverage is:
> covr::package_coverage()
fastMatMR Coverage: 94.69%
src/from_file.cpp: 93.88%
src/to_file.cpp: 94.23%
R/fastMatMR-package.R: 100.00%
@HaoZeke is indeed right there - the failing coverage check is a bug on our side, and not this submission.
@ropensci-review-bot assign @maelle as editor
Assigned! @maelle is now the editor
:wave: @HaoZeke! Nice to see you as an author this time around, thank you for your submission! :grinning:
A few comments/questions before we proceed to the reviewer search:
->
I'd suggest using <-
instead since the former is more rare.Thank you!
👋 @HaoZeke! Nice to see you as an author this time around, thank you for your submission! 😀
^_^
A few comments/questions before we proceed to the reviewer search:
* Why add the bundled files to .covrignore, if they contribute to the functionality of the package?
This is because there is no need to expose bindings of the inner workings of the fast_matrix_market
C++ library to R users. This approach is also taken by other libraries which bundle C++ libraries, like RCppEigen.
The user-facing C++ functions are in src/
and covered by tests.
* I see you use conventional commits, nice. How is the changelog generated? I see [DOC: Fix news HaoZeke/fastMatMR#17](https://github.com/HaoZeke/fastMatMR/issues/17) Disclaimer: I contribute to https://github.com/cynkra/fledge/
That's a very cool library, I am currently using towncrier and tbump which are fairly generic tools.
* In the example where you use `->` I'd suggest using `<-` instead since the former is more rare. * In the README you write "writing and reading from other R objects". Could you please elaborate, adding a few sentences about this to the README? * To me the second part of the README, "Development", belongs to the contributing guide, so you could move the content and instead point contributors to the contributing guide. * In the contributing guide could you please add a link to pixi? I for instance do not know what it is.
Yup these are great suggestions, updated the package with them.
* In the issue tracker there are a few enhancement issues. What's the timeline for them, are they "nice to have" or do you intend to tackle them soon? If so should the review wait?
These are more version 2 issues and shouldn't block the review I think. It would take a while for me to get to them, the current functionality covers R matrices, vectors and sparse matrices from Matrix
which cover most of the use-cases. Other sparse matrix formats are on the issue list but they're very optional "nice to have" issues since users can always convert to Matrix
formats / dense types before using the library.
Thank you!
You're welcome :)
This is because there is no need to expose bindings of the inner workings of the fast_matrix_market C++ library to R users. This approach is also taken by other libraries which bundle C++ libraries, like RCppEigen.
To clarify, the question was about tests specifically. Part of the code in inst/ is covered by tests anyway, so why not include them (and remove the unused code)?
We do not have strict requirements on testing coverage for bundled code yet, but we can improve our guidance based on cases like this submission, so this discussion is important.
This is because there is no need to expose bindings of the inner workings of the fast_matrix_market C++ library to R users. This approach is also taken by other libraries which bundle C++ libraries, like RCppEigen.
To clarify, the question was about tests specifically. Part of the code in inst/ is covered by tests anyway, so why not include them (and remove the unused code)?
Mostly maintainence. I have actually stripped the (already small) library of header files which are not used at all (Eigen / Armadillo / other C++ library compatibility files).
In general, for bundled libraries it is a good idea to not tamper with the upstream source as much as possible (it makes it easier to update to new versions).
We do not have strict requirements on testing coverage for bundled code yet, but we can improve our guidance based on cases like this submission, so this discussion is important.
Yup, I understand. Policy wise I'd be even stricter, and say that bundled libraries must not be touched at all (other than whole-file removals if they are not relevant and self contained).
This is standard practice in bundling libraries in C++ too, where projects might have different linting rules and external
files are ignored instead of refactored in-place within the project. The idea is that this way upstream developers can weigh in more easily on the code.
It also allows people to write bindings as long as they understand the public API of the code. Often times users who want bindings are not fully aware / have the expertise to make decisions about code removal from (possibly) complicated upstream C++ code.
Thank you for your thorough answer! I'll now look for reviewers.
@ropensci-review-bot seeking reviewers
Please add this badge to the README of your package repository:
[![Status at rOpenSci Software Peer Review](https://badges.ropensci.org/606_status.svg)](https://github.com/ropensci/software-review/issues/606)
Furthermore, if your package does not have a NEWS.md file yet, please create one to capture the changes made during the review process. See https://devguide.ropensci.org/releasing.html#news
@ropensci-review-bot add @osorensen to reviewers
@osorensen added to the reviewers list. Review due date is 2023-10-06. Thanks @osorensen for accepting to review! Please refer to our reviewer guide.
rOpenSci’s community is our best asset. We aim for reviews to be open, non-adversarial, and focused on improving software quality. Be respectful and kind! See our reviewers guide and code of conduct for more.
@osorensen: If you haven't done so, please fill this form for us to update our reviewers records.
@ropensci-review-bot add @czeildi to reviewers
@czeildi added to the reviewers list. Review due date is 2023-10-07. Thanks @czeildi for accepting to review! Please refer to our reviewer guide.
rOpenSci’s community is our best asset. We aim for reviews to be open, non-adversarial, and focused on improving software quality. Be respectful and kind! See our reviewers guide and code of conduct for more.
@czeildi: If you haven't done so, please fill this form for us to update our reviewers records.
@HaoZeke and @maelle, here is my initial review. My comments at the bottom explain a bit more about the boxes I haven't checked.
Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide
The package includes all the following forms of documentation:
URL
, BugReports
and Maintainer
(which may be autogenerated via Authors@R
).Estimated hours spent reviewing: 2 hours
In README, you state that the Matrix
package only handles sparse objects. This is not correct.
library(Matrix)
The command below lists all dense matrix types supported by the Matrix
package.
showClass("denseMatrix")
## Virtual Class "denseMatrix" [package "Matrix"]
##
## Slots:
##
## Name: Dim Dimnames
## Class: integer list
##
## Extends:
## Class "Matrix", directly
## Class "mMatrix", by class "Matrix", distance 2
## Class "replValueSp", by class "Matrix", distance 2
##
## Known Subclasses:
## Class "unpackedMatrix", directly
## Class "packedMatrix", directly
## Class "ndenseMatrix", directly
## Class "ldenseMatrix", directly
## Class "ddenseMatrix", directly
## Class "ngeMatrix", by class "unpackedMatrix", distance 2
## Class "ntrMatrix", by class "unpackedMatrix", distance 2
## Class "nsyMatrix", by class "unpackedMatrix", distance 2
## Class "ntpMatrix", by class "packedMatrix", distance 2
## Class "nspMatrix", by class "packedMatrix", distance 2
## Class "lgeMatrix", by class "unpackedMatrix", distance 2
## Class "ltrMatrix", by class "unpackedMatrix", distance 2
## Class "lsyMatrix", by class "unpackedMatrix", distance 2
## Class "ltpMatrix", by class "packedMatrix", distance 2
## Class "lspMatrix", by class "packedMatrix", distance 2
## Class "dgeMatrix", by class "unpackedMatrix", distance 2
## Class "dtrMatrix", by class "unpackedMatrix", distance 2
## Class "dsyMatrix", by class "unpackedMatrix", distance 2
## Class "dpoMatrix", by class "dsyMatrix", distance 3
## Class "corMatrix", by class "dpoMatrix", distance 4
## Class "dtpMatrix", by class "packedMatrix", distance 2
## Class "dspMatrix", by class "packedMatrix", distance 2
## Class "dppMatrix", by class "dspMatrix", distance 3
## Class "pcorMatrix", by class "dppMatrix", distance 4
Below is an example of a dense matrix from the Matrix
package:
(dense_matrix <- as(matrix(1:10, ncol = 2), "dMatrix"))
## 5 x 2 Matrix of class "dgeMatrix"
## [,1] [,2]
## [1,] 1 6
## [2,] 2 7
## [3,] 3 8
## [4,] 4 9
## [5,] 5 10
str(dense_matrix)
## Formal class 'dgeMatrix' [package "Matrix"] with 4 slots
## ..@ Dim : int [1:2] 5 2
## ..@ Dimnames:List of 2
## .. ..$ : NULL
## .. ..$ : NULL
## ..@ x : num [1:10] 1 2 3 4 5 6 7 8 9 10
## ..@ factors : list()
The README file contains a section titled “Why?”, but it does not state explicitly what problems the software is designed to solved nor what the target audience is.
There are vignettes, but they are very brief. Please provide some more extended examples, with more text explaining what is being done.
I understand that the fmm_to_*
functions need an mtx file to work, and I guess it’s therefore the examples with these functions are marked with \dontrun{}
. However, wouldn’t it possible to define a character vector in R that contains the required input, and then feed this to fmm_to_*
in order to have an example that can actually be run?
Just a quick clarification, @osorensen, the readme
wording is a bit unclear but it states in the previous section that the goal is the reading and writing of Matrix Market
files. For that, it is true that Matrix
does not support dense matrices:
> library("Matrix")
> (dense_matrix <- as(matrix(1:10, ncol = 2), "dMatrix"))
> writeMM(dense_matrix, "mat.mtx")
Error in (function (classes, fdef, mtable) :
unable to find an inherited method for function ‘writeMM’ for signature ‘"dgeMatrix"’
> ?writeMM
...
Read matrices stored in the Harwell-Boeing or MatrixMarket formats
or write ‘sparseMatrix’ objects to one of these formats.
...
The longer explanation is part of the ROpenSci issue (intended audience, use case) but I will add it to the readme soon.
P.S. Thanks for accepting the review and getting started with this :)
Thank you @osorensen!!
@ropensci-review-bot submit review https://github.com/ropensci/software-review/issues/606#issuecomment-1731654678 time 2
Logged review for osorensen (hours: 2)
:calendar: @czeildi you have 2 days left before the due date for your review (2023-10-07).
Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide
The package includes all the following forms of documentation:
URL
, BugReports
and Maintainer
(which may be autogenerated via Authors@R
).Estimated hours spent reviewing: 5
All in all, I could easily install the package, understand its functionality and use it in some of my (simple) own examples without having previous knowledge of matrix market. I could navigate the code and mostly understand it (I am not experienced with cpp11 code).
I have a couple of comments, detailed below. Most of them are non-blocking and might be somewhat opinionated so feel free to apply them as you see fit (besides the ones where I did not tick the corresponding box)
Performance: I did not re-run the benchmark vignettes myself but tried the package out with some moderately large matrices and it felt reasonably fast. 👍🏼
it seems the lib cannot handle paths with ~
(?) If this cannot be made to work easily, some error message would be very helpful, I think it would be needed
to be added here: https://github.com/HaoZeke/fastMatMR/blob/main/src/to_file.cpp#L26 The corresponding read file returns an error message: 'cannot open file' which is more helpful.
(I am guessing the issue is related to ~
in path because write_fmm(vec, "~/vector.mtx")
returned false for me while write_fmm(vec, "/home/ildi_home/vector.mtx")
worked)
I really appreciated the precise error message in
vec_to_fmm(1L, "vector.mtx")
#> Error: Invalid input type, expected 'double' actual 'integer'
However, I was wondering if it could actually accept integers as well?
improve error message? It is a matrix, just sparse, not plain
mat_to_fmm(sparse_mat, "sparse_matrix.mtx")
#> Error in mat_to_fmm(sparse_mat, "sparse_matrix.mtx") : object is not a matrix
improve error message?
mat <- matrix(c(1, 2, 3, 4), nrow = 2)
sparse_to_fmm(mat, "matrix.mtx")
#> Error: Invalid input type, expected 'double' actual 'NULL'
future suggestion: support reading .mtx.gz
files as I can download from for example https://math.nist.gov/MatrixMarket/data/NEP/quebec/qh1484.html ?
consider increased consistency of function names? e.g fmm_to_sparseMatrix
vs sparse_to_fmm
. sparse_to_fmm
was a bit confusing to me in that if I understand correctly, the package is 'fast', not the target format itself (?). matrix and vector is abbreviated to spare a few characters, while sparse_matrix
is not. A vague suggestion: possibly follow the pattern of fmm_write_vector
, fmm_read_to_vector
for all formats?
fname
vs filename
) are not always consistent between the documentation and the code
itselfbasic_usage
to getting started? The vignette title is getting started but you can see basic_usage
in the urladdendum
is about in the getting started readme? If I understand correctly, it demonstrates the incorrect handling of NA
by Matrix
package?Python
? Possibly rather focus on being a language agnostic format?NULL
if they have no other return value
but rather these functions are not called for their return value, but for their side effect? Maybe see how other packages handle write functions?write_fmm
example: sparse_mat
variable is overwritten before it is written to disk so it is not easy to actually run both of the two variants includedpixi.*
might not
be the correct syntax in .Rbuildignore
? (I do not know what the correct syntax would be,
but the check gives a note both in github actions and locally it seems. If all else fails, adding the files one by one instead of a wildcard syntax definitely works)DESCRIPTION
does not match package version in news
, is this intended?.gitignore
, but rather write to a temp directory. https://testthat.r-lib.org/articles/test-fixtures.html might help find a more robust solution.test_that
block in itself and it is a working test (no setup code before test_that
blocks ideally)library(testthat)
to test filesstyler::style_pkg
, see for example https://github.com/HaoZeke/fastMatMR/blob/main/tests/testthat/test-write_fmm.R#L34#nolint
comments instead of ignoring every rule (it is considered generally a best practice, but does not have real impact here). This is not needed if you would need to ignore several rulescpp11.cpp
https://github.com/HaoZeke/fastMatMR/blob/main/src/from_file.cpp#L10
My IDE underlined this with red, but it might be fine, I am not familiar with cpp bindings in R packagesThanks a lot @czeildi!!
@ropensci-review-bot submit review https://github.com/ropensci/software-review/issues/606#issuecomment-1751660212 time 5
Logged review for czeildi (hours: 5)
@HaoZeke Now both reviews are in, as a reminder see https://devdevguide.netlify.app/softwarereview_author#the-review-process
@HaoZeke: please post your response with @ropensci-review-bot submit response <url to issue comment>
if you haven't done so already (this is an automatic reminder).
Here's the author guide for response. https://devguide.ropensci.org/authors-guide.html
Sorry for the delay. Thank you reviewers and editors. I've made changes, in accordance to the suggestions and comments, and made a new release.
Response in https://github.com/ropensci/software-review/issues/606#issuecomment-1731692299.
The README file contains a section titled “Why?”, but it does not state explicitly what problems the software is designed to solved nor what the target audience is
I have rewritten the readme to explain the context.
There are vignettes, but they are very brief. Please provide some more extended examples, with more text explaining what is being done.
The functionality of the package is to read and write a particular file format, commonly used in data-science across many languages including R
. Apart from the dense matrix and vector support, the vignettes show performance improvements over the implementation in Matrix
.
I understand that the fmmto functions need an mtx file to work, and I guess it’s therefore the examples with these functions are marked with \dontrun{}. However, wouldn’t it possible to define a character vector in R that contains the required input, and then feed this to fmmto in order to have an example that can actually be run?
The write_fmm
functions can be run locally, for saving R objects as shown, and then those can be used subsequently for loading. The speed increases are largely by using C++ to load data in an optimal manner so this isn't feasible. https://github.com/HaoZeke/fastMatMR/issues/42
All in all, I could easily install the package, understand its functionality and use it in some of my (simple) own examples without having previous knowledge of matrix market. I could navigate the code and mostly understand it (I am not experienced with cpp11 code).
Glad to hear it :)
I have a couple of comments, detailed below. Most of them are non-blocking and might be somewhat opinionated so feel free to apply them as you see fit (besides the ones where I did not tick the corresponding box)
These were all excellent, and I addressed each of them in the linked issues, I will briefly link to them again.
Performance: I did not re-run the benchmark vignettes myself but tried the package out with some moderately large matrices and it felt reasonably fast. 👍🏼
Glad to hear that. The benchmark vignettes are very large and not really meant to be run often (they can take upto 30-40 minutes).
it seems the lib cannot handle paths with ~ (?) If this cannot be made to work easily, some error message would be very helpful, I think it would be needed to be added here: https://github.com/HaoZeke/fastMatMR/blob/main/src/to_file.cpp#L26 The corresponding read file returns an error message: 'cannot open file' which is more helpful. (I am guessing the issue is related to ~ in path because write_fmm(vec, "~/vector.mtx") returned false for me while write_fmm(vec, "/home/ildi_home/vector.mtx") worked)
This was a pretty involved feature, but I managed to get it in https://github.com/HaoZeke/fastMatMR/issues/24.
I really appreciated the precise error message in
vec_to_fmm(1L, "vector.mtx") #> Error: Invalid input type, expected 'double' actual 'integer'
However, I was wondering if it could actually accept integers as well?
Implemented in https://github.com/HaoZeke/fastMatMR/issues/25.
improve error message? It is a matrix, just sparse, not plain
mat_to_fmm(sparse_mat, "sparse_matrix.mtx") #> Error in mat_to_fmm(sparse_mat, "sparse_matrix.mtx") : object is not a matrix
As noted in https://github.com/HaoZeke/fastMatMR/issues/26, users should be calling write_fmm
instead of the lower level functions.
improve error message?
mat <- matrix(c(1, 2, 3, 4), nrow = 2) sparse_to_fmm(mat, "matrix.mtx") #> Error: Invalid input type, expected 'double' actual 'NULL'
Same as above (https://github.com/HaoZeke/fastMatMR/issues/27), write_fmm
has the right user-facing semantics.
future suggestion: support reading .mtx.gz files as I can download from for example https://math.nist.gov/MatrixMarket/data/NEP/quebec/qh1484.html ?
This one is tracked, but not for this release. It would need quite a bit more design and I'm not sure it will be in scope. Tracked here: https://github.com/HaoZeke/fastMatMR/issues/28
consider increased consistency of function names? e.g fmm_to_sparseMatrix vs sparse_to_fmm. sparse_to_fmm was a bit confusing to me in that if I understand correctly, the package is 'fast', not the target format itself (?). matrix and vector is abbreviated to spare a few characters, while sparse_matrix is not. A vague suggestion: possibly follow the pattern of fmm_write_vector, fmm_read_to_vector for all formats?
As noted in https://github.com/HaoZeke/fastMatMR/issues/29, sparse_to_fmm has been converted to sparse_Matrix_to_fmm for consistency.
Documentation
I did not tick function documentation because although they exist, argument names (fname vs filename) are not always consistent between the documentation and the code itself
I have fixed this in https://github.com/HaoZeke/fastMatMR/issues/30.
I did not tick Examples, because although the examples run, but only in the right order, i.e. if I first run the examples of the write functions then the examples of the read functions run correctly but they are not self contained
Completed in https://github.com/HaoZeke/fastMatMR/issues/42 and https://github.com/HaoZeke/fastMatMR/issues/31. Essentially, while the write_fmm
can be self-contained, fmm_to_*
need to have matrix market files present.
readme: "Unlike the Matrix package" : maybe rephrase? it is a bit off-putting for me to start the whole readme with a negative comparison, although I understand that a core part of the motivation for this package is that Matrix does not support everything and not fast enough. Even changing the order of the two parts of the sentence would feel better for me but this is really personal In the readme: "we support / the package supports". Consider unifying these two wording styles
Completed in https://github.com/HaoZeke/fastMatMR/issues/32 and https://github.com/HaoZeke/fastMatMR/issues/33. Along with the suggestions from the other reviewer, the readme
has been more or less completely re-written, and I hope it is clearer now.
rename basic_usage to getting started? The vignette title is getting started but you can see basic_usage in the url
Completed in https://github.com/HaoZeke/fastMatMR/issues/34.
clarify what the addendum is about in the getting started readme? If I understand correctly, it demonstrates the incorrect handling of NA by Matrix package?
Completed in https://github.com/HaoZeke/fastMatMR/issues/35.
Maybe add a bit more context on why would I want to read back in Python? Possibly rather focus on being a language agnostic format?
Completed in https://github.com/HaoZeke/fastMatMR/issues/36.
consider moving the vignette summary to the beginning/remove it? It felt to me like repeating basically the same information as the beginning of the vignette
Completed in https://github.com/HaoZeke/fastMatMR/issues/37.
Did you try out logarithmic scale for the benchmark plots? Would they possibly be more readable?
Yes. All scales were considered, the plots use a mix of log-log and semi-log to show the trends, and all of them are interpreted in the text as well. Completed in https://github.com/HaoZeke/fastMatMR/issues/38.
In several function documentation you write "This function has no return value." This is technically not true, all functions return NULL if they have no other return value but rather these functions are not called for their return value, but for their side effect? Maybe see how other packages handle write functions?
Completed in https://github.com/HaoZeke/fastMatMR/issues/39.
write_fmm example: sparse_mat variable is overwritten before it is written to disk so it is not easy to actually run both of the two variants included
Completed in https://github.com/HaoZeke/fastMatMR/issues/40.
readme sentence: "Similarly, we support writing and reading from other R objects (e.g. standard R vectors and matrices), as seen in the basic use vignette." Consider rephrasing a bit, e.g. refer to 'getting started' vignette instead of basic use, possibly as described in the vignette / as you can see in the vignette instead of as seen?
Rewritten slightly for the new readme.
I am not sure how, but I think there is a relatively easy way to have the getting started vignette show up as a separate menu item in the packagedown website and not just among articles, that could improve its discoverability
Completed in https://github.com/HaoZeke/fastMatMR/issues/41.
Code
I see that R CMD check seem to give a note about pixi files, it seems pixi.* might not be the correct syntax in .Rbuildignore? (I do not know what the correct syntax would be, but the check gives a note both in github actions and locally it seems. If all else fails, adding the files one by one instead of a wildcard syntax definitely works) package version in DESCRIPTION does not match package version in news, is this intended?
Completed in https://github.com/HaoZeke/fastMatMR/issues/43.
Ideally, tests would not write to the project root in my opinion even if those files are added to .gitignore, but rather write to a temp directory. https://testthat.r-lib.org/articles/test-fixtures.html might help find a more robust solution. the readability of read unit tests could be improved if all tests are self contained and independent of each other: e.g. you can run any test_that block in itself and it is a working test (no setup code before test_that blocks ideally)
Completed in https://github.com/HaoZeke/fastMatMR/issues/44.
I think there is no need to add library(testthat) to test files
Completed in https://github.com/HaoZeke/fastMatMR/issues/45.
consistency of indentation styles could be a bit improved, consider running styler::style_pkg, see for example https://github.com/HaoZeke/fastMatMR/blob/main/tests/testthat/test-write_fmm.R#L34
These are run already, but the tests were refactored so I think it should be OK now. Completed in https://github.com/HaoZeke/fastMatMR/issues/45.
this is really overly pedantic, but you could consider specifying which rule to ignore in the #nolint comments instead of ignoring every rule (it is considered generally a best practice, but does not have real impact here). This is not needed if you would need to ignore several rules
Since these are C++ functions, the linting is basically for everything. Completed in https://github.com/HaoZeke/fastMatMR/issues/51.
pre commit github workflow name is a bit misleading to me, I understand you could add the same functionality as a pre commit hook, but a github action is a post commit/push thing, maybe title it after what it does, e.g. style checking?
Completed in https://github.com/HaoZeke/fastMatMR/issues/47.
where is this included from? I only see cpp11.cpp https://github.com/HaoZeke/fastMatMR/blob/main/src/from_file.cpp#L10 My IDE underlined this with red, but it might be fine, I am not familiar with cpp bindings in R packages
This is loaded in as part of the cpp11
package. It is found by R
by using the following directive in the DESCRIPTION
(as suggested by the documentation):
LinkingTo:
cpp11
There really isn't a good way to get an IDE to be aware of the R installed package headers. cpp11.cpp
is actually auto-generated :)
Completed in https://github.com/HaoZeke/fastMatMR/issues/46.
commented out code should be removed (https://github.com/HaoZeke/fastMatMR/blob/main/src/to_file.cpp#L20)
Done in https://github.com/HaoZeke/fastMatMR/issues/48.
what happens if cpp lib call fails for some reason? Will the file handle be closed properly? See https://github.com/HaoZeke/fastMatMR/blob/main/src/to_file.cpp#L30
If os.open()
fails the function will return FALSE
, and nothing else needs to be done, completed in https://github.com/HaoZeke/fastMatMR/issues/49.
possibly move rebuild-benchmarks to a tools sub directory or similar? https://r-pkgs.org/misc.html#sec-misc-tools.
Completed in https://github.com/HaoZeke/fastMatMR/issues/50.
============================
Thank you all for the great comments!
@ropensci-review-bot submit response https://github.com/ropensci/software-review/issues/606#issuecomment-1776399166
Logged author response!
@maelle sorry for the delay, I have responded and logged it via the bot :)
@HaoZeke great work and thank you for your detailed response and making your changes easy to track. I am now happy to recommend approving this package :)
I still have two small comments (neither blocking approval):
As noted in https://github.com/HaoZeke/fastMatMR/issues/26, users should be calling write_fmm instead of the lower level functions.
If the users are not supposed to be calling the lower level functions, do they need to be exported, and thus being part of the public user interface of the package? It could be less confusing possibly if only those functions are exported which are meant to be called directly by users.
@HaoZeke great work and thank you for your detailed response and making your changes easy to track. I am now happy to recommend approving this package :)
Wonderful, thanks so much ^_^
I still have two small comments (neither blocking approval):
- I think there is a typo in .Rbuildignore, I submitted a PR to fix: Fix typo in .Rbuildignore HaoZeke/fastMatMR#63
Merging soon :)
As noted in HaoZeke/fastMatMR#26, users should be calling write_fmm instead of the lower level functions.
If the users are not supposed to be calling the lower level functions, do they need to be exported, and thus being part of the public user interface of the package? It could be less confusing possibly if only those functions are exported which are meant to be called directly by users.
I thought about this, but decided not to mainly because downstream package users (including me for other dependent packages) would want to call these functions where it makes sense (e.g. if my downstream package will only write matrix data or sparse matrix data) and CRAN disallows / warns calling unexported functions via :::
I thought about this, but decided not to mainly because downstream package users (including me for other dependent packages) would want to call these functions where it makes sense (e.g. if my downstream package will only write matrix data or sparse matrix data) and CRAN disallows / warns calling unexported functions via
:::
Makes sense, thank you for the response!
Thanks @HaoZeke, I have completed my checklist.
@maelle since the reviews are satisfactorily completed (thanks all) I was wondering whta the next steps are :)
:wave: @HaoZeke! Thanks for your work!
Thanks @czeildi @osorensen! Could please use the reviewer approval template for approval? Thanks!
Estimated hours spent reviewing: 3
Estimated hours spent reviewing: 6
@ropensci-review-bot approve fastMatMR
Approved! Thanks @HaoZeke for submitting and @osorensen, @czeildi for your reviews! :grin:
To-dos:
@ropensci-review-bot invite me to ropensci/<package-name>
which will re-send an invitation.@ropensci-review-bot finalize transfer of <package-name>
where <package-name>
is the repo/package name. This will give you admin access back.pkgdown
website and are ok relying only on rOpenSci central docs building and branding,
pkgdown
website with a redirecting pagehttps://docs.ropensci.org/package_name
URL
field alongside the link to the GitHub repository, e.g.: URL: https://docs.ropensci.org/foobar, https://github.com/ropensci/foobar
codemetar::write_codemeta()
in the root of your package.install.packages("<package-name>", repos = "https://ropensci.r-universe.dev")
thanks to R-universe.Should you want to acknowledge your reviewers in your package DESCRIPTION, you can do so by making them "rev"
-type contributors in the Authors@R
field (with their consent).
Welcome aboard! We'd love to host a post about your package - either a short introduction to it with an example for a technical audience or a longer post with some narrative about its development or something you learned, and an example of its use for a broader readership. If you are interested, consult the blog guide, and tag @ropensci/blog-editors in your reply. They will get in touch about timing and can answer any questions.
We maintain an online book with our best practice and tips, this chapter starts the 3d section that's about guidance for after onboarding (with advice on releases, package marketing, GitHub grooming); the guide also feature CRAN gotchas. Please tell us what could be improved.
Last but not least, you can volunteer as a reviewer via filling a short form.
@ropensci-review-bot submit review https://github.com/ropensci/software-review/issues/606#issuecomment-1731654678 time 3
Logged review for osorensen (hours: 3)
@ropensci-review-bot submit review https://github.com/ropensci/software-review/issues/606#issuecomment-1751660212 time 6
Date accepted: 2023-10-30 Submitting Author Name: Rohit Goswami Submitting Author Github Handle: !--author1-->@HaoZeke<!--end-author1-- Repository: https://github.com/HaoZeke/fastMatMR Version submitted: Submission type: Standard Editor: !--editor-->@maelle<!--end-editor-- Reviewers: @osorensen, @czeildi
Archive: TBD Version accepted: TBD Language: en
Scope
Please indicate which category or categories from our package fit policies this package falls under: (Please check an appropriate box below. If you are unsure, we suggest you make a pre-submission inquiry.):
Explain how and why the package falls under these categories (briefly, 1-2 sentences):
The matrix market exchange formats are crucial to much of the ecosystem. The fastMatMR package focuses on high-performance read and write operations for Matrix Market files, serving as a key tool for data extraction in computational and data science pipelines.
Data scientists who might want to load and test the NIST matrices which include:
Additionally, this makes its simpler to interface to
scipy
and the rest of the data science ecosystem. This also includes working with the Tensor Algebra Compiler (TACO).The
Matrix
package inR
can perform similar operations but only for sparse matrices. ThefastMatMR
package not only provides enhanced performance but also extends support to dense matrices and vectors inbase R
, thus offering a more versatile solution.We have both read and write performance vignettes backing up the claims made.
(If applicable) Does your package comply with our guidance around Ethics, Data Privacy and Human Subjects Research?
If you made a pre-submission inquiry, please paste the link to the corresponding issue, forum post, or other discussion, or @tag the editor you contacted.
Explain reasons for any
pkgcheck
items which your package is unable to pass.The package passes
pkcheck
: https://github.com/HaoZeke/fastMatMR/issues/18, though the review bot disagrees :)Technical checks
Confirm each of the following by checking the box.
This package:
Publication options
[X] Do you intend for this package to go on CRAN?
[ ] Do you intend for this package to go on Bioconductor?
[ ] Do you wish to submit an Applications Article about your package to Methods in Ecology and Evolution? If so:
MEE Options
- [ ] The package is novel and will be of interest to the broad readership of the journal. - [ ] The manuscript describing the package is no longer than 3000 words. - [ ] You intend to archive the code for the package in a long-term repository which meets the requirements of the journal (see [MEE's Policy on Publishing Code](http://besjournals.onlinelibrary.wiley.com/hub/journal/10.1111/(ISSN)2041-210X/journal-resources/policy-on-publishing-code.html)) - (*Scope: Do consider MEE's [Aims and Scope](http://besjournals.onlinelibrary.wiley.com/hub/journal/10.1111/(ISSN)2041-210X/aims-and-scope/read-full-aims-and-scope.html) for your manuscript. We make no guarantee that your manuscript will be within MEE scope.*) - (*Although not required, we strongly recommend having a full manuscript prepared when you submit here.*) - (*Please do not submit your package separately to Methods in Ecology and Evolution*)Code of conduct