giovsaraceno opened 3 months ago
@ropensci-review-bot check srr
:heavy_check_mark: This package complies with > 50% of all standards and may be submitted.
Thanks for the submission @giovsaraceno! I'm getting some advice from the other editors about your question. One thing that would be really helpful: could you push your documentation up to a GitHub page? The usethis package has a function that helps set it up: https://usethis.r-lib.org/reference/use_github_pages.html
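For reference, the setup is typically a single call run from within the package's working directory. This is a sketch: use_github_pages() assumes the repository already has a GitHub remote and a configured GitHub token.

```r
# One-time setup, run from the package's working directory.
# Assumes a GitHub remote and token are already configured.
# install.packages("usethis")  # if not already installed
usethis::use_github_pages(branch = "gh-pages", path = "/")
```

A pkgdown workflow (usethis::use_pkgdown_github_pages()) can then build and publish the documentation to that branch automatically.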
Hi @giovsaraceno, Mark here from the rOpenSci stats team to answer your question. We've done our best to clarify the role of Probability Distributions Standards:
Unlike most other categories of standards, packages which fit in this category will also generally be expected to fit into at least one other category of statistical software. Reflecting that expectation, standards for probability distributions will be expected to only pertain to some (potentially small) portion of code in any package.
So packages should generally fit within some main category, with Probability Distributions being an additional category. In your case, Dimensionality Reduction seems like the appropriate main category, but it seems like your package would also fit within Probability Distributions. Given that, the next step would be for you to estimate what proportion of those standards you think might apply to your package. Our general rule of thumb is that at least 50% should apply, but for Probability Distributions as an additional category, that figure may be lower.
We are particularly keen to document compliance with this category, because it is where our standards have a large overlap with many core routines of the R language itself. As always, we encourage feedback on our standards, so please also feel very welcome to open issues in the Stats Software repository, or add comments or questions in the discussion pages. Thanks for your submission!
Thanks @ldecicco-USGS for your guidance during this process. Following your suggestion, I've now pushed the documentation for the QuadratiK package to a GitHub page. You can find it displayed on the main page of the GitHub repository. Here's the direct link for easy access: QuadratiK package GitHub page.
Hi Mark,
Thank you for the additional clarification regarding the standards for Probability Distributions and their integration with other statistical software categories. Following your guidance, we have conducted a thorough review of the standards applicable to the Probability Distributions category in relation to our package.
Based on our assessment, we found that the current version of our package satisfies 14% of the standards directly. We identified a further 36% of the standards that could potentially apply to our package, though this would require some enhancements, including the addition of checks and test code. We feel the remaining 50% of the standards are not applicable to our package.
We are committed to improving our package and aim to fulfill the applicable standards. To this end, we plan to work on a separate branch dedicated to implementing these enhancements, with the goal of meeting 50% of the standards for the Probability Distributions category. Before proceeding, we would greatly appreciate your opinion on this plan.
Thank you for your time and support. Giovanni
Hi Mark,
We addressed the enhancements we discussed, and our package now meets 50% of the standards for the Probability Distributions category. These updates are in the probability-distributions-standards branch of our repository. We would like your opinion on merging this branch into the submitted version of the package.
Thank you, Giovanni
Hi Giovanni, your srrstats tags for probability distribution standards definitely look good enough to proceed. That said, one aspect which could be improved, and which I would request if I were reviewing the package, is the compliance statements in the tests. In both test-dpkb.R and test-rpkb.R you claim compliance in single statements at the start, yet I can't really see where or how a few of these are really complied with. In particular, there do not appear to be explicit tests for output values; these are commonly tested using expect_equal with an explicit tolerance parameter, which you don't have. It is also not clear to me where and how you compare results of different distributions, because you have no annotations in the tests about what the return values of the functions are.
Those are very minor points which you may ignore for the moment if you'd like to get the review process started, or you could quickly address them straight away if you prefer. Either way, feel free to ask the bot to check srr when you think you're ready to proceed. Thanks!
Hi, thank you for your suggestions on our compliance statements and testing practices. Regarding explicit testing of output values with expect_equal and a tolerance parameter: we aimed to ensure that our functions return the expected outputs, but we recognize that our current tests may not explicitly demonstrate compliance with this standard in the way you've described. We're uncertain about the best way to use expect_equal with a tolerance parameter to test the numeric equality of outputs from the provided random generation and density functions. Can you provide some tips?
As for comparing results from different distributions, the rpkb function in our package provides options to generate random observations using three distinct algorithms based on different probability distributions. We've conducted tests to confirm that each method functions as intended. We also added a new vignette in which the methods are compared by graphically displaying the generated points. Is this what you are looking for?
We're inclined to address these points promptly, and would appreciate answers to the questions above so that we can start the review process. Thanks, Giovanni
Sorry we didn't reply faster, @giovsaraceno. In, say, a single-variable distribution, tests might include:
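The original list from this reply is not reproduced above. As an illustrative sketch only, output-value tests of the kind discussed might look like the following, shown with base R's normal distribution; applying the same pattern to QuadratiK's dpkb()/rpkb() would require those functions' actual signatures.

```r
# Sketch of output-value tests for a univariate distribution,
# using base R's dnorm()/rnorm() as a stand-in.

set.seed(42)

# 1. The density integrates to (approximately) 1 over its support.
total_mass <- integrate(dnorm, -Inf, Inf)$value
stopifnot(abs(total_mass - 1) < 1e-6)

# 2. Density values match a known closed form at fixed points.
stopifnot(abs(dnorm(0) - 1 / sqrt(2 * pi)) < 1e-12)

# 3. Random generation recovers the distribution's moments
#    within an explicit tolerance.
x <- rnorm(1e5)
stopifnot(abs(mean(x)) < 0.02, abs(sd(x) - 1) < 0.02)

# In testthat, each stopifnot() above becomes expect_equal()
# with an explicit tolerance, e.g.
#   expect_equal(sd(x), 1, tolerance = 0.02)
```

The tolerance makes the accepted numerical error explicit in the test itself, which is what the compliance statements should point to.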
Thanks @noamross for your explanation. We have taken your suggestions into consideration and have implemented them accordingly. We are now ready to request the automatic bot check for our package. We look forward to any further instructions or feedback that might come from this next step.
@ropensci-review-bot check package
Thanks, about to send the query.
:rocket:
The following problems were found in your submission template:
:wave:
git hash: 21541a40
Important: All failing checks above must be addressed prior to proceeding
(Checks marked with :eyes: may be optionally addressed.)
Package License: GPL (>= 3)
1. rOpenSci Statistical Standards (srr package)

This package is in the following category:
:heavy_check_mark: All applicable standards [v0.2.0] have been documented in this package (204 complied with; 49 N/A standards)
Click to see the report of author-reported standards compliance of the package with links to associated lines of code, which can be re-generated locally by running the srr_report() function from within a local clone of the repository.
The table below tallies all function calls to all packages ('ncalls'), both internal (r-base + recommended, along with the package itself), and external (imported and suggested packages). 'NA' values indicate packages to which no identified calls to R functions could be found. Note that these results are generated by an automated code-tagging system which may not be entirely accurate.
|type |package | ncalls|
|:----------|:------------|------:|
|internal |base | 382|
|internal |QuadratiK | 50|
|internal |utils | 10|
|internal |grDevices | 1|
|imports |stats | 29|
|imports |methods | 26|
|imports |sn | 14|
|imports |ggpp | 2|
|imports |cluster | 1|
|imports |mclust | 1|
|imports |moments | 1|
|imports |rrcov | 1|
|imports |clusterRepro | NA|
|imports |doParallel | NA|
|imports |foreach | NA|
|imports |ggplot2 | NA|
|imports |ggpubr | NA|
|imports |MASS | NA|
|imports |movMF | NA|
|imports |mvtnorm | NA|
|imports |Rcpp | NA|
|imports |RcppEigen | NA|
|imports |rgl | NA|
|imports |rlecuyer | NA|
|imports |Tinflex | NA|
|suggests |knitr | NA|
|suggests |rmarkdown | NA|
|suggests |roxygen2 | NA|
|suggests |testthat | NA|
|linking_to |Rcpp | NA|
|linking_to |RcppEigen | NA|
Click below for tallies of functions used in each package. Locations of each call within this package may be generated locally by running 's <- pkgstats::pkgstats(
base: list (46), data.frame (26), matrix (24), nrow (23), t (20), log (19), rep (19), ncol (18), c (14), numeric (12), for (11), sqrt (10), length (8), mean (8), as.numeric (6), return (6), sample (6), T (6), vapply (6), apply (5), as.factor (5), table (5), unique (5), as.vector (4), cumsum (4), exp (4), rbind (4), sum (4), as.matrix (3), kappa (3), lapply (3), lgamma (3), pi (3), q (3), replace (3), unlist (3), as.integer (2), diag (2), max (2), readline (2), rownames (2), rowSums (2), which (2), which.max (2), with (2), beta (1), colMeans (1), expand.grid (1), F (1), factor (1), if (1), levels (1), norm (1), rep.int (1), round (1), seq_len (1), subset (1)

QuadratiK: DOF (3), kbNormTest (3), normal_CV (3), C_d_lambda (2), compute_CV (2), cv_ksample (2), d2lpdf (2), dlpdf (2), lpdf (2), norm_vec (2), objective_norm (2), poisson_CV (2), rejvmf (2), sample_hypersphere (2), statPoissonUnif (2), compare_qq (1), compute_stats (1), computeKernelMatrix (1), computePoissonMatrix (1), dpkb (1), elbowMethod (1), generate_SN (1), NonparamCentering (1), objective_2 (1), objective_k (1), ParamCentering (1), pkbc_validation (1), rejacg (1), rejpsaw (1), select_h (1), stat_ksample_cpp (1), stat2sample (1)

stats: df (12), quantile (4), dist (2), rnorm (2), runif (2), aggregate (1), cov (1), D (1), qchisq (1), sd (1), sigma (1), uniroot (1)

methods: setMethod (12), setGeneric (8), new (3), setClass (3)

sn: rmsn (14)

utils: data (8), prompt (2)

ggpp: annotate (2)

cluster: silhouette (1)

grDevices: colorRampPalette (1)

mclust: adjustedRandIndex (1)

moments: skewness (1)

rrcov: PcaLocantore (1)
This package features some noteworthy statistical properties which may need to be clarified by a handling editor prior to progressing.
The package has:
- code in C++ (17% in 2 files) and R (83% in 12 files)
- 4 authors
- 5 vignettes
- 1 internal data file
- 21 imported packages
- 24 exported functions (median 14 lines of code)
- 56 non-exported functions in R (median 16 lines of code)
- 16 C++ functions (median 13 lines of code)

Statistical properties of package structure as distributional percentiles in relation to all current CRAN packages. The following terminology is used:
- `loc` = "Lines of Code"
- `fn` = "function"
- `exp`/`not_exp` = exported / not exported

All parameters are explained as tooltips in the locally-rendered HTML version of this report generated by [the `checks_to_markdown()` function](https://docs.ropensci.org/pkgcheck/reference/checks_to_markdown.html). The final measure (`fn_call_network_size`) is the total number of calls between functions (in R), or more abstract relationships between code objects in other languages. Values are flagged as "noteworthy" when they lie in the upper or lower 5th percentile.
|measure | value| percentile|noteworthy |
|:------------------------|-----:|----------:|:----------|
|files_R | 12| 65.5| |
|files_src | 2| 79.1| |
|files_vignettes | 5| 96.9| |
|files_tests | 10| 90.7| |
|loc_R | 1408| 76.6| |
|loc_src | 281| 34.1| |
|loc_vignettes | 235| 55.3| |
|loc_tests | 394| 70.0| |
|num_vignettes | 5| 97.9|TRUE |
|data_size_total | 11842| 71.9| |
|data_size_median | 11842| 80.1| |
|n_fns_r | 80| 70.4| |
|n_fns_r_exported | 24| 72.5| |
|n_fns_r_not_exported | 56| 70.6| |
|n_fns_src | 16| 40.4| |
|n_fns_per_file_r | 5| 67.1| |
|n_fns_per_file_src | 8| 69.1| |
|num_params_per_fn | 5| 69.6| |
|loc_per_fn_r | 15| 46.1| |
|loc_per_fn_r_exp | 14| 35.1| |
|loc_per_fn_r_not_exp | 16| 54.8| |
|loc_per_fn_src | 13| 41.6| |
|rel_whitespace_R | 24| 82.7| |
|rel_whitespace_src | 18| 36.2| |
|rel_whitespace_vignettes | 16| 29.2| |
|rel_whitespace_tests | 34| 78.1| |
|doclines_per_fn_exp | 50| 62.8| |
|doclines_per_fn_not_exp | 0| 0.0|TRUE |
|fn_call_network_size | 50| 66.3| |
Click to see the interactive network visualisation of calls between objects in package
`goodpractice` and other checks

#### 3a. Continuous Integration Badges

(There do not appear to be any)

**GitHub Workflow Results**

| id|name |conclusion |sha | run_number|date |
|----------:|:--------------------------|:----------|:------|----------:|:----------|
| 8851531581|pages build and deployment |success |21541a | 25|2024-04-26 |
| 8851531648|pkgcheck |failure |21541a | 60|2024-04-26 |
| 8851531643|pkgdown |success |21541a | 25|2024-04-26 |
| 8851531649|R-CMD-check |success |21541a | 83|2024-04-26 |
| 8851531642|test-coverage |success |21541a | 83|2024-04-26 |

#### 3b. `goodpractice` results

#### `R CMD check` with [rcmdcheck](https://r-lib.github.io/rcmdcheck/)

R CMD check generated the following warning:
1. checking whether package ‘QuadratiK’ can be installed ... WARNING
   Found the following significant warnings:
   Warning: 'rgl.init' failed, running with 'rgl.useNULL = TRUE'.
   See ‘/tmp/RtmpQrtXuf/file133861d90686/QuadratiK.Rcheck/00install.out’ for details.

R CMD check generated the following note:
1. checking installed package size ... NOTE
   installed size is 16.6Mb
   sub-directories of 1Mb or more: libs 15.0Mb

R CMD check generated the following check_fails:
1. no_import_package_as_a_whole
2. rcmdcheck_examples_run_without_warnings
3. rcmdcheck_significant_compilation_warnings
4. rcmdcheck_reasonable_installed_size

#### Test coverage with [covr](https://covr.r-lib.org/)

Package coverage: 78.21

#### Cyclocomplexity with [cyclocomp](https://github.com/MangoTheCat/cyclocomp)

The following function has cyclocomplexity >= 15:

function | cyclocomplexity
--- | ---
select_h | 46

#### Static code analyses with [lintr](https://github.com/jimhester/lintr)

[lintr](https://github.com/jimhester/lintr) found the following 20 potential issues:

message | number of times
--- | ---
Avoid library() and require() calls in packages | 9
Lines should not be more than 80 characters. | 9
Use <-, not =, for assignment. | 2
:heavy_multiplication_x: Package contains the following unexpected files:
- src/RcppExports.o
- src/kernel_function.o

:heavy_multiplication_x: The following function name is duplicated in other packages:
- `extract_stats` from ggstatsplot
|package |version |
|:--------|:--------|
|pkgstats |0.1.3.13 |
|pkgcheck |0.1.2.21 |
|srr |0.1.2.9 |
Processing may not proceed until the items marked with :heavy_multiplication_x: have been resolved.
We have resolved all the marked items and are now ready to request the automatic bot check. Thanks
@ropensci-review-bot check package
Thanks, about to send the query.
:rocket:
The following problems were found in your submission template:
:wave:
Hi @jooolia, thanks for checking the package. Can you give us guidance on how we should address the listed problems? At the moment, we do not know what information to insert in the mentioned fields (editor, reviewers and due-dates list). Thanks in advance.
@jooolia The automated checks failed because of the issue linked above. @giovsaraceno When you've fixed this issue and confirmed that the pkgcheck workflows once again succeed in your repo, please call @ropensci-review-bot check package here to run the checks again. Thanks
@ropensci-review-bot check package
Thanks, about to send the query.
:rocket:
The following problems were found in your submission template:
:wave:
git hash: d6b6bf47
Package License: GPL (>= 3)
1. rOpenSci Statistical Standards (srr package)

This package is in the following category:
:heavy_check_mark: All applicable standards [v0.2.0] have been documented in this package (213 complied with; 49 N/A standards)
Click to see the report of author-reported standards compliance of the package with links to associated lines of code, which can be re-generated locally by running the srr_report() function from within a local clone of the repository.
The table below tallies all function calls to all packages ('ncalls'), both internal (r-base + recommended, along with the package itself), and external (imported and suggested packages). 'NA' values indicate packages to which no identified calls to R functions could be found. Note that these results are generated by an automated code-tagging system which may not be entirely accurate.
|type |package | ncalls|
|:----------|:------------|------:|
|internal |base | 415|
|internal |QuadratiK | 56|
|internal |utils | 14|
|internal |grDevices | 1|
|imports |stats | 33|
|imports |methods | 26|
|imports |rgl | 25|
|imports |sn | 8|
|imports |ggpubr | 3|
|imports |ggpp | 2|
|imports |Tinflex | 2|
|imports |cluster | 1|
|imports |clusterRepro | 1|
|imports |mclust | 1|
|imports |moments | 1|
|imports |movMF | 1|
|imports |mvtnorm | 1|
|imports |rrcov | 1|
|imports |doParallel | NA|
|imports |foreach | NA|
|imports |ggplot2 | NA|
|imports |Rcpp | NA|
|imports |RcppEigen | NA|
|imports |rlecuyer | NA|
|suggests |knitr | NA|
|suggests |rmarkdown | NA|
|suggests |roxygen2 | NA|
|suggests |testthat | NA|
|linking_to |Rcpp | NA|
|linking_to |RcppEigen | NA|
Click below for tallies of functions used in each package. Locations of each call within this package may be generated locally by running 's <- pkgstats::pkgstats(
base: list (58), data.frame (29), matrix (22), nrow (21), rep (20), t (20), log (19), ncol (19), c (15), numeric (12), sqrt (12), for (11), return (10), as.numeric (8), length (8), mean (8), T (8), apply (7), det (6), sample (6), vapply (6), as.factor (5), rownames (5), table (5), unique (5), as.vector (4), cumsum (4), exp (4), rbind (4), sum (4), as.matrix (3), diag (3), kappa (3), lgamma (3), pi (3), q (3), replace (3), unlist (3), as.integer (2), max (2), readline (2), rowSums (2), which (2), which.max (2), with (2), beta (1), colMeans (1), expand.grid (1), F (1), factor (1), if (1), lapply (1), levels (1), rep.int (1), round (1), seq_len (1), subset (1)

QuadratiK: kbNormTest (4), compute_CV (3), normal_CV (3), sample_hypersphere (3), C_d_lambda (2), cv_ksample (2), d2lpdf (2), dlpdf (2), DOF_norm (2), lpdf (2), norm_vec (2), objective_norm (2), poisson_CV (2), rejvmf (2), statPoissonUnif (2), compare_qq (1), compute_stats (1), computeKernelMatrix (1), computePoissonMatrix (1), DOF (1), dpkb (1), elbowMethod (1), generate_SN (1), NonparamCentering (1), objective_2 (1), objective_k (1), ParamCentering (1), pkbc_validation (1), rejacg (1), rejpsaw (1), root_func (1), rpkb (1), select_h (1), stat_ksample_cpp (1), stat2sample (1), var_norm (1)

stats: df (13), quantile (6), dist (2), qchisq (2), rnorm (2), runif (2), aggregate (1), cov (1), D (1), sd (1), sigma (1), uniroot (1)

methods: setMethod (12), setGeneric (8), new (3), setClass (3)

rgl: plot3d (6), rgl.spheres (6), title3d (5), next3d (4), layout3d (2), open3d (2)

utils: data (12), prompt (2)

sn: rmsn (8)

ggpubr: ggarrange (3)

ggpp: geom_table_npc (2)

Tinflex: Tinflex.sample (1), Tinflex.setup.C (1)

cluster: silhouette (1)

clusterRepro: IGP.clusterRepro (1)

grDevices: colorRampPalette (1)

mclust: adjustedRandIndex (1)

moments: skewness (1)

movMF: rmovMF (1)

mvtnorm: rmvnorm (1)

rrcov: PcaLocantore (1)
This package features some noteworthy statistical properties which may need to be clarified by a handling editor prior to progressing.
The package has:
- code in C++ (20% in 2 files) and R (80% in 14 files)
- 4 authors
- 5 vignettes
- 3 internal data files
- 20 imported packages
- 28 exported functions (median 10 lines of code)
- 62 non-exported functions in R (median 15 lines of code)
- 20 C++ functions (median 13 lines of code)

Statistical properties of package structure as distributional percentiles in relation to all current CRAN packages. The following terminology is used:
- `loc` = "Lines of Code"
- `fn` = "function"
- `exp`/`not_exp` = exported / not exported

All parameters are explained as tooltips in the locally-rendered HTML version of this report generated by [the `checks_to_markdown()` function](https://docs.ropensci.org/pkgcheck/reference/checks_to_markdown.html). The final measure (`fn_call_network_size`) is the total number of calls between functions (in R), or more abstract relationships between code objects in other languages. Values are flagged as "noteworthy" when they lie in the upper or lower 5th percentile.
|measure | value| percentile|noteworthy |
|:------------------------|-----:|----------:|:----------|
|files_R | 14| 70.8| |
|files_src | 2| 79.1| |
|files_vignettes | 15| 99.8| |
|files_tests | 10| 90.7| |
|loc_R | 1486| 77.8| |
|loc_src | 373| 40.5| |
|loc_vignettes | 319| 65.7| |
|loc_tests | 398| 70.2| |
|num_vignettes | 5| 97.9|TRUE |
|data_size_total | 77179| 82.0| |
|data_size_median | 11842| 80.1| |
|n_fns_r | 90| 73.7| |
|n_fns_r_exported | 28| 76.4| |
|n_fns_r_not_exported | 62| 73.1| |
|n_fns_src | 20| 45.8| |
|n_fns_per_file_r | 5| 70.9| |
|n_fns_per_file_src | 10| 76.7| |
|num_params_per_fn | 4| 54.6| |
|loc_per_fn_r | 14| 42.9| |
|loc_per_fn_r_exp | 10| 24.4| |
|loc_per_fn_r_not_exp | 16| 52.0| |
|loc_per_fn_src | 13| 41.6| |
|rel_whitespace_R | 24| 83.3| |
|rel_whitespace_src | 22| 45.1| |
|rel_whitespace_vignettes | 315| 99.4|TRUE |
|rel_whitespace_tests | 33| 78.1| |
|doclines_per_fn_exp | 33| 38.5| |
|doclines_per_fn_not_exp | 0| 0.0|TRUE |
|fn_call_network_size | 56| 68.7| |
Click to see the interactive network visualisation of calls between objects in package
`goodpractice` and other checks

#### 3a. Continuous Integration Badges

(There do not appear to be any)

**GitHub Workflow Results**

| id|name |conclusion |sha | run_number|date |
|----------:|:--------------------------|:----------|:------|----------:|:----------|
| 9366235244|pages build and deployment |success |d6b6bf | 89|2024-06-04 |
| 9366235344|pkgcheck |success |d6b6bf | 125|2024-06-04 |
| 9366235340|pkgdown |success |d6b6bf | 90|2024-06-04 |
| 9366235341|R-CMD-check |success |d6b6bf | 148|2024-06-04 |
| 9366235339|test-coverage |success |d6b6bf | 148|2024-06-04 |

#### 3b. `goodpractice` results

#### `R CMD check` with [rcmdcheck](https://r-lib.github.io/rcmdcheck/)

R CMD check generated the following note:
1. checking installed package size ... NOTE
   installed size is 19.9Mb
   sub-directories of 1Mb or more: libs 18.1Mb

R CMD check generated the following check_fails:
1. no_import_package_as_a_whole
2. rcmdcheck_reasonable_installed_size

#### Test coverage with [covr](https://covr.r-lib.org/)

Package coverage: 79.39

#### Cyclocomplexity with [cyclocomp](https://github.com/MangoTheCat/cyclocomp)

The following function has cyclocomplexity >= 15:

function | cyclocomplexity
--- | ---
select_h | 46

#### Static code analyses with [lintr](https://github.com/jimhester/lintr)

[lintr](https://github.com/jimhester/lintr) found the following 29 potential issues:

message | number of times
--- | ---
Avoid library() and require() calls in packages | 18
Lines should not be more than 80 characters. This line is 102 characters. | 1
Lines should not be more than 80 characters. This line is 81 characters. | 1
Lines should not be more than 80 characters. This line is 82 characters. | 3
Lines should not be more than 80 characters. This line is 83 characters. | 2
Lines should not be more than 80 characters. This line is 91 characters. | 1
Lines should not be more than 80 characters. This line is 93 characters. | 2
Lines should not be more than 80 characters. This line is 98 characters. | 1
|package |version |
|:--------|:--------|
|pkgstats |0.1.5.2 |
|pkgcheck |0.1.2.42 |
|srr |0.1.3.2 |
This package is in top shape and may be passed on to a handling editor
@ropensci-review-bot assign @emitanaka as editor
Assigned! @emitanaka is now the editor
@ropensci-review-bot seeking reviewers
Please add this badge to the README of your package repository:
[![Status at rOpenSci Software Peer Review](https://badges.ropensci.org/632_status.svg)](https://github.com/ropensci/software-review/issues/632)
Furthermore, if your package does not have a NEWS.md file yet, please create one to capture the changes made during the review process. See https://devguide.ropensci.org/releasing.html#news
@ropensci-review-bot add @kasselhingee as reviewer
Can't assign reviewer because there is no editor assigned for this submission yet
@ropensci-review-bot assign @kasselhingee as reviewer
Can't assign reviewer because there is no editor assigned for this submission yet
@mpadge It worked before, but I'm not sure why adding a reviewer is no longer working here?
@emitanaka Can you please try again?
@ropensci-review-bot assign @kasselhingee as reviewer
Can't assign reviewer because there is no editor assigned for this submission yet
@ropensci-review-bot add @kasselhingee as reviewer
Can't assign reviewer because there is no editor assigned for this submission yet
@mpadge Nope, still not working
@emitanaka Sorry about that. The issue template at the very top had been modified, including removing the "editor" field needed by the bot to identify you. I've reinstated everything now, so should be okay.
@giovsaraceno There are a couple of fields which still need to be filled in. Can you please edit the initial issue text at the top and fill out:
Thanks!
@mpadge I have modified the initial issue text by adding the submitted version and choosing the badge grade. Please let us know if anything else is needed. Thanks!
Please add this badge to the README of your package repository:
[![Status at rOpenSci Software Peer Review](https://badges.ropensci.org/632_status.svg)](https://github.com/ropensci/software-review/issues/632)
Furthermore, if your package does not have a NEWS.md file yet, please create one to capture the changes made during the review process. See https://devguide.ropensci.org/releasing.html#news
We have added the provided badge to the README file and created the NEWS.md file in the package.
@ropensci-review-bot add @kasselhingee as reviewer
@kasselhingee added to the reviewers list. Review due date is 2024-07-16. Thanks @kasselhingee for accepting to review! Please refer to our reviewer guide.
rOpenSci’s community is our best asset. We aim for reviews to be open, non-adversarial, and focused on improving software quality. Be respectful and kind! See our reviewers guide and code of conduct for more.
@kasselhingee: If you haven't done so, please fill this form for us to update our reviewers records.
It is working now. Thank you @mpadge !
The package includes all the following forms of documentation:
- URL, BugReports and Maintainer (which may be autogenerated via Authors@R).

Estimated hours spent reviewing:
I'm only partially familiar with the area of spherical data. Both your package's tests of uniformity and clustering sound useful. The kb.test() tests sound complicated, but I'm sure with more explanation their use will become clear.
I found it really hard to understand your package initially. That was because most of the documentation ignores G1.3 on explaining statistical terms. For example, I jumped into the kb.test() part of the package, which was really confusing until I found an arXiv document on QuadratiK that had more information.
Although the other parts were less confusing to me, they still didn't explain themselves well, often stating that the function performs "the [a bespoke method by authors]" without explanation of the method.
Because of this, currently your package feels only usable by expert statisticians who have read your papers and want to try out your methods.
I would love it if your readme described the major benefits in more detail. For example, that pk.test() performs much better than competitors when the alternative is multimodal.
Your answers to rOpenSci here state that novel and unique kernel-based methods are used, but you don't say what is good about them. Also, please clarify whether the kernel in the 'Poisson kernel-based densities' is different from the kernel in your 'kernel-based quadratic distances'.
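For reference, here is a base-R sketch of the density I understand by "Poisson kernel-based density", written for the circle (d = 2). The function name `pkb_density` and the parameter values are my own, so please correct me if the package uses a different kernel:

```r
# Poisson kernel-based density on the unit circle (d = 2), as I understand it;
# rho is the concentration parameter and mu_angle the mean direction (my naming).
pkb_density <- function(theta, rho, mu_angle = 0) {
  d <- 2
  omega_d <- 2 * pi^(d / 2) / gamma(d / 2)      # surface area of S^(d-1); 2*pi here
  x <- cbind(cos(theta), sin(theta))            # points on the circle
  mu <- c(cos(mu_angle), sin(mu_angle))         # mean direction on the circle
  dist_sq <- rowSums(sweep(x, 2, rho * mu)^2)   # ||x - rho * mu||^2
  (1 - rho^2) / (omega_d * dist_sq^(d / 2))
}

# sanity check: the density integrates to 1 over the circle
integrate(pkb_density, 0, 2 * pi, rho = 0.7)$value  # ~ 1
```

If this (or a version of it for general d) is indeed the kernel behind the densities, a short statement of the formula in the docs would resolve my question about how it relates to the quadratic-distance kernels.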
More on following G1.3: there are many unexplained terms I've never heard of before, a few used differently from what I expected, and others used vaguely. At times it felt like these were all the terms!
- `rpkb()`

The following appear to be met from the documentation and function results:
However, I haven't checked if they are implemented correctly and the package lacks tests to confirm results in simple situations.
These claims seem unmet to me:
`devtools::install_github("https://github.com/giovsaraceno/QuadratiK-package/tree/master")`

The vignettes seem to cover the main uses of the package, but:

- One vignette can't be run locally.
- Overall they appear to be written for people already familiar with the authors' papers or the other vignettes.
**`wireless_clustering` vignette**

- `validation()` doesn't exist. I'm guessing it is now `pkbc_validation()`.
- `pkbc_validation()`?
- `plot(res_pk)` would be nice, with an example plot showing why K=4 looks appropriate.

**k-sample test vignette**

- `select_h()`?
- `h_k <- select_h(x=x, y=y, alternative="skewness")` takes a really long time on my machine; it would be good to mention in the help for `select_h()` that it takes a long time.

I quickly scanned and checked that the remaining vignettes could render, but didn't run the code myself.
I suspect the help assumes the user has read your references, e.g.:
- `pkbc_validation()`: how do I interpret all the measures?
- `kb.test()`: see my comments about k-sample tests in the README.
- `kb.test()`: `NULL`?
- `kb.test()`: what are the U-statistics, what is Vn, and a V-statistic, etc.?
- `kb.test` class: should cross-reference `kb.test()`, and vice versa, to explain the objects.
- `pk.test()` and `pk.test-class`: what happens when `rho` is `NULL`?
- When `pk.test()` is advised: from the referenced paper it is when the alternative is multimodal.
- `pkbc()`
- `pkbc_validation()`: are `pkbc_validation(res)$metrics` the different cluster numbers in `res`?
- `plot.pkbc()`: `?plot.pkbc` and `help(plot.pkbc)` both find nothing. But, for example, `help(plot.TinflexC)` gets me to the appropriate help. Do you know why it isn't working? The manual for `plot.pkbc` is crucial to understanding the plots, so it should be accessible from the console.
- `wireless_clustering` vignette: in `plot(res_pk)`, choosing scatter plot then 4 clusters to display didn't make a plot at all! Is this a bug, or is it what it should be doing? And surely it means displaying the 4-cluster model, rather than, say, displaying the first 4 clusters of the 6-cluster model?
- `select_h()`
- `wine` data set
- `wireless` data set

I get warnings from `geom_table_npc()` when I run the examples. I'm using version 0.5.7 of package ggpp.
- `pk.test()`: rejects uniformity correctly?
- `pkbc()` doesn't seem to check that it gets the clustering correct in a simple situation, and `pkbc_validation()` (these all seem to test that the structure/class of the output is correct, but not the actual values of the output).
- `pk.test()` on multimodal data vs another uniform testing method.
- `devtools::spell_check()`
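To illustrate the kind of value-level test I have in mind, here is a sketch. I use `kmeans()` as a stand-in because I am not confident of `pkbc()`'s exact call signature; the data generation and the label-agreement check are the parts that would carry over:

```r
# Value-level clustering test sketch: two well-separated groups on the sphere.
# kmeans() stands in for pkbc() (whose exact signature I haven't verified).
set.seed(1)
poles <- rbind(c(0, 0, 1), c(0, 0, -1))
x <- poles[rep(1:2, each = 50), ] + matrix(rnorm(300, sd = 0.05), ncol = 3)
x <- x / sqrt(rowSums(x^2))               # project back onto the unit sphere
truth <- rep(1:2, each = 50)

labels <- kmeans(x, centers = 2, nstart = 10)$cluster

# recovery up to relabelling: each true cluster should map to a single label
tab <- table(truth, labels)
stopifnot(sum(apply(tab, 1, max)) == length(truth))
```

A similar check on the actual values returned by `pkbc_validation()` (rather than only their classes) would address the same point.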
Submitting Author: Giovanni Saraceno (@giovsaraceno)
Other Package Authors Github handles: @rmj3197
Repository: https://github.com/giovsaraceno/QuadratiK-package
Version submitted: 1.1.1
Submission type: Stats
Badge grade: gold
Editor: @emitanaka
Reviewers: @kasselhingee

Due date for @kasselhingee: 2024-07-16
Archive: TBD
Version accepted: TBD
Scope
Data Lifecycle Packages
Statistical Packages
[ ] Bayesian and Monte Carlo Routines
[x] Dimensionality Reduction, Clustering, and Unsupervised Learning
[ ] Machine Learning
[ ] Regression and Supervised Learning
[ ] Exploratory Data Analysis (EDA) and Summary Statistics
[ ] Spatial Analyses
[ ] Time Series Analyses
Explain how and why the package falls under these categories (briefly, 1-2 sentences). Please note any areas you are unsure of:
This category is the most suitable due to QuadratiK's clustering technique, specifically designed for spherical data. The package's clustering algorithm falls within the realm of unsupervised learning, where the focus is on identifying groupings in the data without pre-labeled categories. The two- and k-sample tests serve as additional tools for testing the differences between the identified groups.

Following the link https://stats-devguide.ropensci.org/standards.html, we noticed in the "Table of contents" that category 6.9 refers to Probability Distributions. We are unsure whether, and how, we fit this category. Can you please advise?
Yes, we have incorporated documentation of standards into our QuadratiK package by utilizing the srr package, considering the categories "General" and "Dimensionality Reduction, Clustering, and Unsupervised Learning", in line with the recommendations provided in the rOpenSci Statistical Software Peer Review Guide.
The QuadratiK package offers robust tools for goodness-of-fit testing, a fundamental aspect in statistical analysis, where accurately assessing the fit of probability distributions is essential. This is especially critical in research domains where model accuracy has direct implications on conclusions and further research directions. Spherical data structures are common in fields such as biology, geosciences and astronomy, where data points are naturally mapped to a sphere. QuadratiK provides a tailored approach to effectively handle and interpret these data. Furthermore, this package is also of particular interest to professionals in health and biological sciences, where understanding and interpreting spherical data can be crucial in studies ranging from molecular biology to epidemiology. Moreover, its implementation in both R and Python broadens its accessibility, catering to a wide audience accustomed to these popular programming languages.
Yes, there are other R packages that address goodness-of-fit (GoF) testing and multivariate analysis. Notable among these is the energy package for energy-statistics-based tests. The function `kmmd` in the kernlab package offers a kernel-based test with a similar mathematical formulation. The sphunif package provides all the tests for uniformity on the sphere available in the literature, including the test for uniformity based on the Poisson kernel. However, there are fundamental differences between the methods encoded in the aforementioned packages and those offered in the QuadratiK package.
QuadratiK uniquely focuses on kernel-based quadratic distances methods for GoF testing, offering a comprehensive set of tools for one-sample, two-sample, and k-sample tests. This specialization provides more nuanced and robust methodologies for statistical analysis, especially in complex multivariate contexts. QuadratiK is optimized for high-dimensional datasets, employing efficient C++ implementations. This makes it particularly suitable for contemporary large-scale data analysis challenges. The package introduces advanced methods for kernel centering and critical value computation, as well as optimal tuning parameter selection based on midpower analysis. QuadratiK includes a unique clustering algorithm for spherical data. These innovations are not covered in other available packages. With implementations in both R and Python, QuadratiK appeals to a wider audience across different programming communities. We also provide a user-friendly dashboard application which further enhances accessibility, catering to users with varying levels of statistical and programming expertise.
In summary, there are fundamental differences between QuadratiK and all existing R packages:
Yes, our package, QuadratiK, is compliant with the rOpenSci guidelines on Ethics, Data Privacy, and Human Subjects Research. We have carefully considered and adhered to ethical standards and data privacy laws relevant to our work.
Please see the question posed in the first bullet.