openjournals / joss-reviews

Reviews for the Journal of Open Source Software
Creative Commons Zero v1.0 Universal
697 stars 36 forks

[REVIEW]: Libscientific: A Powerful C Library for Multivariate Analysis #5420

Closed editorialbot closed 9 months ago

editorialbot commented 1 year ago

Submitting author: @gmrandazzo (Giuseppe Marco Randazzo) Repository: https://github.com/gmrandazzo/libscientific Branch with paper.md (empty if default branch): Version: v1.6.0 Editor: @jbytecode Reviewers: @mikeaalv, @faosorios Archive: 10.5281/zenodo.8436823

Status

status

Status badge code:

HTML: <a href="https://joss.theoj.org/papers/afc8dfc4cdd496f6f51813dbaa5ad310"><img src="https://joss.theoj.org/papers/afc8dfc4cdd496f6f51813dbaa5ad310/status.svg"></a>
Markdown: [![status](https://joss.theoj.org/papers/afc8dfc4cdd496f6f51813dbaa5ad310/status.svg)](https://joss.theoj.org/papers/afc8dfc4cdd496f6f51813dbaa5ad310)

Reviewers and authors:

Please avoid lengthy details of difficulties in the review thread. Instead, please create a new issue in the target repository and link to those issues (especially acceptance-blockers) by leaving comments in the review thread below. (For completists: if the target issue tracker is also on GitHub, linking the review thread in the issue or vice versa will create corresponding breadcrumb trails in the link target.)

Reviewer instructions & questions

@mikeaalv & @faosorios, your review will be checklist based. Each of you will have a separate checklist that you should update when carrying out your review. First of all you need to run this command in a separate comment to create the checklist:

@editorialbot generate my checklist

The reviewer guidelines are available here: https://joss.readthedocs.io/en/latest/reviewer_guidelines.html. Any questions/concerns please let @jbytecode know.

Please start on your review when you are able, and be sure to complete your review within the next six weeks at the very latest.

Checklists

📝 Checklist for @mikeaalv

📝 Checklist for @faosorios

editorialbot commented 1 year ago

Hello humans, I'm @editorialbot, a robot that can help you with some common editorial tasks.

For a list of things I can do to help you, just type:

@editorialbot commands

For example, to regenerate the paper pdf after making changes in the paper's md or bib files, type:

@editorialbot generate pdf
editorialbot commented 1 year ago
Software report:

github.com/AlDanial/cloc v 1.88  T=0.18 s (791.5 files/s, 213083.4 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
C                               68           4019           3316          21389
Python                          22            779           1029           2022
C/C++ Header                    29            437           1549           1051
reStructuredText                 6            225            126            288
Markdown                         5            100              0            258
CMake                            4             45              2            253
TeX                              1             19              0            161
DOS Batch                        1             21              1            148
make                             1             21              4            105
YAML                             1              4              0             39
HTML                             1              1              0              8
-------------------------------------------------------------------------------
SUM:                           139           5671           6027          25722
-------------------------------------------------------------------------------

gitinspector failed to run statistical information for the repository
editorialbot commented 1 year ago
Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

OK DOIs

- 10.1016/0003-2670(86)80028-9 is OK
- 10.1002/cem.1180010107 is OK
- 10.1002/(SICI)1099-128X(199809/10)12:5<301::AID-CEM515>3.0.CO;2-S is OK
- 10.1002/qsar.19960150402 is OK
- 10.1177/108705719600100308 is OK
- 10.1016/j.aca.2016.02.014 is OK
- 10.1016/j.chemolab.2016.11.010 is OK
- 10.1126/sciadv.abf2665 is OK
- 10.3389/fcimb.2022.897291 is OK

MISSING DOIs

- 10.1080/00224065.2002.11980180 may be a valid DOI for title: Multivariate analysis of quality : an introduction

INVALID DOIs

- https://doi.org/10.1016/j.jchromb.2017.04.032 is INVALID because of 'https://doi.org/' prefix
- https://doi.org/10.1016/j.chroma.2019.460661 is INVALID because of 'https://doi.org/' prefix
editorialbot commented 1 year ago

Wordcount for paper.md is 833

editorialbot commented 1 year ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

jbytecode commented 1 year ago

Dear @mikeaalv and @faosorios

This is the review thread. Firstly, type

@editorialbot generate my checklist

to generate your own checklist. That checklist contains many check items; whenever you complete the corresponding task, you can check them off.

Please write your comments as separate posts and do not modify your checklist descriptions.

The review process is interactive so you can always interact with the authors, reviewers, and the editor. You can also create issues and pull requests in the target repo. Please do mention this thread's URL in the issues so we can keep tracking what is going on out of our world.

Please do not hesitate to ask me about anything, anytime.

Thank you in advance!

gmrandazzo commented 1 year ago

@editorialbot generate pdf

editorialbot commented 1 year ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

mikeaalv commented 1 year ago

Review checklist for @mikeaalv

Conflict of interest

Code of Conduct

General checks

Functionality

Documentation

Software paper

mikeaalv commented 1 year ago

Better description on installation is needed (https://github.com/gmrandazzo/libscientific/issues/5)

mikeaalv commented 1 year ago

Hi @gmrandazzo

Thank you for providing this useful toolbox for C programmers. libscientific contains many useful tools for data analysis, particularly in chemoinformatics and metabolomics. Providing the interface to multivariate analysis in C can be helpful to build efficient pipelines. I have a few comments about the package:

  1. It seems that the author has started to work on tests, but no (automatic) tests are available in the repository. Even though a major part of the code is statistics-related, tests are still possible and necessary to ensure quality now and in the future. For major functions, I suggest adding tests, including tests on simple toy examples and tests of invariant properties. Examples can be found in the scikit-learn repository.

  2. Coding in C doesn't remove the need to compare performance with other tools (in C or not, e.g., Python). A speed (or even memory) comparison with standard tools is needed in the manuscript.

  3. I suggest adding documentation for important functions in C and Python, in addition to the examples in the docs and the annotations in the code. Adding at least one C example (within an examples folder) that the user can directly compile and run would also be helpful.

  4. More background is needed. I suggest mentioning and citing, in the Statement of need, previous libraries (particularly in C) that cover similar functions or part of them. I also suggest adding a brief description of application cases in the multivariate analysis algorithm specs, so that the reader understands the use case.

  5. Usage examples. The example "Sampling example on a drug dataset" ran only after I added `time.sleep`. Hosting the data in another storage location might help. Both examples would benefit from more description of the chemical questions at the beginning, so that the reader can understand the background.

  6. Please add more details to Contributing (e.g., the 2nd step). Adding a test folder to collect automatic tests would help, as would more information about support and the issue page.

  7. Grammar problems. "Whether a scientists work on research or data analytics, libscientific can help gain deeper insights into the data"

faosorios commented 1 year ago

Review checklist for @faosorios

Conflict of interest

Code of Conduct

General checks

Functionality

Documentation

Software paper

jbytecode commented 1 year ago

@gmrandazzo - Could you please add/update/change the repository and the manuscript as our reviewer suggests? Please ping us when you have done so, so we can take a look at the changes and review again.

Thank you in advance.

jbytecode commented 1 year ago

@gmrandazzo - could you please update your status?

gmrandazzo commented 1 year ago

> @gmrandazzo - could you please update your status?

Dear all, I will reply and update everything by the end of next week.

Thank you Bests

faosorios commented 1 year ago

Hi @gmrandazzo

Your manuscript presents a C library that provides a set of routines for performing multivariate statistical analysis, with particular emphasis on methods popular in chemometrics. In addition to the library, Python bindings are available, which allow use by a wider community of analysts. In my opinion, this kind of project is necessary, and I hope that it will be consolidated by incorporating a greater number of statistical procedures. I have reviewed the code and installed the library on my personal computer. My comments focus on the implementation of some techniques that could be improved. In what follows, I provide some comments that the author may hopefully find useful.

  1. The routine for OLS (available in the files algebra.c and mlr.c) is based on forming the cross-product matrices Z^t Z and Z^t Y and inverting the matrix Z^t Z by Gauss-Jordan elimination, which is not recommended for computations in LS problems. Recommended procedures for LS are based on the use of QR, Cholesky, or SVD decompositions.
  2. The routines for PCA are based on the NIPALS algorithm, possibly motivated by the popular implementation in the area of chemometrics. I am not sure why this choice was made, instead of the more traditional SVD-based option.
  3. The routines for Euclidean distance as well as MSE computation can suffer from overflow/underflow. A better alternative is to use the algorithm proposed by Blue (1978) ACM Transactions on Mathematical Software 4, 15-23, or the DLASSQ routine from LAPACK.
  4. The R-squared calculation in OLS is based on the formula 1 - RSS / SST, with RSS and SST denoting the residual and total sums of squares, respectively. That formula is appropriate for models with an intercept, which may not be the case. Another way to calculate R2 is to simply obtain the square of the correlation between the vector of responses and the vector of predicted values. It seems to me that a good option for the routine would be to check for the presence of an intercept.
  5. Libscientific has LAPACK/BLAS as a requirement so it is strange from my perspective that it does not make more intensive use of the methods available in those libraries. It would be interesting to incorporate options into the code to allow the use of other calculation procedures such as those indicated in points 1, 2 and 3.
  6. The above are just a few areas where implementation could be improved. I think that the manuscript could be improved by including such ideas as opportunities for future development.
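To illustrate point 1, here is a small numpy sketch (illustrative only, not libscientific code): the same least-squares problem solved via the normal equations and via a QR decomposition. On a well-conditioned problem the two agree; forming Z^t Z squares the condition number, which is why QR is preferred in general.

```python
import numpy as np

rng = np.random.default_rng(0)
Z = rng.normal(size=(50, 3))
beta_true = np.array([1.5, -2.0, 0.5])
y = Z @ beta_true + rng.normal(scale=0.01, size=50)

# Normal equations: form Z^T Z and solve (the approach the review cautions
# against, since it squares the condition number of Z).
beta_normal = np.linalg.solve(Z.T @ Z, Z.T @ y)

# QR-based solution: Z = QR, then solve the triangular system R beta = Q^T y.
Q, R = np.linalg.qr(Z)
beta_qr = np.linalg.solve(R, Q.T @ y)
```

For ill-conditioned Z the normal-equations route loses roughly twice as many digits as the QR route.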
gmrandazzo commented 1 year ago

Dear @faosorios, thank you for your valuable feedback, which I acknowledge and will start working on. Best regards, Marco

jbytecode commented 1 year ago

@gmrandazzo - Could you please update your status and let us know how your work is going? Thank you in advance.

jbytecode commented 1 year ago

@gmrandazzo - We would be very happy if you could provide an update on the situation. Thank you in advance.

jbytecode commented 11 months ago

@gmrandazzo - Would you like me to set this issue to paused?

gmrandazzo commented 11 months ago

Hello @jbytecode, I'm making progress and am nearly finished with all the work. I still have to answer a couple of questions, and I will finish before the end of this month; after that, I will not have much time. So please keep it under review. If I have not answered my two reviewers before the end of this month, please put it on pause. Thank you

jbytecode commented 11 months ago

@gmrandazzo - Thank you for the quick response. Okay, since you are under a high workload, we can keep the issue as it is.

gmrandazzo commented 11 months ago

@editorialbot generate pdf

editorialbot commented 11 months ago

:warning: An error happened when generating the pdf.

jbytecode commented 11 months ago

It seems the error is about png files as the error message includes libpng error: Not a PNG file. You can correct the error and try it on the page renderer: https://whedon.theoj.org/

Looks like we failed to compile the PDF with the following error: [WARNING] Could not convert image '/tmp/tex2pdf.-bcdd8186e33cd87e/d1dea6e8fa6bd5a33ea8b238363f981a98de6663.shtml': Cannot load file Jpeg Invalid marker used PNG Invalid PNG file, signature broken Bitmap Invalid Bitmap magic identifier GIF Invalid Gif signature : HDR Invalid radiance file signature Tiff Invalid endian tag value TGA Invalid bit depth (104) [WARNING] Could not convert image '/tmp/tex2pdf.-
gmrandazzo commented 11 months ago

> It seems the error is about png files as the error message includes libpng error: Not a PNG file. You can correct the error and try it on the page renderer: https://whedon.theoj.org/
>
> Looks like we failed to compile the PDF with the following error: [WARNING] Could not convert image '/tmp/tex2pdf.-bcdd8186e33cd87e/d1dea6e8fa6bd5a33ea8b238363f981a98de6663.shtml': Cannot load file Jpeg Invalid marker used PNG Invalid PNG file, signature broken Bitmap Invalid Bitmap magic identifier GIF Invalid Gif signature : HDR Invalid radiance file signature Tiff Invalid endian tag value TGA Invalid bit depth (104) [WARNING] Could not convert image '/tmp/tex2pdf.-

Awesome!

gmrandazzo commented 11 months ago

@editorialbot generate pdf

editorialbot commented 11 months ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

jbytecode commented 11 months ago

@gmrandazzo - Could you please provide us with a summary of the changes so our reviewers @mikeaalv and @faosorios can track what has changed and continue with the further review?

gmrandazzo commented 11 months ago

> Hi @gmrandazzo
>
> Thank you for providing this useful toolbox for C programmers. libscientific contains many useful tools for data analysis, particularly in chemoinformatics and metabolomics. Providing the interface to multivariate analysis in C can be helpful to build efficient pipelines. I have a few comments about the package:

Hi @mikeaalv

Thank you for your time and consideration.

> 1. It seems that the author has started to work on tests, but no (automatic) tests are available in the repository. Even though a major part of the code is statistics-related, tests are still possible and necessary to ensure quality now and in the future. For major functions, I suggest adding tests, including tests on simple toy examples and tests of invariant properties. Examples can be found in the scikit-learn repository.

Thank you for tackling this point, which is crucial for me too. Simple, manually validated toy tests are present and automatic. To run them, you first compile the source code and then run `ctest`; that command triggers all the tests automatically, and every test is checked internally for consistency against user-defined expectations. Moreover, additional tests have also been developed for the Python package. Since this part was not clear, I have updated the README (manual installation) and the paper itself to explain the unit tests in more detail, adding a new section named "Algorithm stability".

To give you more information, the tests check everything from a simple matrix-matrix multiplication to the final correctness of a complex algorithm like PCA/PLS/CPCA/etc. I have created different tests for every method to check that the algorithm is consistent, reproducible, and correct in terms of its results. For example, matrix-matrix and matrix-vector multiplications are checked manually with artificial tests and exercised with a large matrix to check for memory leaks and speed. PCA/PLS and so on are verified against their mathematical definition. For example, for PCA there is a test which verifies that, given X, after a PCA you get T (scores) and P (loadings); by multiplying T*P you should be able to fully reconstruct the original matrix X. If this condition is not met, then there is something wrong in the algorithm. The same applies to PLS, consensus PCA (CPCA), unfolding PCA (UPCA), etc.
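The reconstruction invariant described above can be sketched in a few lines of numpy (using an SVD-based PCA purely for illustration; libscientific's own tests exercise its C API): keeping all components, the scores times the transposed loadings give back the centered data exactly.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(30, 5))
Xc = X - X.mean(axis=0)      # column-centered data

# Full-rank PCA via SVD: scores T and loadings P
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
T = U * s                    # scores (30 x 5)
P = Vt.T                     # loadings (5 x 5)

# Invariant: with all components kept, T @ P.T reconstructs X exactly
assert np.allclose(T @ P.T, Xc, atol=1e-10)
```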

> 2. Coding in C doesn't remove the need to compare performance with other tools (in C or not, e.g., Python). A speed (or even memory) comparison with standard tools is needed in the manuscript.

Since this library is a collection of well-established and well-known multivariate algorithms, a performance comparison is not strictly necessary. Nevertheless, to answer your question, I have added a new section to the paper, "Speed and Memory Comparison", which discusses that point for PCA/CPCA/PLS/MLR: plotting the CPU time against the input size shows a linear trend. This means that the algorithm complexity is linear, as expected, and the linear behavior is also a positive sign in terms of performance, indicating that the algorithms scale well with larger input sizes. Last but not least, to avoid misunderstanding, the word "performant" has been changed to "efficient".

> 3. I suggest adding documentation for important functions in C and Python, in addition to the examples in the docs and the annotations in the code. Adding at least one C example (within an examples folder) that the user can directly compile and run would also be helpful.

The official documentation has been updated. Examples are already present in the documentation, and a detailed description of how to compile and run the code is now provided. A substantial amount of documentation for the Python code has also been released.

> 4. More background is needed. I suggest mentioning and citing, in the Statement of need, previous libraries (particularly in C) that cover similar functions or part of them. I also suggest adding a brief description of application cases in the multivariate analysis algorithm specs, so that the reader understands the use case.

The library is specifically aimed at any kind of tabular data that needs dimensionality reduction, predictive modeling, quality control, and so on. To avoid misunderstanding and an unwanted sectorialization of the library, I have modified the Statement of need by adding the following sentence: "Libscientific was designed to analyze any kind of multivariate tabular data." I have also extended the conclusion with: "Incorporating Libscientific into analytical workflows may empower professionals to leverage various multivariate techniques to crack complex relationships and patterns within datasets. By offering tools for data reduction, predictive modeling, quality control, and more, as already demonstrated in previous works in -omics science and predictive modeling[@Randazzo16;@Randazzo171;@Randazzo172;@Randazzo20;@Kwon21;@Kwon22], the library can be an indispensable asset for tackling intricate challenges across various disciplines."

> 5. Usage examples. The example "Sampling example on a drug dataset" ran only after I added `time.sleep`. Hosting the data in another storage location might help. Both examples would benefit from more description of the chemical questions at the beginning, so that the reader can understand the background.

Thank you for the advice. However, the aim of this example is only to show the usage of the library, not the chemical question itself. To address the misunderstanding, I will provide new, similar examples.

> 6. Please add more details to Contributing (e.g., the 2nd step). Adding a test folder to collect automatic tests would help, as would more information about support and the issue page.

Done. You can find more details on how to contribute.

> 7. Grammar problems. "Whether a scientists work on research or data analytics, libscientific can help gain deeper insights into the data"

Fixed

gmrandazzo commented 11 months ago

Dear @faosorios thank you!

> Your manuscript presents a C library that provides a set of routines for performing multivariate statistical analysis, with particular emphasis on methods popular in chemometrics. In addition to the library, Python bindings are available, which allow use by a wider community of analysts. In my opinion, this kind of project is necessary, and I hope that it will be consolidated by incorporating a greater number of statistical procedures. I have reviewed the code and installed the library on my personal computer. My comments focus on the implementation of some techniques that could be improved. In what follows, I provide some comments that the author may hopefully find useful.

Thank you for your thoughtful comments on our manuscript. I appreciate your positive feedback regarding the C library we developed for multivariate statistical analysis, specifically for chemometrics. I agree that projects like this can significantly benefit the analytical community, and I'm glad you find it necessary. I have carefully considered your specific comments and will strive to address them appropriately. Your input plays a crucial role in shaping the library, and I am committed to incorporating more statistical procedures to improve and expand its scope and utility.

> 1) The routine for OLS (available in the files algebra.c and mlr.c) is based on forming the cross-product matrices Z^t Z and Z^t Y and inverting the matrix Z^t Z by Gauss-Jordan elimination, which is not recommended for computations in LS problems. Recommended procedures for LS are based on the use of QR, Cholesky, or SVD decompositions.

Thank you for your insightful comment. I appreciate your considerations about using QR triangularization or SVD decomposition for the procedures under discussion.

In choosing to maintain the use of Gauss-Jordan elimination, I took into account several factors that align with the specific goals and characteristics of the task at hand:

a) The Gauss-Jordan elimination method offers the advantage of directly solving the system of linear equations, providing a solution to the least squares problem in a single step. This efficiency is particularly beneficial in specific contexts.

b) Stability is a critical aspect when dealing with numerical methods. The Gauss-Jordan elimination method offers stability, mainly when working with positive definite or diagonally dominant matrices. The reference you provided (doi:10.1145/360569.360653) supports this stability claim.

c) Handling small-pivot values is a challenge in numerical methods. My investigation revealed that the Gauss-Jordan elimination method exhibits more stability when encountering small-pivot values than QR/LU/Cholesky decompositions. This is crucial in mitigating potential numerical instability and accuracy concerns.

d) Another aspect favoring the Gauss-Jordan elimination is its natural ability to provide the inverse of a matrix. This is a valuable feature in specific scenarios.

e) Your reference to the paper (doi:10.1145/360569.360653) raises an interesting point about ill-conditioned matrices. Despite potentially yielding larger residuals, Gauss-Jordan elimination maintains a certain level of consistency compared to other methods.

Given the primary objective of obtaining solutions for a system of equations with a full square matrix, I found that while the Gauss-Jordan elimination method involves numerous row operations and can be computationally expensive for larger matrices, it aligns well with the characteristics and goals of the task. The acceptable level of error in the computed solution, as highlighted, is of similar magnitude to the solution itself.

I hope this provides further context for my decision to maintain the current implementation. I understand that different methods have their merits, and this choice best serves the specific requirements of our project.
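As an illustration of points a) and d), here is a minimal Gauss-Jordan inversion with partial pivoting, sketched in Python for clarity (a hypothetical sketch, not the actual C implementation in algebra.c): the same elimination that solves the system also yields the inverse as a by-product.

```python
import numpy as np

def gauss_jordan(a):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    a = a.astype(float)
    n = a.shape[0]
    aug = np.hstack([a, np.eye(n)])      # augment with the identity
    for col in range(n):
        # partial pivoting: swap in the row with the largest pivot magnitude
        pivot = col + np.argmax(np.abs(aug[col:, col]))
        aug[[col, pivot]] = aug[[pivot, col]]
        aug[col] /= aug[col, col]        # scale the pivot row to 1
        for row in range(n):
            if row != col:
                aug[row] -= aug[row, col] * aug[col]
    return aug[:, n:]                    # right half is now the inverse

A = np.array([[4.0, 1.0], [2.0, 3.0]])
A_inv = gauss_jordan(A)
```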

> 2) The routines for PCA are based on the NIPALS algorithm, possibly motivated by the popular implementation in the area of chemometrics. I am not sure why this choice was made, instead of the more traditional SVD-based option.

Yes. The implementation of the PCA routine was mainly motivated by this popular chemometric implementation. However, the other reason is that the NIPALS algorithm, compared to the SVD-based one, allows computing only the first n user-chosen components, whereas SVD needs to compute all the principal components. Last but not least, in this NIPALS implementation missing values are handled directly in the algorithm by skipping them during the computation, which makes it possible to run PCA on data with missing values. This feature is available in libscientific.
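A minimal sketch of the missing-value-aware NIPALS idea (hypothetical Python for illustration, not libscientific's actual implementation): one component is extracted by alternating score/loading updates, and masked sums let the algorithm skip NaN entries.

```python
import numpy as np

def nipals_first_component(X, tol=1e-10, max_iter=500):
    """First PCA component via NIPALS, skipping NaN entries (illustrative sketch)."""
    X = X - np.nanmean(X, axis=0)        # center, ignoring missing values
    mask = ~np.isnan(X)
    Xf = np.where(mask, X, 0.0)          # zero-fill so masked sums skip NaNs
    t = Xf[:, 0].copy()                  # initial score vector
    for _ in range(max_iter):
        # loadings: p_j = sum_i(x_ij t_i) / sum_i(t_i^2) over observed entries
        p = (Xf * t[:, None]).sum(axis=0) / (mask * t[:, None] ** 2).sum(axis=0)
        p /= np.linalg.norm(p)
        # scores: t_i = sum_j(x_ij p_j) / sum_j(p_j^2) over observed entries
        t_new = (Xf * p).sum(axis=1) / (mask * p ** 2).sum(axis=1)
        if np.linalg.norm(t_new - t) < tol:
            return t_new, p
        t = t_new
    return t, p

rng = np.random.default_rng(3)
X = rng.normal(size=(20, 4))
t, p = nipals_first_component(X)         # also works if X contains NaNs
```

On complete data the loading vector matches the first SVD right singular vector up to sign; with NaNs present, no imputation step is needed before the call.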

> 3) The routines for Euclidean distance as well as MSE computation can suffer from overflow/underflow. A better alternative is to use the algorithm proposed by Blue (1978) ACM Transactions on Mathematical Software 4, 15-23, or the DLASSQ routine from LAPACK.

Thank you for pointing this out. I will address this point in the next release; I have opened an issue. Thank you!
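For reference, the scaled sum-of-squares update in the spirit of LAPACK's DLASSQ can be sketched as follows (an illustrative Python sketch, not the planned libscientific code): by keeping a running scale factor, the squared terms never overflow or underflow.

```python
import math

def stable_norm(x):
    """Euclidean norm via a scaled sum of squares (DLASSQ-style update)."""
    scale, ssq = 0.0, 1.0
    for v in x:
        av = abs(v)
        if av == 0.0:
            continue
        if scale < av:
            # rescale the accumulated sum to the new, larger scale
            ssq = 1.0 + ssq * (scale / av) ** 2
            scale = av
        else:
            ssq += (av / scale) ** 2
    return scale * math.sqrt(ssq)
```

A naive `sqrt(sum(v * v))` overflows for inputs near 1e200, while `stable_norm` returns the correct finite result.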

> 4) The R-squared calculation in OLS is based on the formula 1 - RSS / SST, with RSS and SST denoting the residual and total sums of squares, respectively. That formula is appropriate for models with an intercept, which may not be the case. Another way to calculate R2 is to simply obtain the square of the correlation between the vector of responses and the vector of predicted values. It seems to me that a good option for the routine would be to check for the presence of an intercept.

These are two different approaches to calculating R2: one in fitting and one over the predicted values. For that reason there is a score named "bias" which tries to check for the presence of an intercept and how far it is from the perfect solution. I would like to keep R2 as it is and add another R2 method that implements your proposal. An issue is now open.
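For illustration, a small numpy sketch (variable names are mine, hypothetical, not libscientific's API) shows that the two R² definitions agree when the model includes an intercept and diverge when it does not:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(size=100)
y = 2.0 * x + 1.0 + rng.normal(scale=0.1, size=100)

# Model WITH an intercept
X = np.column_stack([np.ones_like(x), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ beta
rss = np.sum((y - y_hat) ** 2)
sst = np.sum((y - y.mean()) ** 2)
r2_rss = 1.0 - rss / sst                      # 1 - RSS/SST
r2_corr = np.corrcoef(y, y_hat)[0, 1] ** 2    # squared correlation
# With an intercept, the two definitions coincide.

# Model WITHOUT an intercept: the definitions now disagree,
# because the fit cannot absorb the constant term.
beta0, *_ = np.linalg.lstsq(x[:, None], y, rcond=None)
y_hat0 = x * beta0[0]
r2_rss0 = 1.0 - np.sum((y - y_hat0) ** 2) / sst
r2_corr0 = np.corrcoef(y, y_hat0)[0, 1] ** 2
```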

> 5) Libscientific has LAPACK/BLAS as a requirement so it is strange from my perspective that it does not make more intensive use of the methods available in those libraries. It would be interesting to incorporate options into the code to allow the use of other calculation procedures such as those indicated in points 1, 2 and 3.

The plan is also to make more extensive use of the LAPACK library. This will be included in future releases. Thank you!

> The above are just a few areas where implementation could be improved. I think that the manuscript could be improved by including such ideas as opportunities for future development.

Again, many thanks!

jbytecode commented 11 months ago

@mikeaalv, @faosorios - Could you please review the latest changes and update your status? Thank you in advance.

mikeaalv commented 11 months ago

Updated my checklist. I'm happy. No major comments.

faosorios commented 11 months ago

The author has answered my comments in a satisfactory manner. This is a serious revision, and it led to an improvement of the paper. In my opinion, it can be accepted.

gmrandazzo commented 11 months ago

Thank you, @faosorios and @mikeaalv, for your time and consideration! I have appreciated your comments and interactions!

jbytecode commented 11 months ago

@mikeaalv - Have you finished your review? It seems there is one more unchecked task item, did you forget to check it? Please finalize your review. Thank you in advance.

mikeaalv commented 11 months ago

Yes, I finished. I didn't realize that I needed to check off all the task items. Just checked it off. Thank you.

jbytecode commented 11 months ago

@editorialbot check references

editorialbot commented 11 months ago
Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

OK DOIs

- 10.1016/0003-2670(86)80028-9 is OK
- 10.1002/cem.1180010107 is OK
- 10.1002/(SICI)1099-128X(199809/10)12:5<301::AID-CEM515>3.0.CO;2-S is OK
- 10.1088/0957-0233/12/10/708 is OK
- 10.1002/qsar.19960150402 is OK
- 10.1177/108705719600100308 is OK
- 10.1016/j.aca.2016.02.014 is OK
- 10.1016/j.chemolab.2016.11.010 is OK
- 10.1016/j.jchromb.2017.04.032 is OK
- 10.1016/j.chroma.2019.460661 is OK
- 10.1126/sciadv.abf2665 is OK
- 10.3389/fcimb.2022.897291 is OK

MISSING DOIs

- None

INVALID DOIs

- None
jbytecode commented 11 months ago

@editorialbot generate pdf

editorialbot commented 11 months ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

jbytecode commented 11 months ago

Post-Review Checklist for Editor and Authors

Additional Author Tasks After Review is Complete

Editor Tasks Prior to Acceptance

jbytecode commented 11 months ago

@gmrandazzo - Here we go, we are now ready for the editorial stuff.

First, I've just sent a pull request that includes minor changes in both paper manuscript and bibliography.

PR: https://github.com/gmrandazzo/libscientific/pull/10

Please review the changes and apply them if they seem to be okay.

Second, I cannot see a comparison of Libscientific with other libraries. Please add a State of the Field section, or a paragraph somewhere else that compares Libscientific with other software (including new citations, of course). Please make clear why somebody would use or prefer Libscientific over the alternatives.

gmrandazzo commented 11 months ago

Hi @jbytecode

PR accepted.

I have added a State of the Field paragraph in the article explaining why libscientific should be preferred over alternatives. The main reason is that the NIPALS algorithm, used throughout PCA/CPCA/PLS and so on, allows working with missing data without needing data imputation. By contrast, the popular scikit-learn PCA or PLS implementations use a different approach that does not allow working with missing data, so you need to perform data imputation before the calculation. To justify that, I have added two more references: LittleRubin1987 and Qifa2005.

Thank you

gmrandazzo commented 11 months ago

@editorialbot generate pdf

editorialbot commented 11 months ago

:warning: An error happened when generating the pdf.

gmrandazzo commented 11 months ago

@editorialbot generate pdf

editorialbot commented 11 months ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

gmrandazzo commented 11 months ago

@editorialbot check references

editorialbot commented 11 months ago
Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

OK DOIs

- 10.1016/0003-2670(86)80028-9 is OK
- 10.1002/cem.1180010107 is OK
- 10.1002/(SICI)1099-128X(199809/10)12:5<301::AID-CEM515>3.0.CO;2-S is OK
- 10.1088/0957-0233/12/10/708 is OK
- 10.1002/9781119013563 is OK
- 10.1109/CVPR.2005.309 is OK
- 10.1002/qsar.19960150402 is OK
- 10.1177/108705719600100308 is OK
- 10.1016/j.aca.2016.02.014 is OK
- 10.1016/j.chemolab.2016.11.010 is OK
- 10.1016/j.jchromb.2017.04.032 is OK
- 10.1016/j.chroma.2019.460661 is OK
- 10.1126/sciadv.abf2665 is OK
- 10.3389/fcimb.2022.897291 is OK

MISSING DOIs

- None

INVALID DOIs

- None