Closed SebKrantz closed 1 year ago
Hi @SebKrantz, thanks for your inquiry.
I've checked with some of the other editors. We have flexibility around input checking in low-level R or C code intended for export in other packages, but this package appears to have a high-level interface, and so that interface should be standards-compliant.
Additionally, some of us have had difficulties installing the package as it is currently written. @mpadge could not install it due to "undefined symbol: dpocon_", an error that is perfectly repeatable in a rocker/tidyverse container. However, I am able to build it on my M1 MacBook with no errors. This should be addressed before we can proceed.
Hi @adamhsparks. Thanks for these initial comments. Let me work on it a bit further to ensure the package passes your checks as well as possible, and then we can discuss it further. What I can definitely offer to do is rigorous checking of inputs at the C++ level in the Kalman Filtering functions. I've not done this so far because Armadillo already has built-in checks and throws appropriate error messages, making it practically impossible to pass something wrong. But let me see if I can appease your tests.
Regarding the installation issue, I'm sorry for that. It turned out that the BLAS and LAPACK libraries were linked through a Makevars.win file instead of a global Makevars file. I have fixed this now.
I have now brought dfms to a state where I am happy with it. The software is quite robust and inputs are rigorously checked. I still get some autotest issues, particularly in the main DFM() function, but I don't understand those, as all input parameters receive the full extent of checking, including for data type, case insensitivity and permissible range.
I have quickly gone through the srr standards, and in my opinion dfms broadly meets most of them. I have made some comments in srr-stats-standards.R below standards where I have done things a bit differently.
Regarding unit testing, there is much room for improvement. In particular, one could set up R translations of the authors' original Matlab code (which is provided under misc/) to test against. For now I have manually verified the equivalence of dfms to this code, and I test against some hard-coded parameter values.
In general, my time on this package is a bit limited due to more important development commitments that I have, but it would be nice to get some review, and also to be able to release it to CRAN before Christmas. Let me know what you think about this.
Hi @SebKrantz, good to hear that you've managed to improve the package.
There are no requirements linking the package review and CRAN. That is, you can put it on CRAN and have it reviewed asynchronously; they don't affect each other.
We can initiate the review process as soon as you can open a new issue requesting the review. We would strive to have the first reviews done within three weeks of the reviewers being assigned. This of course is flexible to work with the reviewers' own commitments as well. So we might have something by Christmas for the reviews for you, but I'd not be surprised if it slipped a little past that.
Thanks @adamhsparks for the clarification. I will then open an issue requesting review, and also prepare a first CRAN release.
Submitting Author Name: Sebastian Krantz
Submitting Author Github Handle: @SebKrantz
Other Package Authors Github handles: @rbagd
Repository: https://github.com/SebKrantz/dfms
Submission type: Pre-submission
Language: en
Scope
Please indicate which category or categories from our package fit policies or statistical package categories this package falls under. (Please check an appropriate box below):
Data Lifecycle Packages
[ ] data retrieval
[ ] data extraction
[ ] data munging
[ ] data deposition
[ ] data validation and testing
[ ] workflow automation
[ ] version control
[ ] citation management and bibliometrics
[ ] scientific software wrappers
[ ] field and lab reproducibility tools
[ ] database software bindings
[ ] geospatial data
[ ] text analysis
Statistical Packages
[ ] Bayesian and Monte Carlo Routines
[x] Dimensionality Reduction, Clustering, and Unsupervised Learning
[ ] Machine Learning
[ ] Regression and Supervised Learning
[ ] Exploratory Data Analysis (EDA) and Summary Statistics
[ ] Spatial Analyses
[x] Time Series Analyses
Explain how and why the package falls under these categories (briefly, 1-2 sentences). Please note any areas you are unsure of:
Dynamic factor models are a time series modelling and dimensionality reduction technique.
I'm working on this.
Anybody working with time series. The package is useful for dimensionality reduction and forecasting with a large number of time series.
See README.md. In short: dfms is much faster, provides multiple estimation methods, and has a comprehensive set of methods for exploring the model and forecasting. It is less specialized than economic nowcasting packages.
Not applicable.
First, I would like to ask if you think you'll be able to review this package in a statistical sense. Second, I will likely not be able to comply with all of your standards, as I intend to export some C++ level helper functions (mainly efficient Kalman Filtering and Smoothing functions) without any checks on the inputs. My hope here is in part to provide infrastructure that more specialized software (such as nowcasting packages) can take advantage of. The iterative filtering and smoothing performed in the estimation of dynamic factor models via expectation maximization (EM) algorithms does not square well with R-level checks in those functions (which would be executed many times per second).
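To illustrate the trade-off described above, here is a minimal C++ sketch of one possible split: an unchecked filtering core that an EM loop can call repeatedly, plus a checked wrapper that validates inputs once. It is not the dfms implementation. The scalar local-level model, the `LocalLevelParams` struct, and both function names are hypothetical, chosen only to keep the example self-contained without Armadillo.

```cpp
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical scalar local-level model (illustration only, not the dfms API):
//   state:       a[t] = a[t-1] + eta,  Var(eta) = q
//   observation: y[t] = a[t]   + eps,  Var(eps) = r
struct LocalLevelParams {
    double q;   // state innovation variance
    double r;   // observation noise variance
    double a0;  // initial state mean
    double p0;  // initial state variance
};

// Unchecked core: assumes finite inputs, q >= 0, r > 0, p0 >= 0, and that
// `out` has room for n values. Writes the filtered state means into `out`
// and returns the Gaussian log-likelihood.
double kalman_filter_unchecked(const double* y, std::size_t n,
                               const LocalLevelParams& p, double* out) {
    const double log2pi = std::log(2.0 * 3.14159265358979323846);
    double a = p.a0;
    double P = p.p0;
    double loglik = 0.0;
    for (std::size_t t = 0; t < n; ++t) {
        P += p.q;                   // predicted state variance
        const double F = P + p.r;   // prediction error variance
        const double v = y[t] - a;  // prediction error
        const double K = P / F;     // Kalman gain
        a += K * v;                 // filtered state mean
        P *= (1.0 - K);             // filtered state variance
        loglik -= 0.5 * (log2pi + std::log(F) + v * v / F);
        out[t] = a;
    }
    return loglik;
}

// Checked wrapper: validates everything once, then delegates to the core.
double kalman_filter_checked(const std::vector<double>& y,
                             const LocalLevelParams& p,
                             std::vector<double>& out) {
    if (y.empty()) throw std::invalid_argument("y must not be empty");
    if (!(std::isfinite(p.q) && p.q >= 0.0))
        throw std::invalid_argument("q must be finite and non-negative");
    if (!(std::isfinite(p.r) && p.r > 0.0))
        throw std::invalid_argument("r must be finite and positive");
    if (!(std::isfinite(p.p0) && p.p0 >= 0.0))
        throw std::invalid_argument("p0 must be finite and non-negative");
    if (!std::isfinite(p.a0))
        throw std::invalid_argument("a0 must be finite");
    for (double value : y) {
        // This sketch rejects missing values; a real filter may handle them.
        if (!std::isfinite(value))
            throw std::invalid_argument("y must contain only finite values");
    }
    out.resize(y.size());
    return kalman_filter_unchecked(y.data(), y.size(), p, out.data());
}
```

Under this design, a high-level interface would call `kalman_filter_checked` on user-supplied data, while an estimation routine that has already validated its inputs could call `kalman_filter_unchecked` inside the EM iterations and so avoid repeating the checks on every pass.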