dfms: Efficient Estimation of Dynamic Factor Models for R

SebKrantz commented 1 year ago

Submitting Author Name: Sebastian Krantz Submitting Author Github Handle: !--author1-->@SebKrantz@rbagd@noamross<!--end-editor-- Reviewers: @eeholmes, @santikka

Due date for @eeholmes: 2023-01-03 Due date for @santikka: 2023-01-04

Archive: TBD Version accepted: TBD Language: en

Paste the full DESCRIPTION file inside a code block below:

Package: dfms
Version: 0.1.2
Title: Dynamic Factor Models
Authors@R: c(person("Sebastian", "Krantz", role = c("aut", "cre"), email = "sebastian.krantz@graduateinstitute.ch"),
             person("Rytis", "Bagdziunas", role = "aut"))
Description: Efficient estimation of Dynamic Factor Models using the Expectation Maximization (EM) algorithm 
  or Two-Step (2S) estimation, on datasets with missing data. The implementation follows advances in the econometric 
  literature: estimation can be done either by running the Kalman Filter and Smoother once with initial values 
  from PCA - following Doz, Giannone and Reichlin (2011) (2S) - or via iterated Kalman Filtering and Smoothing until EM 
  convergence - following Doz, Giannone and Reichlin (2012) - or using the adapted EM algorithm of Banbura and Modugno 
  (2014), allowing estimation with arbitrary patterns of missing data. The implementation makes heavy use of the 
  Armadillo C++ library and the collapse package, providing for particularly speedy estimation. A comprehensive set of 
  methods supports interpretation/visualization of the model and forecasting. Information criteria to choose the number 
  of factors are also provided - following Bai and Ng (2002).
  --- Key References: ---
  Doz, C., Giannone, D., & Reichlin, L. (2011). A two-step estimator for large approximate dynamic 
       factor models based on Kalman filtering. Journal of Econometrics, 164(1), 188-205.
  Doz, C., Giannone, D., & Reichlin, L. (2012). A quasi-maximum likelihood approach for large, approximate 
       dynamic factor models. Review of Economics and Statistics, 94(4), 1014-1024.
  Banbura, M., & Modugno, M. (2014). Maximum likelihood estimation of factor models on datasets with arbitrary 
       pattern of missing data. Journal of Applied Econometrics, 29(1), 133-160.
URL: https://sebkrantz.github.io/dfms/
BugReports: https://github.com/SebKrantz/dfms/issues
Depends: R (>= 3.0.0)
Imports: Rcpp (>= 1.0.1), collapse (>= 1.8.0)
LinkingTo: Rcpp, RcppArmadillo
Suggests: 
    xts,
    vars,
    magrittr,
    testthat (>= 3.0.0),
    knitr,
    rmarkdown,
    covr
License: GPL-3
Encoding: UTF-8
LazyData: true
Roxygen: list(markdown = TRUE, roclets = c ("namespace", "rd", "srr::srr_stats_roclet"))
RoxygenNote: 7.1.2
Config/testthat/edition: 3
VignetteBuilder: knitr

Scope

Please indicate which of our statistical package categories this package falls under. (Please check one appropriate box below):

Statistical Packages
- [ ] Bayesian and Monte Carlo Routines
- [x] Dimensionality Reduction, Clustering, and Unsupervised Learning
- [ ] Machine Learning
- [ ] Regression and Supervised Learning
- [ ] Exploratory Data Analysis (EDA) and Summary Statistics
- [ ] Spatial Analyses
- [x] Time Series Analyses

Pre-submission Inquiry

[x] A pre-submission inquiry has been approved in issue #555

General Information

Who is the target audience and what are scientific applications of this package?

Anybody working with time series. The package is useful for dimensionality reduction and forecasting with a large amount of time series.

Paste your responses to our General Standard G1.1 here, describing whether your software is:
- The first implementation of a novel algorithm; or
- The first implementation within R of an algorithm which has previously been implemented in other languages or contexts; or
- An improvement on other implementations of similar algorithms in R.

See README.md, dfms implements simple baseline versions of algorithms that have been around for a while in Matlab, and in other langaues (R, Python, Julia), but inside more elaborate nowcasting codes - thus not directly accessible, and less efficient. It is the only pure baseline implementation of the algorithms proposed by the 3 academic references mentioned in the description available for R and ready for CRAN.

Please include hyperlinked references to all other relevant software.

The software is actually a reboot and massive improvement upon dynfactoR, an abandoned software project. Generalizations of the functionality are provided by nowcasting and nowcastDFM, which fit dynamic factor models specific to mixed-frequency nowcasting applications. These packages are currently not on CRAN (they were archived) and also not very well maintained. Package MARSS can be used to fit dynamic factor models, but has a complicated API and fails on bigger datasets. The only really useful and well maintained dynamic factor modelling package for R is bayesdfa, which is also on CRAN, and fits bayesian dynamic factor models with Stan. I expect dfms to provide substantially faster estimation than bayesdfa. There are various other codes for Python and Julia on GitHub, including an implementation in the popular statsmodels library, but I did not engage with those as my primary tool remains R and I wanted to create an efficient baseline implementation for R that follows advances in the econometrics literature (PCA + EM Algorithm based estimation).

(If applicable) Does your package comply with our guidance around Ethics, Data Privacy and Human Subjects Research?

Not applicable.

Badging

What grade of badge are you aiming for? (bronze, silver, gold)

Bronze

If aiming for silver or gold, describe which of the four aspects listed in the Guide for Authors chapter the package fulfils (at least one aspect for silver; three for gold)

Technical checks

Confirm each of the following by checking the box.

[x] I have read the rOpenSci packaging guide.
[x] I have read the author guide and I expect to maintain this package for at least 2 years or have another maintainer identified.
[x] I/we have read the Statistical Software Peer Review Guide for Authors.
[x] I/we have run autotest checks on the package, and ensured no tests fail.
[x] The srr_stats_pre_submit() function confirms this package may be submitted.
[x] The pkgcheck() function confirms this package may be submitted - alternatively, please explain reasons for any checks which your package is unable to pass.

There are still some autotest issues, especially for the main DFM() function, but I do not understand those as all inputs received the maximum extent of checking. See lines 211-226. I also don't understand the note in pkgcheck requesting CI checks. The package receives CI through GitHub Actions (all plattforms) and test coverage is uploaded to codecov.io.

This package:

[x] does not violate the Terms of Service of any service it interacts with.
[x] has a CRAN and OSI accepted license.
[x] contains a README with instructions for installing the development version.

Publication options

[x] Do you intend for this package to go on CRAN?
[ ] Do you intend for this package to go on Bioconductor?

Code of conduct

[x] I agree to abide by rOpenSci's Code of Conduct during the review process and in maintaining my package should it be accepted.

ropensci-review-bot commented 1 year ago

Thanks for submitting to rOpenSci, our editors and @ropensci-review-bot will reply soon. Type @ropensci-review-bot help for help.

ropensci-review-bot commented 1 year ago

:rocket:

Editor check started

:wave:

ropensci-review-bot commented 1 year ago

Checks for dfms (v0.1.2)

git hash: 52706666

:heavy_check_mark: Package name is available
:heavy_check_mark: has a 'codemeta.json' file.
:heavy_check_mark: has a 'contributing' file.
:heavy_check_mark: uses 'roxygen2'.
:heavy_check_mark: 'DESCRIPTION' has a URL field.
:heavy_check_mark: 'DESCRIPTION' has a BugReports field.
:heavy_check_mark: Package has at least one HTML vignette
:heavy_check_mark: All functions have examples.
:heavy_check_mark: Package has continuous integration checks.
:heavy_check_mark: Package coverage is 76.6%.
:heavy_check_mark: R CMD check found no errors.
:heavy_check_mark: R CMD check found no warnings.
:eyes: Function names are duplicated in other packages

(Checks marked with :eyes: may be optionally addressed.)

Package License: GPL-3

1. rOpenSci Statistical Standards (`srr` package)

This package is in the following category:

Time Series
Dimensionality Reduction, Clustering and Unsupervised Learning

:heavy_check_mark: All applicable standards [v0.1.0] have been documented in this package (147 complied with; 9 N/A standards)

Click to see the report of author-reported standards compliance of the package with links to associated lines of code, which can be re-generated locally by running the srr_report() function from within a local clone of the repository.

2. Package Dependencies

Details of Package Dependency Usage (click to open)

The table below tallies all function calls to all packages ('ncalls'), both internal (r-base + recommended, along with the package itself), and external (imported and suggested packages). 'NA' values indicate packages to which no identified calls to R functions could be found. Note that these results are generated by an automated code-tagging system which may not be entirely accurate. |type |package | ncalls| |:----------|:-------------|------:| |internal |base | 726| |internal |stats | 83| |internal |dfms | 51| |internal |graphics | 10| |internal |grDevices | 1| |imports |collapse | 75| |imports |Rcpp | NA| |suggests |xts | NA| |suggests |vars | NA| |suggests |magrittr | NA| |suggests |testthat | NA| |suggests |knitr | NA| |suggests |rmarkdown | NA| |suggests |covr | NA| |linking_to |Rcpp | NA| |linking_to |RcppArmadillo | NA| Click below for tallies of functions used in each package. Locations of each call within this package may be generated locally by running 's <- pkgstats::pkgstats()', and examining the 'external_calls' table.

base

list (120), c (69), T (55), rep (48), if (45), dim (40), F (24), levels (22), drop (19), return (18), length (14), tolower (14), switch (13), t (10), beta (9), matrix (9), ncol (9), nrow (9), paste0 (9), crossprod (8), lapply (8), tcrossprod (8), diag (7), for (7), unlist (7), attr (6), gamma (6), abs (5), apply (5), attributes (5), col (5), seq_len (5), cbind (4), dimnames (4), is.na (4), log (4), rbind (4), rowSums (4), stop (4), which (4), anyNA (3), call (3), colSums (3), cumsum (3), eigen (3), sum (3), do.call (2), identity (2), is.null (2), match.call (2), min (2), names (2), quote (2), replace (2), rev (2), rowMeans (2), structure (2), any (1), as.integer (1), as.vector (1), environment (1), is.character (1), is.finite (1), is.list (1), is.matrix (1), isTRUE (1), kronecker (1), numeric (1), nzchar (1), outer (1), paste (1), prod (1), solve.default (1)

stats

C (36), time (18), weights (8), filter (6), setNames (6), cov (4), ts.plot (4), residuals (1)

collapse

setAttrib (26), setDimnames (13), mctl (10), frange (5), setColnames (5), qsu (3), t_list (3), unattrib (3), pwcov (2), setop (2), fmedian (1), fvar (1), whichv (1)

dfms

ainv (6), ftail (5), SKFS (5), lagnam (4), msum (4), AC1 (3), apinv (2), EMstepBMOPT (2), Estep (2), FIS (2), SKF (2), as.data.frame.dfm (1), as.data.frame.dfm_forecast (1), DFM (1), em_converged (1), EMstepDGR (1), findNA_LE (1), fitted.dfm (1), ICr (1), impNA_MA (1), impNA_spline (1), plot.dfm (1), plot.ICr (1), tsnarmimp (1), unscale (1)

graphics

par (8), lines (2)

grDevices

rainbow (1)

**NOTE:** Some imported packages appear to have no associated function calls; please ensure with author that these 'Imports' are listed appropriately.

3. Statistical Properties

This package features some noteworthy statistical properties which may need to be clarified by a handling editor prior to progressing.

Details of statistical properties (click to open)

The package has: - code in C++ (30% in 5 files) and R (70% in 9 files) - 2 authors - 1 vignette - 3 internal data files - 2 imported packages - 30 exported functions (median 13 lines of code) - 57 non-exported functions in R (median 8 lines of code) - 13 R functions (median 15 lines of code) --- Statistical properties of package structure as distributional percentiles in relation to all current CRAN packages The following terminology is used: - `loc` = "Lines of Code" - `fn` = "function" - `exp`/`not_exp` = exported / not exported All parameters are explained as tooltips in the locally-rendered HTML version of this report generated by [the `checks_to_markdown()` function](https://docs.ropensci.org/pkgcheck/reference/checks_to_markdown.html) The final measure (`fn_call_network_size`) is the total number of calls between functions (in R), or more abstract relationships between code objects in other languages. Values are flagged as "noteworthy" when they lie in the upper or lower 5th percentile. |measure | value| percentile|noteworthy | |:------------------------|------:|----------:|:----------| |files_R | 9| 55.2| | |files_src | 5| 88.8| | |files_vignettes | 1| 68.4| | |files_tests | 3| 75.2| | |loc_R | 844| 63.3| | |loc_src | 364| 39.8| | |loc_vignettes | 120| 31.3| | |loc_tests | 87| 36.5| | |num_vignettes | 1| 64.8| | |data_size_total | 117315| 84.2| | |data_size_median | 7049| 76.8| | |n_fns_r | 87| 72.8| | |n_fns_r_exported | 30| 78.3| | |n_fns_r_not_exported | 57| 71.1| | |n_fns_src | 13| 35.0| | |n_fns_per_file_r | 6| 72.3| | |n_fns_per_file_src | 3| 34.0| | |num_params_per_fn | 5| 69.6| | |loc_per_fn_r | 8| 20.0| | |loc_per_fn_r_exp | 13| 30.5| | |loc_per_fn_r_not_exp | 8| 22.6| | |loc_per_fn_src | 15| 51.6| | |rel_whitespace_R | 17| 62.2| | |rel_whitespace_src | 25| 47.8| | |rel_whitespace_vignettes | 51| 45.9| | |rel_whitespace_tests | 38| 47.0| | |doclines_per_fn_exp | 48| 60.6| | |doclines_per_fn_not_exp | 0| 0.0|TRUE | |fn_call_network_size | 39| 61.0| | ---

3a. Network visualisation

Click to see the interactive network visualisation of calls between objects in package

4. `goodpractice` and other checks

Details of goodpractice checks (click to open)

#### 3a. Continuous Integration Badges [![R-CMD-check](https://github.com/SebKrantz/dfms/workflows/R-CMD-check/badge.svg)](https://github.com/SebKrantz/dfms/actions) **GitHub Workflow Results** | id|name |conclusion |sha | run_number|date | |----------:|:--------------------------|:----------|:------|----------:|:----------| | 3204988595|pages build and deployment |success |06193d | 28|2022-10-07 | | 3204988635|R-CMD-check |failure |06193d | 100|2022-10-07 | | 3204988634|test-coverage |success |06193d | 29|2022-10-07 | --- #### 3b. `goodpractice` results #### `R CMD check` with [rcmdcheck](https://r-lib.github.io/rcmdcheck/) R CMD check generated the following notes: 1. checking installed package size ... NOTE installed size is 9.2Mb sub-directories of 1Mb or more: doc 1.5Mb libs 7.4Mb 2. checking dependencies in R code ... NOTE Namespace in Imports field not imported from: ‘Rcpp’ All declared Imports should be used. R CMD check generated the following check_fails: 1. cyclocomp 2. rcmdcheck_imports_not_imported_from 3. rcmdcheck_reasonable_installed_size #### Test coverage with [covr](https://covr.r-lib.org/) Package coverage: 76.57 #### Cyclocomplexity with [cyclocomp](https://github.com/MangoTheCat/cyclocomp) The following functions have cyclocomplexity >= 15: function | cyclocomplexity --- | --- DFM | 55 plot.dfm_forecast | 27 plot.dfm | 25 as.data.frame.dfm | 19 #### Static code analyses with [lintr](https://github.com/jimhester/lintr) [lintr](https://github.com/jimhester/lintr) found the following 551 potential issues: message | number of times --- | --- Avoid library() and require() calls in packages | 3 Lines should not be more than 80 characters. | 491 Use <-, not =, for assignment. | 57

5. Other Checks

Details of other checks (click to open)

:heavy_multiplication_x: The following function name is duplicated in other packages: - - `DFM` from MKclass

Package Versions

|package |version | |:--------|:---------| |pkgstats |0.1.1.54 | |pkgcheck |0.1.0.24 | |srr |0.0.1.180 |

Editor-in-Chief Instructions:

This package is in top shape and may be passed on to a handling editor

adamhsparks commented 1 year ago

Thanks, @SebKrantz. I'll find a handling editor for you shortly.

SebKrantz commented 1 year ago

Thanks a lot @adamhsparks. I'm also pending a CRAN submission now, following your earlier indication that this does not conflict with the review. I will maintain a note at the top of the README.md file stating the the package is under review and that review may result in API changes.

adamhsparks commented 1 year ago

@ropensci-review-bot assign @rkillick as editor

ropensci-review-bot commented 1 year ago

Assigned! @rkillick is now the editor

noamross commented 1 year ago

@ropensci-review-bot assign @noamross as editor

ropensci-review-bot commented 1 year ago

Assigned! @noamross is now the editor

noamross commented 1 year ago

Hello @SebKrantz, I wanted to apologize for the delay on this. One of our editors has needed to step back for a bit so I'll be taking over and seeking reviewers.

noamross commented 1 year ago

@ropensci-review-bot seeking reviewers

ropensci-review-bot commented 1 year ago

Please add this badge to the README of your package repository:

[![Status at rOpenSci Software Peer Review](https://badges.ropensci.org/556_status.svg)](https://github.com/ropensci/software-review/issues/556)

Furthermore, if your package does not have a NEWS.md file yet, please create one to capture the changes made during the review process. See https://devguide.ropensci.org/releasing.html#news

SebKrantz commented 1 year ago

Hi @noamross, no problem. I added the review badge.

noamross commented 1 year ago

@ropensci-review-bot assign @eeholmes as reviewer

ropensci-review-bot commented 1 year ago

@eeholmes added to the reviewers list. Review due date is 2023-01-03. Thanks @eeholmes for accepting to review! Please refer to our reviewer guide.

rOpenSci’s community is our best asset. We aim for reviews to be open, non-adversarial, and focused on improving software quality. Be respectful and kind! See our reviewers guide and code of conduct for more.

noamross commented 1 year ago

@ropensci-review-bot assign @santikka as reviewer

ropensci-review-bot commented 1 year ago

@santikka added to the reviewers list. Review due date is 2023-01-04. Thanks @santikka for accepting to review! Please refer to our reviewer guide.

rOpenSci’s community is our best asset. We aim for reviews to be open, non-adversarial, and focused on improving software quality. Be respectful and kind! See our reviewers guide and code of conduct for more.

ropensci-review-bot commented 1 year ago

:calendar: @eeholmes you have 2 days left before the due date for your review (2023-01-03).

ropensci-review-bot commented 1 year ago

:calendar: @santikka you have 2 days left before the due date for your review (2023-01-04).

santikka commented 1 year ago

Package Review

[x] As the reviewer I confirm that there are no conflicts of interest for me to review this work (If you are unsure whether you are in conflict, please speak to your editor before starting your review).

Compliance with Standards

[ ] This package complies with a sufficient number of standards for a (bronze/silver/gold) badge
[ ] This grade of badge is the same as what the authors wanted to achieve

The following standards currently deemed non-applicable (through tags of @srrstatsNA) could potentially be applied to future versions of this software:

I'm currently unable to assess how well this package conforms to the standards for two reasons. First, all of the @srrstats tags are still included in the srr-stats-standards.R file and are not placed in appropriate places in the code. Second, the compliance of only a handful of the standards has been explained.

General Review

Documentation

The package includes all the following forms of documentation:

[x] A statement of need clearly stating problems the software is designed to solve and its target audience in README
[x] Installation instructions: for the development version of package and any non-standard dependencies in README
[ ] Community guidelines including contribution guidelines in the README or CONTRIBUTING
[x] The documentation is sufficient to enable general use of the package beyond one specific use case

Algorithms

As I understand it, the one of the main advantages of this package is computational speed, making C++ a valid choice for implementing the core algorithms. I would like to see some tests to substantiate the performance claims of this package vs. packages such as MARSS and bayesdfa that were mentioned by the authors.

Testing

While the package has impressive test coverage, I found the suite of tests lacking especially with regards to the inputs for the exported functions. There are several instances where the inputs are not thoroughly checked, for example:

.VAR(diff(EuStockMarkets), -1)
 Error in `[.default`(x, (p + 1L - i):(TT - i), ) : 
only 0's may be mixed with negative subscripts

I'm also not convinced that the results of autotest have been properly accounted for. At the time of writing this review, running autotest on the package produces a data frame with 200 rows.

Visualisation (where appropriate)

Visualisations are mostly clear and aid the primary purposes of statistical interpretation of the results. I don't think there is a risk of statistical misinterpretation.

There is a small issue where the plot legend may overlap with the plotted values (for example when running plot(mod, type = "individual", method = "all") in the examples of plot.dfm), perhaps the authors could move the legend outside of the main figure area.

Package Design

Overall, the package seems mostly well designed for its intended purpose. The code has been thoroughly annotated making it easy to understand, although some of the lines are very long, and the indentation is sometimes inconsistent.

There are some curious design choices, like using switch instead of match.arg to check function arguments. Also, this part in DFM.R puzzled me:

# Quoting some functions that need to be evaluated iteratively
.EM_DGR <- quote(EMstepDGR(X, A, C, Q, R, F_0, P_0, cpX, n, r, sr, TT, rQi, rRi))
.EM_BM <- quote(EMstepBMOPT(X, A, C, Q, R, F_0, P_0, XW0, W, n, r, sr, TT, dgind, dnkron, dnkron_ind, rQi, rRi))
.KFS <- quote(SKFS(X, A, C, Q, R, F_0, P_0))

These expressions are then eval'd later, why exactly is this done as opposed to simply calling these functions directly?

I'm also curious as to what exactly is the logic in choosing which parts of the code are written in R and which in C++. It seems to me that there is quite a lot of matrix algebra going on in the R code. I would have expected that R would only be used to check and prepare the inputs for C++, and to provide outputs such as plots.

Print methods should return their respective argument objects invisibly (i.e., for dfm, dfm_summary, dfm_forecast, and ICr).

Since this is a non-tidyverse package, it is limited in terms of inter-operability in relation to other packages. For example, only base R graphics are available.

[ ] Packaging guidelines: The package conforms to the rOpenSci packaging guidelines

This package does not conform to the guidelines in at least the following aspects:

Inconsistent use of assignment operators
No top-level documentation
No CITATION file
README should also show the results of the usage example (I suggest using .Rmd)

Estimated hours spent reviewing: 5

[x] Should the author(s) deem it appropriate, I agree to be acknowledged as a package reviewer ("rev" role) in the package DESCRIPTION file.

maurolepore commented 1 year ago

Sorry to jump in. My EiC rotation just started and I'm checking the status of every open issue.

I see @santikka was due to sumbit the review on 2023-01-04. Any updates?

Thanks everyone for your work! :100:

santikka commented 1 year ago

@maurolepore I submitted my review review on 2023-01-03: https://github.com/ropensci/software-review/issues/556#issuecomment-1369469742

maurolepore commented 1 year ago

Thanks @santikka and sorry for my confusion. The review that seems to be missing is that of @eeholmes. Having sent this gentle reminder I now step back and let you all continue doing great work. Thanks! <3

maelle commented 1 year ago

@noamross could you please log santikka's review with the bot? Thank you!

noamross commented 1 year ago

@ropensci-review-bot submit review https://github.com/ropensci/software-review/issues/556#issuecomment-1369469742 time 5

ropensci-review-bot commented 1 year ago

Couldn't find entry for santikka in the reviews log

SebKrantz commented 1 year ago

Thanks a lot @santikka for the review! I will be addressing some of those comments in the coming months. In general I had also hoped for some comments regarding the statistical side of things, given that I am not a dynamic factor modelling expert and wrote the package principally due to lack of well maintained and computationally efficient alternatives in R.

The community feedback thus far has been positive, the package is part of a CRAN task view, and I hope to expand the functionality to mixed frequency estimation.

Regarding the review, I presume it is now upon me to implement the comments, and then the package will have a badge that it has been reviewed?

eeholmes commented 1 year ago

Hi @SebKrantz Sorry for the long delay in my review. My review will look at the package from the standpoint of a statistician, mainly for understanding the documentation.

I want to write our DFA examples in https://atsa-es.github.io/atsa-labs/sec-dfa.html using your package to compare speeds and just to get a handle on your model syntax. As a statistician with lots of DFA experience, I should be able to read the documentation and write the model in matrix (MARSS) form. If I can't, that will help me give you some feedback on where to make the documentation clearer.

I am the main MARSS developer. I very excited about your package because MARSS is not designed for big DFA models and actually the EM algorithm is really slow for Z (factor weights) updates (also it's written in R...). We usually steer people to TMB if they want fast DFA models, but that is a huge barrier for people. So I am keen to test out your package.

SebKrantz commented 1 year ago

Thanks @eeholmes. Looking forward to your review. The package code is an optimized R/C++ version of the Matlab code from Banbura, M., & Modugno, M. (2014) and Doz, C., Giannone, D., & Reichlin, L. (2012). The original Matlab codes as well as my initial rewrites of the codes in R + R-level optimization attempts are available in the GitHub repo under misc/.

eeholmes commented 1 year ago

@SebKrantz Installation comment for M1 Mac users. Solution was a little hard to find so if you have installation notes anywhere you might want to add a note.

Problem: no fortran compiler for M1 Macs in XCode. Problem discussed here: https://mac.r-project.org/tools/

Solution posted here https://github.com/RubD/Giotto_site/issues/11

# for R>=4.2.0
curl -O https://mac.r-project.org/tools/gfortran-12.0.1-20220312-is-darwin20-arm64.tar.xz

# unpack
sudo tar fxz gfortran-12.0.1-20220312-is-darwin20-arm64.tar.xz -C /

# /opt/R/arm64/gfortran/SDK has to point to your macOS SDK
# EEH: this didn't work for me but I was able to install anyhow and run example
sudo gfortran-update-sdk

eeholmes commented 1 year ago

Sorry, in my comment I meant the EM algorithm in the MARSS package struggles (= very slow) with the factors loading (C matrix) estimation. I wasn't referring to to the EM algorithm in Banbura, M., & Modugno, M. (2014) and Doz, C., Giannone, D., & Reichlin, L. (2012).

eeholmes commented 1 year ago

@SebKrantz Can I get a little help on something?

For identifiability, the C matrix must be constrained as Banbura & Modugno discuss here on the top of page 14 https://www.ecb.europa.eu/pub/pdf/scpwps/ecbwp1189.pdf

But when I fit a DFA (r=3, p=1) and look at the C matrix

model1 <- DFA(...)
model1$C

The C matrix doesn't have the identifiability constraint. So you must have done a rotation. Can you tell me what rotation DFA() is doing? I am trying to get the original constrained C matrix used in your EM algorithm. This C (which they call Lambda):

Thanks!

SebKrantz commented 1 year ago

@eeholmes, I am using a M1 mac and I have no problems building the package, binaries for mac are also available on CRAN. So the algorithm is a direct translation of B&M's original Matlab codes, in particular the standard single-frequency DFM estimation without autocorrelated errors forund in this file. I did a 1:1 translation of the code to R here and optimitzed this implementation in R here. This was then integrated with the C++ Kalman Filters and Smoothers to produce the implementation available in dfms::DFM().

Regarding the rotation, the Kalman filter and smoother was initialized with values from PCA, so I guess that gives orthogonal initial factor estimates and identifies the matrix. It may also be that B&M provide comments pertaining to mixed frequency estimation which I haven't implemented yet because as of now dfms estimates single frequency DFM's without autocorrelated errors.

SebKrantz commented 1 year ago

@eeholmes, I am using a M1 mac and I have no problems building the package, binaries for mac are also available on CRAN. So the algorithm is a direct translation of B&M's original Matlab codes, in particular the standard single-frequency DFM estimation without autocorrelated errors forund in this file. I did a 1:1 translation of the code to R here and optimitzed this implementation in R here. This was then integrated with the C++ Kalman Filters and Smoothers to produce the implementation available in dfms::DFM().

Regarding the rotation, the Kalman filter and smoother was initialized with values from PCA, so I guess that gives orthogonal initial factor estimates and identifies the matrix. It may also be that B&M provide comments pertaining to mixed frequency estimation which I haven't implemented yet because as of now dfms estimates single frequency DFM's without autocorrelated errors.

eeholmes commented 1 year ago

@SebKrantz I am working on my notes for my review here.

https://github.com/eeholmes/DFA/blob/main/review_notes.md

Right now I have just been making clarity comments from the perspective of a statistician who works on DFMs and EM algorithms but does not work with econometrics models. Once I fully understand the constraints that Doz, C., Giannone, D., & Reichlin, L and Doz, C., Giannone, D., & Reichlin, L. use, I can run some comparisons to other Kalman filter smoothers and DFM packages.

@noamross Is there a best way to share "work in progress" for a rOpenSci review?

noamross commented 1 year ago

Hi @eeholmes! We don't have a standard practice for this. People often open issues or PRs in their repositories to track the major changes underway. If you do this, its helpful to link back to this issue as then they will show up in this thread, as well.

ldecicco-USGS commented 4 months ago

Hi @SebKrantz and @noamross , I'm checking in on submissions that haven't seen much action.

Let me know if there is progress to report or any questions you may have. We can place this issue on "hold" if no work is expected to happen anytime soon. Otherwise, we should try to figure out a path forward.

ldecicco-USGS commented 4 months ago

Actually I am going to place it on hold so I know I've checked in, however it's very easy to switch it back to active!

ldecicco-USGS commented 4 months ago

@ropensci-review-bot put on hold

ropensci-review-bot commented 4 months ago

Submission on hold!

SebKrantz commented 4 months ago

Hi, so my general take is that I think the package is fairly robust and works well. I was still planning to extend its functionality towards mixed-frequency estimation, and have started some work in that regard, but currently have more important projects. When I do get to do that (this year I hope), I may also address some of the remaining form comments and add some of the requested methods. My interpretation of the reviews is that there were not substantial concerns about errors in the package. So yeah, happy to put things on hold until I get to work on it again.

ropensci-review-bot commented 1 month ago

@ldecicco-USGS: Please review the holding status

ropensci / software-review