srlanalytics closed this pull request 5 years ago.
Merging #55 into master will increase coverage by 0.58%. The diff coverage is 40.83%.
```
@@            Coverage Diff             @@
##           master      #55      +/-   ##
==========================================
+ Coverage   59.43%   60.02%    +0.58%
==========================================
  Files           8        9        +1
  Lines        1425     1476       +51
==========================================
+ Hits          847      886       +39
- Misses        578      590       +12
```
Impacted Files | Coverage Δ | |
---|---|---|
R/MLdfm.R | 0% <0%> (ø) | :arrow_up: |
R/utils.R | 3.44% <0%> (+0.5%) | :arrow_up: |
R/auto_process.R | 100% <100%> (ø) | |
R/bdfm.R | 48.85% <13.04%> (-0.29%) | :arrow_down: |
R/PCdfm.R | 71.42% <14.28%> (-1.2%) | :arrow_down: |
src/BDFM.cpp | 70.75% <48.64%> (-0.96%) | :arrow_down: |
R/dfm.R | 55.76% <50%> (-0.24%) | :arrow_down: |
R/dfm_core.R | 62.72% <92.85%> (+3.72%) | :arrow_up: |
Continue to review the full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

Powered by Codecov. Last update fa33e5c...7dcdb4d.
Finally back to bdfm. Should this still be merged?
Yes. Sorry the last thing I pushed failed... I didn't get a chance to have a look on Travis.
In principle, I like the call below a lot. (I dislike `store_idx`; how about `target`?)

However, it fails now (this also breaks the vignette and causes Travis to fail):
```r
library(bdfm)
m <- dfm(data = econ_us, factors = 3, pre_differenced = "A191RL1Q225SBEA", store_idx = "A191RL1Q225SBEA")
#> Error in DSmooth(B = B, Jb = Jb, q = q, H = H, R = R, Y = Y, freq = freq[-(1:m)], : matrix multiplication: incompatible matrix dimensions: 5x15 and 1x15
```

Created on 2019-03-17 by the reprex package (v0.2.1)
`UVreg`? `lm` uses QR decomposition, which should be more robust.

The .cpp stuff is all in one file. It would be easier if it were organized more like the R folder.

Regarding the call:
```r
m <- dfm(data = econ_us, factors = 3, pre_differenced = "A191RL1Q225SBEA", store_idx = "A191RL1Q225SBEA")
```
There was a bug in some of the additional output I had wanted from the models... should be fine now.
Regarding the other questions:
`inv_sympd(trans(X)*X)`: I do this because we'll need the inverse of `trans(X)*X` to calculate the variance of the estimated coefficients. Since we have to use it anyway, I also use it to calculate the coefficients themselves, instead of `solve(trans(X)*X, trans(X)*y)`. The whole thing is super lightweight, so it hopefully doesn't slow anything down! Here's a little example:

```r
x <- rnorm(10000, 0, 1)
y <- 2*x + rnorm(10000, 0, 1)

# plug in arbitrary missing values
y[c(12,13,15,16,23,45,67,34,35,56,78,1000, 1003, 1213, 1234, 3455, 6788, 9999)] <- NA
x[c(23,34,45,56,67,78,79,100,101,1002, 1234,1245, 2345, 3566, 3468, 4567, 4569)] <- NA

start_time <- Sys.time()
est <- lm(y ~ x)
end_time <- Sys.time()
lm_time <- end_time - start_time

start_time <- Sys.time()
est <- UVreg(x, y)
#> Error in UVreg(x, y): could not find function "UVreg"
end_time <- Sys.time()
UV_time <- end_time - start_time

as.numeric(lm_time)/as.numeric(UV_time)
#> [1] 3.997609
```

Created on 2019-03-19 by the reprex package (v0.2.1)
`UVreg` in this example is about 4 times faster than `lm()` because it has so much less to do!

Some more on `lm` (see also Dirk's slides):
```r
library(microbenchmark)
library(bdfm)

n <- 100; p <- 1
X <- matrix(rnorm(n * p), n, p)  # no intercept!
y <- rnorm(n)

# first run needs to load some stuff, outside of competition
first.run <- bdfm:::UVreg(X, y)

microbenchmark(
  lm      = {m0 <- lm(y ~ 0 + X)},
  lm.fit  = {m1 <- lm.fit(x = X, y = y)},
  .lm.fit = {m2 <- .lm.fit(x = X, y = y)},
  uvreg   = {m3 <- bdfm:::UVreg(X, y)}
)
#> Unit: microseconds
#>     expr     min       lq      mean   median       uq      max neval
#>       lm 464.389 489.9860 568.64200 542.9470 585.2460 2858.815   100
#>   lm.fit  30.740  36.4000  48.00131  42.8280  48.2255  478.146   100
#>  .lm.fit   3.653   6.3125   8.22999   7.6855   9.7755   31.620   100
#>    uvreg  22.199  30.8080  37.76772  38.0175  43.4510   79.224   100
```

Created on 2019-03-19 by the reprex package (v0.2.1)
Because we're not including an intercept term, all that's really going on is dividing scalars. `UVreg` is just calculating `as_scalar(trans(x)*y)/as_scalar(trans(x)*x)`. There's no need for any fancy matrix anything here... it's just division. I had only included `inv_sympd(trans(X)*X)` initially because the default was to include an intercept term (in which case `trans(X)*X` is a matrix, which it's not here).
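Not the package's C++ code, but a minimal base-R sketch of what this reduces to: with no intercept, the OLS coefficient is literally one scalar division, and it agrees with `.lm.fit`'s QR-based answer:

```r
set.seed(1)
n <- 100
x <- rnorm(n)
y <- 2 * x + rnorm(n)

# No-intercept OLS: trans(x)*y / trans(x)*x is plain scalar division
beta_scalar <- sum(x * y) / sum(x * x)

# Same estimate from R's bare-bones QR-based fitter
beta_lmfit <- unname(.lm.fit(x = matrix(x, ncol = 1), y = y)$coefficients)

all.equal(beta_scalar, beta_lmfit)
#> [1] TRUE
```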
Most of the computational time of `UVreg()` (not that there's much) is actually taken up by dropping NA values while keeping the results correct (i.e. we can't just use `y <- y[is.finite(y)]`). `lm.fit()` and `.lm.fit()` are great, but they don't deal with the NA issue.
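A rough base-R sketch of that NA handling (`UVreg` itself is C++; the function name here is made up for illustration): drop observations where *either* series is missing, then do the scalar division. This matches `lm()`'s casewise deletion:

```r
# Hypothetical base-R analogue of the NA handling described above:
# keep only observations where BOTH x and y are finite, then run the
# no-intercept scalar-division OLS on the retained pairs.
uvreg_sketch <- function(x, y) {
  keep <- is.finite(x) & is.finite(y)  # pairwise, not per-vector
  x <- x[keep]
  y <- y[keep]
  sum(x * y) / sum(x * x)
}

set.seed(42)
x <- rnorm(1000)
y <- 2 * x + rnorm(1000)
y[c(5, 10, 15)] <- NA
x[c(10, 20)] <- NA

# agrees with lm()'s default casewise deletion (no intercept)
all.equal(uvreg_sketch(x, y), unname(coef(lm(y ~ 0 + x))))
#> [1] TRUE
```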
Thanks, sounds perfect! Sorry for the hold up.
I've written a very simple function to automatically take logs and differences of input data. It uses a simple AR(1) model, y_t = a y_{t-1} + e_t. The null hypothesis is that a = 1, and the rule is as follows:

This is determined using two one-sided tests. Right now I have it set so that we take logs if our point estimate for a is more than one standard deviation above 1 (and the series has no zero or negative values), but the estimate must be two standard deviations below 1 for us *not* to take differences. We can tweak these thresholds as we go.

It seems very simple, but it works well, at least for our us_gdp.R example.
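To make the rule concrete, here's how I read it as a base-R sketch (this is not the actual package function; the name and the exact thresholds are just the description above):

```r
# Sketch of the described rule (not the actual package code).
# Fit y_t = a * y_{t-1} + e_t by OLS, then use the point estimate of a
# and its standard error to decide on logs and differences.
auto_transform_sketch <- function(y) {
  fit <- lm(y[-1] ~ 0 + y[-length(y)])  # AR(1) without intercept
  a   <- unname(coef(fit))
  se  <- sqrt(vcov(fit)[1, 1])
  list(
    # logs: estimate more than 1 sd above 1, and series strictly positive
    take_logs  = (a > 1 + se) && all(y > 0, na.rm = TRUE),
    # differences: unless the estimate is 2 sds below 1
    take_diffs = !(a < 1 - 2 * se)
  )
}

y <- 100 * 1.05^(0:99)         # explosive, strictly positive series
str(auto_transform_sketch(y))  # both flags TRUE for this series
```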