wbnicholson / BigVAR

Dimension Reduction Methods for Multivariate Time Series

Dim reduction on big dataset #13

Open MislavSag opened 4 years ago

MislavSag commented 4 years ago

Great package.

Is the package suitable for very big datasets? I am talking about datasets of dimension 1,000,000 x 300.

I have just tried this code:

mod1 <- constructModel(data_sample, p = 4, struct = "Basic", gran = c(150, 10), RVAR = FALSE, h = 1, cv = "Rolling", MN = FALSE, verbose = FALSE, IC = TRUE)
results <- cv.BigVAR(mod1)

and it is pretty slow even with a 1000 x 100 X matrix (about 10 minutes).

My goal is to do dimension reduction, but I am not sure whether your package is appropriate for this.

wbnicholson commented 4 years ago

Time series with those dimensions (large T, small k) should be feasible in this framework, but rolling validation for penalty parameter selection is not advisable, since the process will be very computationally intensive. I would instead suggest something like n-fold cross validation, as described in section 3.2 of http://www.wbnicholson.com/BigVAR.html.
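To make the n-fold suggestion concrete, here is a minimal base R sketch of blocked k-fold cross validation for choosing a penalty parameter in a VAR. It is not the tutorial's code and does not call BigVAR: ridge regression stands in for the package's structured penalties, and the simulated data and the helper names `ridge_fit` and `nfold_cv` are hypothetical, chosen only for illustration.

```r
# Sketch: blocked k-fold cross validation for a VAR(p) penalty parameter.
# Ridge regression is a stand-in penalty so the example needs only base R;
# it is NOT one of BigVAR's structured penalties.

set.seed(1)
k <- 3; n_obs <- 200; p <- 1

# Simulate a stable VAR(1): y_t = A y_{t-1} + e_t
A <- diag(0.5, k)
Y <- matrix(0, n_obs, k)
for (t in 2:n_obs) Y[t, ] <- A %*% Y[t - 1, ] + rnorm(k, sd = 0.1)

# Lagged design: first k columns are y_t, the rest are the p previous lags
lagged <- embed(Y, p + 1)
resp <- lagged[, 1:k, drop = FALSE]
X <- lagged[, -(1:k), drop = FALSE]

# Hypothetical helper: closed-form ridge coefficients (no intercept)
ridge_fit <- function(X, Y, lambda) {
  solve(crossprod(X) + lambda * diag(ncol(X)), crossprod(X, Y))
}

# Hypothetical helper: mean squared error per lambda over contiguous folds
nfold_cv <- function(X, Y, lambdas, nfolds = 5) {
  # Contiguous blocks keep neighbouring time points in the same fold
  fold <- cut(seq_len(nrow(X)), breaks = nfolds, labels = FALSE)
  mse <- sapply(lambdas, function(lam) {
    mean(sapply(seq_len(nfolds), function(f) {
      test <- fold == f
      B <- ridge_fit(X[!test, , drop = FALSE], Y[!test, , drop = FALSE], lam)
      mean((Y[test, , drop = FALSE] - X[test, , drop = FALSE] %*% B)^2)
    }))
  })
  list(mse = mse, best = lambdas[which.min(mse)])
}

lambdas <- 10^seq(-3, 2, length.out = 10)
cv <- nfold_cv(X, resp, lambdas)
cv$best  # penalty value with the lowest cross-validated error
```

Each penalty value here costs `nfolds` model fits, so the total work is fixed by the grid size and the number of folds rather than by the length of the series.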

One way to potentially improve performance is to ensure that BLAS and OpenMP are single-threaded. You can do so by adding the following code to your .Rprofile:

```r
library(RhpcBLASctl)
blas_set_num_threads(1)
omp_set_num_threads(1)
```

MislavSag commented 4 years ago

I have returned to your answer after some time :)

I have just tried to implement CV from this tutorial: http://www.wbnicholson.com/BigVAR.html#n-fold-cross-validation (the CV part is in section 3.2).

When I execute the NFoldcv function, it returns an error: `Error in 2:nrow(Z1) : argument of length 0`. The problem is that Z1 is a list of two elements: Y and Z. So instead of Z1, should it be Z1$Z or Z1$Y in the line `trainZ <- Z1[2:nrow(Z1),]`?

wbnicholson commented 4 years ago

Yes, it should be Z1$Z. I will make the correction.
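For anyone hitting the same error, the following base R sketch reproduces it with a mock object. The list `Z1` below only imitates the shape described above (a list with elements Y and Z); its dimensions and contents are invented for illustration and are not taken from the package.

```r
# Mock of the object shape described in this thread: a list with Y and Z.
# The dimensions are made up; only the list structure matters here.
Z1 <- list(
  Y = matrix(rnorm(20), nrow = 10, ncol = 2),
  Z = rbind(1, matrix(rnorm(40), nrow = 4, ncol = 10))  # 5 x 10, first row of ones
)

# A plain list has no dim attribute, so nrow() returns NULL
is.null(nrow(Z1))  # TRUE

# 2:NULL is what triggers the reported error
err <- tryCatch(2:nrow(Z1), error = function(e) conditionMessage(e))
err  # "argument of length 0"

# Indexing the Z element instead works and drops its first row
trainZ <- Z1$Z[2:nrow(Z1$Z), ]
dim(trainZ)  # 4 10
```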

MislavSag commented 4 years ago

Thanks. I think this can be closed now.

MislavSag commented 3 years ago

It seems this is not solved?

wbnicholson commented 3 years ago

This has been fixed now.