selbouhaddani / OmicsPLS

R package for High dimensional data analysis and integration with O2PLS!
https://doi.org/10.1186/s12859-018-2371-3

crossval_o2m_adjR2 shows MSE of "NA" #3

Closed: jmodlis closed this issue 6 years ago

jmodlis commented 6 years ago

Hi,

I'm trying to run OmicsPLS on an RNA-seq dataset and a methylation-array dataset. When I run crossval_o2m_adjR2, I get an MSE of NA for every combination. These results do not look valid, especially since no value of n is reported. Do you have any insight into what is going on?

Thanks! Jen

Command/output:

> crossval_o2m_adjR2(methyl.shared.trans, rna.shared.trans, 1:3, 0:3, 0:3, nr_folds = 2, nr_cores = 4)
minimum is at n =
Elapsed time: 570.87 sec
  MSE n nx ny
1  NA 1  0  3
2  NA 2  0  2
3  NA 3  0  3
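The blank after "minimum is at n =" is consistent with picking the minimum from an MSE column that is entirely NA. The sketch below is a hypothetical reconstruction, not OmicsPLS code: it assumes the best n is selected the way base R's `which.min()` works, which discards NAs and returns a zero-length index when every value is NA.

```r
# Results table from the output above: every MSE is NA.
res <- data.frame(MSE = c(NA_real_, NA_real_, NA_real_),
                  n = 1:3, nx = c(0, 0, 0), ny = c(3, 2, 3))

# Assumption: the best n is the row with the smallest MSE, found via which.min().
best <- which.min(res$MSE)  # NAs are discarded, so this is integer(0)

# Indexing with an empty index gives an empty vector, so nothing follows "n =".
cat("minimum is at n =", res$n[best], "\n")
```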

selbouhaddani commented 6 years ago

Dear Jen, Thanks for your message.

Thanks in advance for your reply!

jmodlis commented 6 years ago

Hi Said,

Thanks so much for your response!

The dimensions of methyl.shared.trans are:

> dim(methyl.shared.trans)
[1]     23 242714

The dimensions of rna.shared.trans are:

> dim(rna.shared.trans)
[1]    23 15846

When I run the fit command, it seems to be OK:

> summary(fit)
Summary of the O2PLS fit
-- Call: o2m(X = methyl.shared.trans, Y = rna.shared.trans, n = 1, nx = 1, ny = 1)
-- Modeled variation
-- Total variation:
in X: 5582422
in Y: 364458

-- Joint, Orthogonal and Noise as proportions:

           data X data Y
Joint       0.046  0.257
Orthogonal  0.046  0.073
Noise       0.908  0.670

-- Predictable variation in Y-joint part by X-joint part:
Variation in Yhat relative to U: 0.979
-- Predictable variation in X-joint part by Y-joint part:
Variation in Xhat relative to T: 0.979

(rest of the output cut off)
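As a quick sanity check on the summary, each column of the proportions table should sum to one, and multiplying by the total variation gives approximate absolute amounts. This base-R sketch uses only the numbers printed above; because the proportions are rounded to three decimals, the absolute values are approximate.

```r
# Proportions printed by summary(fit); rounded to three decimals.
props <- matrix(c(0.046, 0.046, 0.908,
                  0.257, 0.073, 0.670),
                nrow = 3,
                dimnames = list(c("Joint", "Orthogonal", "Noise"),
                                c("data X", "data Y")))

# Joint, orthogonal and noise parts together make up the total, so each column sums to 1.
colSums(props)

# Total variation in X and Y, also taken from the summary.
totals <- c(5582422, 364458)

# Multiply each column by its total to get approximate absolute variation.
abs_var <- sweep(props, 2, totals, `*`)
round(abs_var)
```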

Here is the output of loocv_combi:

> loocv_combi(methyl.shared.trans, rna.shared.trans, 1, 1, 1, app_err=F, func=o2m, kcv = 2, stripped = TRUE)
Data is not centered, proceeding...
Using Power Method with tolerance 1e-10 and max iterations 100
Power Method (comp 1) stopped after 37 iterations.
Power Method (comp 2) stopped after 26 iterations.
Power Method (comp 1) stopped after 39 iterations.

Data is not centered, proceeding...
Using Power Method with tolerance 1e-10 and max iterations 100
Power Method (comp 1) stopped after 31 iterations.
Power Method (comp 2) stopped after 27 iterations.
Power Method (comp 1) stopped after 31 iterations.

$CVerr
[1] 2.010063

$Fiterr
[1] NA

I did run scale2 on these datasets beforehand to center them and scale the variance, so I'm not sure why it says the data is not centered. I am very new to data integration methods, so I wonder if I am missing something simple!

Thanks again, Jen

selbouhaddani commented 6 years ago

Hi Jen,

Sorry to keep you waiting! OK, I think I found the bug: it was in the fitting function for high-dimensional data. Can you update the package using devtools::install_github('selbouhaddani/OmicsPLS') and run your original code again? I'll send the new version to CRAN tomorrow.

Best,
Said

PS: In cross-validation, a subset of the data is taken. This subset does not have to have a mean of exactly zero, but that is no problem.
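Said's point can be checked with base R: a column-centered matrix has column means of (numerically) zero, but the rows kept for one cross-validation fold generally do not. This is a sketch on simulated data, not the thread's data; base R's `scale()` stands in for `scale2`, and the alternating fold assignment is hypothetical, not necessarily how OmicsPLS splits the samples.

```r
set.seed(1)
# Simulated stand-in for a 23-sample data matrix (not the data from this thread).
X <- matrix(rnorm(23 * 5), nrow = 23, ncol = 5)

# Center and scale each column; base R's scale() stands in for scale2 here.
Xc <- scale(X, center = TRUE, scale = TRUE)
max(abs(colMeans(Xc)))  # numerically zero: the full data are centered

# Hypothetical 2-fold split that alternates rows between the two folds.
folds <- rep_len(1:2, nrow(Xc))

# Rows used for fitting when fold 1 is held out.
train <- Xc[folds != 1, , drop = FALSE]
max(abs(colMeans(train)))  # generally not zero: the subset is no longer exactly centered
```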

jmodlis commented 6 years ago

Hi Said,

No worries, you have been very helpful and quick!

When I run the code again, I get values for MSE instead of NA.

> crossval_o2m_adjR2(methyl.shared.trans, rna.shared.trans, 1:3, 0:3, 0:3, nr_folds = 2, nr_cores = 4)
minimum is at n = 2
Elapsed time: 528.67 sec
       MSE n nx ny
1 2.012393 1  0  3
2 2.009583 2  0  2
3 2.044003 3  0  3
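With real MSE values, the reported minimum is simply the row with the lowest cross-validated error. A base-R sketch using the printed values (it does not call OmicsPLS, and assumes the reported minimum is a plain minimum over the MSE column):

```r
# MSE values copied from the updated output above.
res <- data.frame(MSE = c(2.012393, 2.009583, 2.044003),
                  n = 1:3, nx = c(0, 0, 0), ny = c(3, 2, 3))

# The selected combination is the row with the smallest cross-validated MSE.
best <- which.min(res$MSE)
res[best, ]  # n = 2, nx = 0, ny = 2
```

Note that the MSE differences between n = 1 and n = 2 are small (about 0.003), so the choice between them is not clear-cut.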

Thanks again for your help! Jen

selbouhaddani commented 6 years ago

Great to hear! Glad that I can close this thread. If you have any more questions/remarks, please let me know.