privefl / bigstatsr

R package for statistical tools with big matrices stored on disk.
https://privefl.github.io/bigstatsr/

Task 1 failed: "'from' must be a finite number" #142

malvarez-avalo closed this issue 2 years ago

malvarez-avalo commented 2 years ago

Hello!

First, thanks so much for writing such a useful package. I've encountered an error when running the big_spLinReg function on some datasets: the regression aborts with the following message:

Error in { : task 1 failed - "'from' must be a finite number"
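For context, the quoted text matches the message base R's seq() raises when its `from` argument is not finite. The sketch below only reproduces that message in isolation; `lambda_max` is a hypothetical stand-in, and the thread does not confirm which call inside the package actually raised the error.

```r
# Hypothetical stand-in for a starting value that ended up missing.
lambda_max <- NA_real_

# seq() refuses a non-finite 'from'; capture the error message instead of stopping.
msg <- tryCatch(
  seq(from = lambda_max, to = 0.001, length.out = 100),
  error = function(e) conditionMessage(e)
)
print(msg)
```

The "task 1 failed" prefix is the kind of wrapper a parallel loop adds when one of its workers hits an error, so the underlying problem is the non-finite value rather than the parallel setup.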

I've managed to isolate a small reproducible dataset:

phenotypes.txt

matrix1.txt

The error seems to originate in lines 284-299 of the BigLasso function, where NAs are produced during a centering and scaling step. Reproducible code is below:

library(bigstatsr)
library(data.table)
library(bigreadr)
library(parallel)  # provides detectCores()

# Read a few rows only to count the columns
tmp <- fread2("matrix1.txt", nrows = 5)
tmp <- ncol(tmp)

# Remove backing files left over from previous runs
system("rm *.bk")
X <- big_read("matrix1.txt", select = 1:tmp)

Y <- fread("phenotypes.txt", data.table = FALSE)
Y <- Y[, 1]

fit <- big_spLinReg(X, y.train = Y, alphas = 1, ncores = detectCores() - 1,
                    lambda.min.ratio = 0.001, K = 5, nlam.min = 100)

Please let me know if there's other information I can provide!

Best wishes, Mariano

privefl commented 2 years ago

Thanks for the great reproducible example!

This is somehow due to the low variation of column 16080. That column should have been flagged for removal, but the check returned a missing value instead of a boolean indicating that the variable should not be kept. I don't have time to investigate this further, but missing values are now treated as FALSE, so such variables are not kept.
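To illustrate the mechanism in general terms, here is a minimal base R sketch. It is not the package's internal code: the data, the `keep` test, and the 1e-8 threshold are all hypothetical. It shows how scaling a column with no variation produces NaN, how a logical test on that column then yields NA rather than FALSE, and how treating NA as FALSE drops the column.

```r
# Hypothetical data: column "b" is constant, so its standard deviation is 0.
X <- cbind(a = c(1, 2, 3, 4), b = c(5, 5, 5, 5))

center <- colMeans(X)
scale  <- apply(X, 2, sd)

# Centering and scaling the constant column computes 0 / 0, which gives NaN.
X_scaled <- sweep(sweep(X, 2, center, "-"), 2, scale, "/")

# A hypothetical "keep this variable?" test: NaN propagates to NA, not FALSE.
keep <- apply(X_scaled, 2, function(col) max(abs(col)) > 1e-8)
print(keep)

# Treat a missing answer as FALSE, so the problematic column is not kept.
keep_fixed <- !is.na(keep) & keep
print(keep_fixed)
```

With this guard, only column "a" is retained, and no NA reaches later steps that expect finite numbers.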

You need the latest GitHub version for this to work.

malvarez-avalo commented 2 years ago

Thanks so much for the help! We've just had a chance to test the updated version, and the error hasn't appeared again. Really appreciate the help, and thanks again for an awesome package!