openSesame() - Error in names(res) <- nms

YoannPa commented 3 years ago

Hi,

I am currently trying to run the sesame pipeline on 450K data that I have as IDAT files in a folder, as follows:

library(sesame)
library(BiocParallel)

IDATprefixes <- searchIDATprefixes("my/path/to/idatfolder/")

MulticoreParam_obj <- bpparam()
MulticoreParam_obj$workers <- 20

betas <- openSesame(IDATprefixes, BPPARAM = MulticoreParam_obj)

This returns an error and a warning (I have translated the parts that were not in English):

Error in names(res) <- nms :
'names' attribute [93] must be the same length as the vector [9]
In addition: Warning message:
stop worker failed:
   attempt to select less than one element in OneIndex

I made multiple attempts with exactly the same code. Note that across these attempts the length of the "vector" in the error message is not always the same (9 or 4).
Using a smaller number of workers resulted in the same kind of error message and warning.

What am I doing wrong?

traceback gives:

traceback()

4: bplapply(x, openSesame, platform = platform, manifest = manifest, 
       BPPARAM = BPPARAM, ...)
3: bplapply(x, openSesame, platform = platform, manifest = manifest, 
       BPPARAM = BPPARAM, ...)
2: do.call(cbind, bplapply(x, openSesame, platform = platform, manifest = manifest, 
       BPPARAM = BPPARAM, ...))
1: openSesame(IDATprefixes, BPPARAM = MulticoreParam_obj)
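
For readers hitting the same message: the names-length error can be reproduced in base R alone. The sketch below assumes (it is an interpretation, not something confirmed in this thread) that the parallel call returned fewer results than there were inputs, for example 9 results for 93 IDAT prefixes, and that the names of all inputs were then assigned to the shorter result list. The objects `nms` and `res` are illustrative stand-ins, not the actual BiocParallel internals.

```r
# Illustrative stand-ins: 93 input names, but only 9 results came back.
nms <- paste0("sample", seq_len(93))
res <- vector("list", 9)

# Assigning more names than there are elements raises an error;
# capture its message instead of stopping the script.
msg <- tryCatch(
  {
    names(res) <- nms
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)
```

If this reading is right, the varying vector length (9 or 4) between runs would simply reflect how many results had been collected before a worker failed.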

My sessionInfo():

R version 4.0.0 (2020-04-24)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS:   /usr/lib64/libblas.so.3.4.2
LAPACK: /usr/lib64/liblapack.so.3.4.2

locale:
 [1] LC_CTYPE=fr_FR.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=fr_FR.UTF-8        LC_COLLATE=fr_FR.UTF-8    
 [5] LC_MONETARY=fr_FR.UTF-8    LC_MESSAGES=fr_FR.UTF-8   
 [7] LC_PAPER=fr_FR.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=fr_FR.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
[1] BiocParallel_1.22.0  sesame_1.6.0         sesameData_1.6.0    
[4] ExperimentHub_1.14.2 AnnotationHub_2.20.2 BiocFileCache_1.12.1
[7] dbplyr_2.0.0         BiocGenerics_0.34.0 

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.5                    lattice_0.20-41              
 [3] assertthat_0.2.1              digest_0.6.26                
 [5] mime_0.9                      R6_2.5.0                     
 [7] GenomeInfoDb_1.24.2           stats4_4.0.0                 
 [9] RSQLite_2.2.1                 httr_1.4.2                   
[11] pillar_1.4.6                  zlibbioc_1.34.0              
[13] rlang_0.4.8                   curl_4.3                     
[15] blob_1.2.1                    S4Vectors_0.26.1             
[17] Matrix_1.2-18                 wheatmap_0.1.0               
[19] preprocessCore_1.50.0         RCurl_1.98-1.2               
[21] bit_4.0.4                     shiny_1.5.0                  
[23] DelayedArray_0.14.1           HDF5Array_1.16.1             
[25] compiler_4.0.0                httpuv_1.5.4                 
[27] pkgconfig_2.0.3               htmltools_0.5.0              
[29] tidyselect_1.1.0              SummarizedExperiment_1.18.2  
[31] tibble_3.0.4                  GenomeInfoDbData_1.2.3       
[33] interactiveDisplayBase_1.26.3 DNAcopy_1.62.0               
[35] IRanges_2.22.2                matrixStats_0.57.0           
[37] randomForest_4.6-14           crayon_1.3.4                 
[39] dplyr_1.0.2                   withr_2.3.0                  
[41] later_1.1.0.1                 bitops_1.0-6                 
[43] rappdirs_0.3.1                grid_4.0.0                   
[45] xtable_1.8-4                  lifecycle_0.2.0              
[47] DBI_1.1.0                     magrittr_2.0.1               
[49] XVector_0.28.0                promises_1.1.1               
[51] ellipsis_0.3.1                generics_0.1.0               
[53] vctrs_0.3.4                   Rhdf5lib_1.10.1              
[55] RColorBrewer_1.1-2            tools_4.0.0                  
[57] bit64_4.0.5                   Biobase_2.48.0               
[59] glue_1.4.2                    purrr_0.3.4                  
[61] BiocVersion_3.11.1            fastmap_1.0.1                
[63] yaml_2.2.1                    AnnotationDbi_1.50.3         
[65] colorspace_2.0-0              rhdf5_2.32.4                 
[67] BiocManager_1.30.10           GenomicRanges_1.40.0         
[69] memoise_1.1.0

Thank you in advance for your help!

zwdzwd commented 3 years ago

Have you had any success running this in serial mode on a small number of input samples, say 3? I tried your code and it works on my end, but I am running on a Mac.

YoannPa commented 3 years ago

@zwdzwd I will try on a smaller subset and check whether it works under Ubuntu, and will get back to you with more info.

Could the problem come from the version of R I am using? (R version 4.0.0 (2020-04-24))

YoannPa commented 3 years ago

@zwdzwd So, I tried a smaller subset of IDATs with fewer cores, and everything worked well! Now I am starting to suspect that the system runs out of memory for some reason.

The warning message put me on this track thanks to this comment

The funny part is that I always keep an eye on htop when I run parallel computations on a cluster, to make sure the cores and RAM I need are available. At no time during the process did the RAM or the swap fill up...

I have sent an email to the cluster's sysadmin to clarify this point.

zwdzwd commented 3 years ago

I am glad you figured it out!

YoannPa commented 3 years ago

Hi @zwdzwd, after several attempts I ruled out the memory issue. It is very likely that there is a problem with one of the IDAT files I use. The error is reproducible from one computer to another under Linux (Ubuntu or CentOS) using the same data with the same faulty IDAT file. This is a first for me, as I previously ran multiple analyses on these IDAT files without trouble. Could it be an IDAT version incompatibility?

Would you be interested in checking the actual content of the problematic IDAT(s)? Otherwise, do you know of any tool to check the integrity of HM450K IDAT files?

Thank you for your support. Best,

Yoann.
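
On the integrity question, a rough first pass is possible with a small helper, sketched here under assumptions: the function name `check_idat_files` and the `reader` argument are hypothetical, and the default reader calls `illuminaio::readIDAT()`, which is only invoked if that package is installed. The helper records each file's size and whether the reader could parse it without raising an error. A file that parses is not guaranteed to be valid, so this only narrows the search.

```r
# Hypothetical helper: report size and parse status for each IDAT file.
# `reader` defaults to illuminaio::readIDAT (assumed to be installed);
# it is an argument so that a different parser can be swapped in.
check_idat_files <- function(files,
                             reader = function(f) illuminaio::readIDAT(f)) {
  readable <- vapply(files, function(f) {
    # TRUE if the reader returns normally, FALSE if it raises an error.
    tryCatch({
      reader(f)
      TRUE
    }, error = function(e) FALSE)
  }, logical(1), USE.NAMES = FALSE)

  data.frame(
    file = unname(files),
    size_bytes = file.size(files),
    readable = readable,
    stringsAsFactors = FALSE
  )
}

# Possible usage (the path is the placeholder from the original post):
# files <- list.files("my/path/to/idatfolder/", pattern = "\\.idat$",
#                     full.names = TRUE, recursive = TRUE)
# report <- check_idat_files(files)
# report[!report$readable, ]
```

Comparing `size_bytes` across files from the same array type may also help spot a truncated file, although a matching size does not rule out corruption.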

zwdzwd commented 3 years ago

I think it is possible. Can you try manually installing illuminaio from my fork, https://github.com/zwdzwd/illuminaio, and let me know if this solves your problem? If not, you will need to show me your error message.

YoannPa commented 3 years ago

I uninstalled my current version of illuminaio and installed the one from the GitHub repo you linked. I restarted my R session and reloaded the packages with the code above. This did not solve the problem, so I believe that one IDAT is indeed faulty or corrupted. I ran openSesame without any trouble on other IDAT cohorts I had. I will test all the IDATs and let you know if I find anything the faulty ones have in common.
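
One way to test every IDAT and isolate the failing sample(s) is to process the prefixes one at a time, serially, and record which ones raise an error. This is only a sketch: `find_failing_samples` and its `process` argument are hypothetical names, and the default assumes that `sesame::openSesame()` accepts a single IDAT prefix, as the traceback earlier in the thread suggests. Results are discarded, so this is meant for diagnosis rather than as a replacement for the parallel run.

```r
# Hypothetical helper: run `process` on each prefix in turn and collect
# the prefixes that fail, together with their error messages.
find_failing_samples <- function(prefixes,
                                 process = function(p) sesame::openSesame(p)) {
  errors <- vapply(prefixes, function(p) {
    tryCatch({
      process(p)      # result is discarded; only success or failure matters
      NA_character_   # NA marks a sample that was processed without error
    }, error = function(e) conditionMessage(e))
  }, character(1), USE.NAMES = FALSE)

  failed <- !is.na(errors)
  data.frame(
    prefix = unname(prefixes)[failed],
    error = errors[failed],
    stringsAsFactors = FALSE
  )
}

# Possible usage with the objects from the original post:
# failing <- find_failing_samples(IDATprefixes)
# good_prefixes <- setdiff(IDATprefixes, failing$prefix)
```

Running serially avoids the situation where one failing worker hides which input caused the problem, at the cost of a much longer run on a cohort of about a thousand samples.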

YoannPa commented 3 years ago

I have tested all the IDATs in my possession: only 1 is faulty. So, unless you are interested in finding out why this specific IDAT doesn't work with openSesame, I can live without 1 sample out of a thousand. Thank you for the help!

zwdzwd commented 3 years ago

OK. Yeah, if it is not too much trouble, can you send me that one IDAT? I can look into it when I get a chance. Thanks for digging!
