mojaveazure / loomR

An R-based interface for loom files

Error when adding column attributes with NAs #40

Open NathanSkene opened 5 years ago

NathanSkene commented 5 years ago

When column attributes with NAs are added using:

lfile$add.col.attribute(newCol, overwrite = TRUE)

It gives this error: "Error in if (size == Inf) { : missing value where TRUE/FALSE needed"

It's a pretty cryptic error message. Would be good to set an error catch. Thanks for developing this package!
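A minimal sketch of the kind of error catch I mean. It assumes, as later comments in this thread report, that the failure is triggered by NAs in character or factor attributes. `check_na_attributes` is a hypothetical helper name, not part of loomR, and the toy `newCol` list below is made up for illustration.

```r
# Hypothetical helper (not part of loomR): fail early with a readable message
# when a character or factor attribute contains NA.
check_na_attributes <- function(attributes) {
  stopifnot(is.list(attributes), !is.null(names(attributes)))
  has_na <- vapply(
    attributes,
    # as.character() also exposes NAs that addNA() stored as a factor level
    function(x) (is.character(x) || is.factor(x)) && anyNA(as.character(x)),
    logical(1)
  )
  if (any(has_na)) {
    stop(
      "NA found in character/factor attribute(s): ",
      paste(names(attributes)[has_na], collapse = ", "),
      call. = FALSE
    )
  }
  invisible(attributes)
}

# Toy input: one character attribute with NA, one numeric attribute with NA
newCol <- list(cluster = c("a", NA, "b"), score = c(1, NA, 3))
result <- tryCatch(check_na_attributes(newCol), error = conditionMessage)
print(result)
```

Calling a check like this before `add.col.attribute` would name the offending attribute instead of surfacing the `if (size == Inf)` message.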

mojaveazure commented 5 years ago

Hi Nathan,

I am unable to reproduce this bug:

> library(Seurat)
> data("pbmc_small")
> lfile <- as.loom(x = pbmc_small, filename = 'pbmc_small.loom')
Transposing input data: loom file will show input columns (cells) as rows and input rows (features) as columns
This is to maintain compatibility with other loom tools
  |======================================================================| 100%
Adding: CellID
Adding: Gene
Adding a layer to norm_data (layer 1 of 1)
  |======================================================================| 100%
Adding: mean
Adding: variance
Adding: variance_expected
Adding: variance_standardized
Adding: Selected
Adding: orig_ident
Adding: nCount_RNA
Adding: nFeature_RNA
Adding: RNA_snn_res_0_8
Adding: letter_idents
Adding: groups
Adding: RNA_snn_res_1
Adding: ClusterID
Adding: ClusterName
Adding scaled data matrix to /layers/scale_data
Adding a layer to scale_data (layer 1 of 1)
  |======================================================================| 100%
Adding dimensional reduction information for pca
Adding cell embedding information for pca
Adding feature loading information for pca
Adding dimensional reduction information for tsne
Adding cell embedding information for tsne
No feature loading information for tsne
Adding graph RNA_snn
> lfile
Class: loom
Filename: /home/paul/Software/loomR/pbmc_small.loom
Access type: H5F_ACC_RDWR
Attributes: version, chunks, LOOM_SPEC_VERSION, assay, last_modified
Listing:
       name    obj_type dataset.dims dataset.type_class
  col_attrs   H5I_GROUP         <NA>               <NA>
 col_graphs   H5I_GROUP         <NA>               <NA>
     layers   H5I_GROUP         <NA>               <NA>
     matrix H5I_DATASET     80 x 230          H5T_FLOAT
  row_attrs   H5I_GROUP         <NA>               <NA>
 row_graphs   H5I_GROUP         <NA>               <NA>
> lfile$mode
[1] "w-"
> newcol <- sample(x = c(1:10, NA), size = ncol(x = pbmc_small), replace = TRUE)
> any(is.na(x = newcol))
[1] TRUE
> lfile$add.col.attribute(attributes = newcol, overwrite = TRUE)
 Error in self$add.attribute(attributes = attributes, MARGIN = 2, overwrite = overwrite) : 
  Attributes must be provided as a named list 
> lfile$add.col.attribute(attributes = list('newcol' = newcol), overwrite = TRUE)
Adding: newcol
> any(is.na(x = lfile[['col_attrs/newcol']][]))
[1] TRUE

Would you be able to share your loom file and newCol list with me so I can debug better?

whtns commented 5 years ago

I am also getting this error when I have NA values in the Seurat object metadata. Oddly, it seems to happen only when the NAs are in a column of type character, not in a column of type numeric. Here is an example based on the pbmc guided clustering tutorial:

library(Seurat)

pbmc.data <- Read10X(data.dir = "~/single_cell_projects/resources/seurat_example/filtered_gene_bc_matrices/hg19/")
pbmc <- CreateSeuratObject(counts = pbmc.data, project = "pbmc3k", min.cells = 3, min.features = 200)

cell_names <- colnames(pbmc)

new_char_col <- rep(c("1", NA), length(cell_names)/2)
names(new_char_col) <- cell_names

new_num_col <- rep(c(1, NA), length(cell_names)/2)
names(new_num_col) <- cell_names

pbmc_na_char <- AddMetaData(pbmc, new_char_col, col.name = "test")

pbmc_na_num <- AddMetaData(pbmc, new_num_col, col.name = "test")

pbmc.loom <- as.loom(pbmc, filename = "~/single_cell_projects/resources/seurat_example/pbmc.loom")

pbmc_na_char.loom <- as.loom(pbmc_na_char, filename = "~/single_cell_projects/resources/seurat_example/pbmc_na_char.loom")

pbmc_na_num.loom <- as.loom(pbmc_na_num, filename = "~/single_cell_projects/resources/seurat_example/pbmc_na_num.loom")

MajoroMask commented 5 years ago

@mojaveazure I reproduced @whtns's example with the pbmc_small dataset like this:

library(Seurat)
library(loomR)

data("pbmc_small")
lfile <- as.loom(x = pbmc_small, filename = 'pbmc_small.loom', overwrite = TRUE)

x <- list(
  "int" = sample(c(1L, 2L, NA), size = ncol(x = pbmc_small), replace = TRUE),
  "dbl" = sample(c(1.0, 2.0, NA), size = ncol(x = pbmc_small), replace = TRUE),
  "chr" = sample(c("a", "b", NA), size = ncol(x = pbmc_small), replace = TRUE)
)
x$fct <- addNA(factor(x$chr, levels = c("a", "b")))

lfile$add.col.attribute(
  attributes = list(
    # "chr" = x$chr,
    # "fct" = x$fct, 
    "int" = x$int,
    "dbl" = x$dbl
  ),
  overwrite = TRUE
)

The Error in if (size == Inf) error is triggered as soon as you uncomment the chr and/or fct line, which confirms what @whtns reported above.

My workaround for now is to recode the NAs inside a character or factor attribute as the string "NA", but I really hope R's NA system can be fully supported for loomR objects. I hope this helps.
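A rough sketch of that recoding in base R, for anyone who wants to copy it. `recode_na` and its `placeholder` argument are hypothetical names, not loomR functions, and the toy `attrs` list is made up. Note that this version returns factors as character vectors, which is an assumption about what is acceptable downstream.

```r
# Hypothetical helper (not part of loomR): replace NA in character and factor
# vectors with a placeholder string; other types are returned unchanged.
recode_na <- function(x, placeholder = "NA") {
  if (is.factor(x)) {
    # Convert to character first so the placeholder is always a valid value
    x <- as.character(x)
  }
  if (is.character(x)) {
    x[is.na(x)] <- placeholder
  }
  x
}

# Toy attributes mirroring the chr, fct and dbl cases above
attrs <- list(
  chr = c("a", NA, "b"),
  fct = addNA(factor(c("a", NA, "b"), levels = c("a", "b"))),
  dbl = c(1, NA, 3)
)
recoded <- lapply(attrs, recode_na)
str(recoded)
```

The recoded list can then be passed as the `attributes` argument in place of the original one. Numeric NAs are left untouched because they did not trigger the error in my test.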

MajoroMask commented 5 years ago

BTW, sessionInfo()

R version 3.6.1 (2019-07-05)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 17763)

Matrix products: default

locale:
[1] LC_COLLATE=Chinese (Simplified)_China.936  LC_CTYPE=Chinese (Simplified)_China.936   
[3] LC_MONETARY=Chinese (Simplified)_China.936 LC_NUMERIC=C                              
[5] LC_TIME=Chinese (Simplified)_China.936    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] loomR_0.2.1.9000 hdf5r_1.2.0      R6_2.4.0         Seurat_3.1.0    

loaded via a namespace (and not attached):
 [1] nlme_3.1-141        tsne_0.1-3          bitops_1.0-6        bit64_0.9-7         RcppAnnoy_0.0.12   
 [6] RColorBrewer_1.1-2  httr_1.4.1          sctransform_0.2.0   tools_3.6.1         irlba_2.3.3        
[11] KernSmooth_2.23-15  uwot_0.1.3          lazyeval_0.2.2      colorspace_1.4-1    npsurv_0.4-0       
[16] tidyselect_0.2.5    gridExtra_2.3       bit_1.1-14          compiler_3.6.1      plotly_4.9.0       
[21] caTools_1.17.1.2    scales_1.0.0        lmtest_0.9-37       ggridges_0.5.1      pbapply_1.4-1      
[26] stringr_1.4.0       digest_0.6.20       R.utils_2.9.0       pkgconfig_2.0.2     htmltools_0.3.6    
[31] bibtex_0.4.2        htmlwidgets_1.3     rlang_0.4.0         rstudioapi_0.10     zoo_1.8-6          
[36] jsonlite_1.6        ica_1.0-2           gtools_3.8.1        dplyr_0.8.3         R.oo_1.22.0        
[41] magrittr_1.5        Matrix_1.2-17       Rcpp_1.0.2          munsell_0.5.0       ape_5.3            
[46] reticulate_1.13     R.methodsS3_1.7.1   stringi_1.4.3       gbRd_0.4-11         MASS_7.3-51.4      
[51] gplots_3.0.1.1      Rtsne_0.15          plyr_1.8.4          grid_3.6.1          parallel_3.6.1     
[56] gdata_2.18.0        listenv_0.7.0       ggrepel_0.8.1       crayon_1.3.4        lattice_0.20-38    
[61] cowplot_1.0.0       splines_3.6.1       SDMTools_1.1-221.1  pillar_1.4.2        igraph_1.2.4.1     
[66] future.apply_1.3.0  reshape2_1.4.3      codetools_0.2-16    leiden_0.3.1        glue_1.3.1         
[71] packrat_0.5.0       lsei_1.2-0          metap_1.1           data.table_1.12.2   RcppParallel_4.4.3 
[76] png_0.1-7           Rdpack_0.11-0       gtable_0.3.0        RANN_2.6.1          purrr_0.3.2        
[81] tidyr_0.8.3         future_1.14.0       assertthat_0.2.1    ggplot2_3.2.1       rsvd_1.0.2         
[86] survival_2.44-1.1   viridisLite_0.3.0   tibble_2.1.3        cluster_2.1.0       globals_0.12.4     
[91] fitdistrplus_1.0-14 ROCR_1.0-7       

lengfei5 commented 4 years ago

This is quite helpful and I just got the same problem. Thanks guys.

daccachejoe commented 4 years ago

Hi all, I'm getting the same error currently when trying to convert Seurat objects to loom objects using the as.loom function. Before as.loom I'm converting the character and factor NA's to "NA" using this loop:

for(j in 1:ncol(obj@meta.data)){
    if(is.factor(obj@meta.data[,j]) == T){
        obj@meta.data[,j][is.na(obj@meta.data[,j])] <- "N.A"
    }
    if(is.character(obj@meta.data[,j]) == T){
        obj@meta.data[,j][is.na(obj@meta.data[,j])] <- "N.A"
    }
}

But as.loom still produces the <<Error in if (size == Inf) { : missing value where TRUE/FALSE needed>> when trying to add orig.ident from the Seurat meta data. Am I converting the wrong NA's to strings or is this dependent on another issue? Thank you

whtns commented 4 years ago

I rediscovered this issue and its open status several months after last commenting. I would just like to add that, for those looking at loom conversion on the way to anndata, the sceasy package worked well for me.

jingxinfu commented 4 years ago

After having tried @jad362's method, I notice there is a factor level missing:

Error in if (size == Inf) { : missing value where TRUE/FALSE needed
Calls: main ... getDtype -> -> ->
In addition: Warning message:
In [<-.factor(*tmp*, thisvar, value = "") :
  invalid factor level, NA generated
Execution halted

So I force factor variables to be characters. Now it works well for me:

for(j in seq_len(ncol(obj@meta.data))){
    if(is.factor(obj@meta.data[,j])){
        # Force the variable type to be character
        obj@meta.data[,j] <- as.character(obj@meta.data[,j])
        obj@meta.data[,j][is.na(obj@meta.data[,j])] <- "N.A"
    }
    if(is.character(obj@meta.data[,j])){
        obj@meta.data[,j][is.na(obj@meta.data[,j])] <- "N.A"
    }
}
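
For anyone wondering why the conversion to character matters, here is a small base R sketch of the factor behaviour behind that warning. The toy factor `f` is made up for illustration and nothing here touches loomR or Seurat.

```r
f <- factor(c("a", NA, "b"))

# Assigning a string that is not an existing level leaves the value as NA
# and raises the "invalid factor level, NA generated" warning
# (suppressed here so the script runs quietly).
f_direct <- f
suppressWarnings(f_direct[is.na(f_direct)] <- "N.A")
print(anyNA(f_direct))

# Converting to character first lets the replacement string be stored.
f_chr <- as.character(f)
f_chr[is.na(f_chr)] <- "N.A"
print(f_chr)
```

So in the earlier loop the factor columns kept their NAs, which would explain why as.loom still failed on them.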