Closed BrianLohman closed 3 years ago
This issue here is that the input "count" matrix contains Inf
values, and does not appear to be integer counts. Running NormalizeData
on values that are already Inf
will result in NA
. We will add a warning (or maybe throw an error) if infinite values are present in the count matrix.
Hi @timoast ,
Thanks for your reply and for looking into this. I am very surprised that the values are not integers. I will have to ask UCSC what is going on.
Yes, a warning about why NA's are introduced would be much appreciated. Flagging the rows/recording the indices of the rows that throw NAs so they could be inspected would also be most welcome.
Thanks again for your
@BrianLohman
I'm working with the same dataset from UCSC. How did you end up resolving this issue? Did you hear back from UCSC?
Hi @jjwestfall521, I emailed the help email that UCSC provides and they connected me to the authors. You might just go straight to the corresponding author's email, it's probably faster.
Hi,
I have found the same problem with another dataset and it might be the same for you.
I checked which rows (features) had NAs like this (COV is my Seurat object, RowsNA will contain the names of all the features):
I saw all the features with NAs seemed to be antibody names, which I guess they just had for a few samples so not all cells have values for these features, they look like this:
RowsNA [1] "AB-CD80" "AB-CD86" "AB-CD274"
[4] "AB-PDCD1LG2" "AB-ICOSLG" "AB-ITGAM"
so I just removed these rows/features from the object like this:
'%!in%' <- function(x,y)!('%in%'(x,y)) #this is a NOT IN function RowsKEEP<-rownames(COV)[rownames(COV) %!in% RowsNA] COV<-subset(COV,features=RowsKEEP)
and I did not get the error in FindVariableFatures anymore:
COV <- FindVariableFeatures(COV, selection.method = "vst", nfeatures = 6000) Calculating gene variances 0% 10 20 30 40 50 60 70 80 90 100% [----|----|----|----|----|----|----|----|----|----| **| Calculating feature variances of standardized and clipped values 0% 10 20 30 40 50 60 70 80 90 100% [----|----|----|----|----|----|----|----|----|----| **|
I hope it helps
For the benefit of anyone similarly afflicted...
I ran across this error when I was borrowing Seurat::FindVariableFeatures for another application where the mode of my matrix, not being of counts, was 'numeric'. Rounding it and setting its mode to 'integer' was my workaround.
FWIW, my application is bulk ATAC-seq reads tallied in genomic windows (a la csaw), but post quantile normalization.
@Clara22 hi
thanks for your help
I have encountered the same problem, and solve it by your method
but i still have the question that why this problem occured ?
looking forward to your answer!
Hi @Harmony714
In the case of the dataset I mention the NAs were there because the features corresponding to antibodies were not available for all cells. In other words the dataset contained samples with RNAseq data only and samples with protein (from antibodies quantification) and RNAseq (probably from CITEseq), therefore cells corresponding to the former samples had no data (NAs) for the antibody/protein features.
I don't know where do the NAs come from in your case. You may want to inspect which rows and columns contain NAs to investigate the issue in your dataset.
Best, Clara
I found NAs introduced after normalisation step in my scRNA data the raw counts data had negative values . For some reason have negative values introduced by cell bender checked cell ranger filt. matrix, it is all good no negative raw counts.
Sorry to bother everyone, but got same error. I opened issue https://github.com/ZJU-UoE-CCW-LAB/scCDC/issues/9, then got to know comments here. But solution from @Clara22 seems not working. Please note I am using Seurat 5.1.0. Hope someone could help, appreciate.
Here is details:
### read scCDC corrected matrix`
> mislet_seuratobj_corrected <- readRDS("02.AVS_0d_seuratobj_corrected.rds")
### set "Corrected" matrix as default
> DefaultAssay(mislet_seuratobj_corrected) <- "Corrected"
### Seurat Normalization and FindVariables
> mislet_seuratobj_corrected <- NormalizeData(mislet_seuratobj_corrected,
normalization.method = "LogNormalize",scale.factor = 10000)
Normalizing layer: counts
Performing log-normalization
0% 10 20 30 40 50 60 70 80 90 100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
> mislet_seuratobj_corrected <- FindVariableFeatures(mislet_seuratobj_corrected,
selection.method = "vst", nfeatures = 2000)
Finding variable features for layer counts
Calculating gene variances
0% 10 20 30 40 50 60 70 80 90 100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
Error in hvf.info$variance.expected[not.const] <- 10^fit$fitted :
NAs are not allowed in subscripted assignments
### try to find NA rows
> RowsNA <- names(which(rowSums(is.na(mislet_seuratobj_corrected@assays$Corrected@layers$counts))>0))
> RowsNA
NULL
Here is session info:
> sessionInfo()
R version 4.4.1 (2024-06-14)
Platform: x86_64-conda-linux-gnu
Running under: Debian GNU/Linux 11 (bullseye)
Matrix products: default
BLAS/LAPACK: /data/pathology/program/Miniforge3/envs/R4.2.3/lib/libopenblasp-r0.3.27.so; LAPACK version 3.12.0
locale:
[1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_GB.UTF-8 LC_COLLATE=en_GB.UTF-8
[5] LC_MONETARY=en_GB.UTF-8 LC_MESSAGES=en_GB.UTF-8
[7] LC_PAPER=en_GB.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C
time zone: Europe/London
tzcode source: system (glibc)
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] hdf5r_1.3.11 scCDC_1.4 tidyr_1.3.1 knitr_1.48
[5] pROC_1.18.5 Matrix_1.7-0 ggrepel_0.9.6 pbapply_1.7-2
[9] ggplot2_3.5.1 dplyr_1.1.4 ddpcr_1.15.2 Seurat_5.1.0
[13] SeuratObject_5.0.2 sp_2.1-4
loaded via a namespace (and not attached):
[1] deldir_2.0-4 gridExtra_2.3 rlang_1.1.4
[4] magrittr_2.0.3 RcppAnnoy_0.0.22 matrixStats_1.4.1
[7] ggridges_0.5.6 compiler_4.4.1 spatstat.geom_3.3-3
[10] png_0.1-8 vctrs_0.6.5 reshape2_1.4.4
[13] stringr_1.5.1 pkgconfig_2.0.3 fastmap_1.2.0
[16] utf8_1.2.4 promises_1.3.0 bit_4.5.0
[19] purrr_1.0.2 xfun_0.47 jsonlite_1.8.9
[22] goftest_1.2-3 later_1.3.2 spatstat.utils_3.1-0
[25] irlba_2.3.5.1 parallel_4.4.1 cluster_2.1.6
[28] R6_2.5.1 ica_1.0-3 stringi_1.8.4
[31] RColorBrewer_1.1-3 spatstat.data_3.1-2 reticulate_1.39.0
[34] parallelly_1.38.0 spatstat.univar_3.0-1 lmtest_0.9-40
[37] scattermore_1.2 Rcpp_1.0.13 tensor_1.5
[40] future.apply_1.11.2 zoo_1.8-12 sctransform_0.4.1
[43] httpuv_1.6.15 splines_4.4.1 igraph_2.0.3
[46] tidyselect_1.2.1 abind_1.4-8 spatstat.random_3.3-2
[49] codetools_0.2-20 miniUI_0.1.1.1 spatstat.explore_3.3-2
[52] listenv_0.9.1 lattice_0.22-6 tibble_3.2.1
[55] plyr_1.8.9 withr_3.0.1 shiny_1.9.1
[58] ROCR_1.0-11 Rtsne_0.17 future_1.34.0
[61] fastDummies_1.7.4 survival_3.7-0 polyclip_1.10-7
[64] fitdistrplus_1.2-1 pillar_1.9.0 KernSmooth_2.23-24
[67] plotly_4.10.4 generics_0.1.3 RcppHNSW_0.6.0
[70] munsell_0.5.1 scales_1.3.0 globals_0.16.3
[73] xtable_1.8-4 glue_1.7.0 lazyeval_0.2.2
[76] tools_4.4.1 data.table_1.16.0 RSpectra_0.16-2
[79] RANN_2.6.2 leiden_0.4.3.1 dotCall64_1.1-1
[82] cowplot_1.1.3 grid_4.4.1 colorspace_2.1-1
[85] nlme_3.1-166 patchwork_1.3.0 cli_3.6.3
[88] spatstat.sparse_3.1-0 spam_2.10-0 fansi_1.0.6
[91] viridisLite_0.4.2 uwot_0.2.2 gtable_0.3.5
[94] digest_0.6.37 progressr_0.14.0 htmlwidgets_1.6.4
[97] farver_2.1.2 htmltools_0.5.8.1 lifecycle_1.0.4
[100] httr_1.4.7 mime_0.12 bit64_4.5.2
[103] MASS_7.3-61
update: error solved, see details here [(https://github.com/ZJU-UoE-CCW-LAB/scCDC/issues/9)]
Hi everyone, I got new debug update here. Hope someone could explain a little bit more. https://github.com/ZJU-UoE-CCW-LAB/scCDC/issues/9#issuecomment-2386507368
Hello, I am seeing a problem with the NormalizeData() function. I asked this question on #2803, but I think it deserves it's own issue.
It appears that NormalizeData is introducing NAs that result in an error in FindVariableFeatures.
Processing of UCSC data taken from here: https://cellbrowser.readthedocs.io/load.html
Any help would be appreciated.