thelovelab / fishpond

Differential expression and allelic analysis, nonparametric statistics
https://thelovelab.github.io/fishpond
27 stars 9 forks source link

Issues with loadFry() ('x@assays' is not parallel to 'x') #38

Closed AnnaAMonaco closed 1 month ago

AnnaAMonaco commented 1 month ago

Dear Devs,

I am having some issues loading the output of alevin-fry scRNA-seq quantification into R; I have used all the following packages in the past without issue. I get the following error when trying to load my data:

Error in validObject(.Object) : 
  invalid class "SummarizedExperiment" object: 
    'x@assays' is not parallel to 'x'

Brief Summary I have scRNA-seq (10X) from two species, I quantified them with alevin-fry and want to continue with my standard single-cell analysis workflow in R with Seurat. Although there seems to be no issue with any of the output directories, I am failing to load the outputs into R and make a Seurat object out of them, but only for one of the two species.

Code ran and errors When working with species1, it fails already at loadFry(); I checked and all paths exist and contain all directories and files required.

> fryDir <- read.delim("/project/hedgehog_analysis_data/analysis/02_scRNAseq/02_musMus_only/outs/samples.txt", header=FALSE)
> fryDir <- as.list(fryDir$V1)
> my.samplenames <- c("MmDoE14r1", "MmVeE14r1", "MmDoE14r2", "MmVeE14r2", "MmDoE12r1", "MmDoE12r2")
> sce <- list()
> sobj <- list()
> 
> for (i in 1:length(fryDir)) {
+   # Load data using loadFry()
+   sce[[i]] <- loadFry(fryDir[[i]], outputFormat = "scRNA")
+   # Convert to Seurat object
+   sobj[[i]] <- as.Seurat(sce[[i]], counts = "counts", data = "logcounts", project=my.samplenames[i])
+ }
locating quant file
Reading meta data
USA mode: FALSE
Processing 56557 genes and 118293 barcodes
Not in USA mode, set assay name as "counts"
Constructing output SingleCellExperiment object
Error in validObject(.Object) : 
  invalid class "SummarizedExperiment" object: 
    'x@assays' is not parallel to 'x'

I know this error occurs at the first step of my loop:

> sce[[1]] <- loadFry(fryDir[[1]], outputFormat = "scRNA")
locating quant file
Reading meta data
USA mode: FALSE
Processing 56557 genes and 118293 barcodes
Not in USA mode, set assay name as "counts"
Constructing output SingleCellExperiment object
Error in validObject(.Object) : 
  invalid class "SummarizedExperiment" object: 
    'x@assays' is not parallel to 'x'

When looking up this error, it seemed like it might be something to do with old/incompatible Bioconductor packages, so I updated them all (see sessionInfo() below), but the issue persists. I am having a different set of issues with species2, but I think that is material for a new issue, if I cannot figure it out.

I am of course happy to provide any more info that could help troubleshooting. I have never had any issue with these steps before


Session Info

> sessionInfo()
R version 4.4.1 (2024-06-14)
Platform: x86_64-pc-linux-gnu
Running under: MarIuX64 2.0 GNU/Linux

Matrix products: default
BLAS:   /pkg/R-4.4.1-0/lib/R/lib/libRblas.so
LAPACK: /usr/lib/liblapack.so.3.10.1

locale:
 [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C         LC_TIME=C
 [4] LC_COLLATE=C         LC_MONETARY=C        LC_MESSAGES=C
 [7] LC_PAPER=C           LC_NAME=C            LC_ADDRESS=C
[10] LC_TELEPHONE=C       LC_MEASUREMENT=C     LC_IDENTIFICATION=C

time zone: Europe/Berlin
tzcode source: system (glibc)

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods
[8] base

other attached packages:
 [1] scDblFinder_1.18.0          scater_1.32.1
 [3] scran_1.32.0                scuttle_1.14.0
 [5] fishpond_2.10.0             scales_1.3.0
 [7] BiocSingular_1.20.0         ggplot2_3.4.4
 [9] dplyr_1.1.4                 cowplot_1.1.1
[11] Seurat_5.0.1                SeuratObject_5.0.1
[13] sp_2.1-2                    SingleCellExperiment_1.26.0
[15] SummarizedExperiment_1.34.0 Biobase_2.64.0
[17] GenomicRanges_1.56.1        GenomeInfoDb_1.40.1
[19] IRanges_2.38.1              S4Vectors_0.42.1
[21] BiocGenerics_0.50.0         MatrixGenerics_1.16.0
[23] matrixStats_1.1.0           tximeta_1.22.1
[25] tximport_1.32.0
AnnaAMonaco commented 1 month ago

UPDATE I sorted this out, turns out the issue was really dumb but I'm writing it here in case anyone else ever runs into it:

The txp2gene file I used had (for reasons still unknown to me) an empty line as line 1. This threw everything off for reading in the MM matrix with its cols and rows files. I fixed the txp2gene.tsv and re-run alevin-fry. Everything works like a charm now

mikelove commented 1 month ago

Thank you for reporting back Anna!

DongzeHE commented 1 month ago

Sorry I just saw the notification. Hi @AnnaAMonaco , when we talk about the "txp2gene file", are we referring to the gene_id_to_name.tsv or the "t2g_3col.tsv"? This issue is very interesting, and I would like to avoid it in both fishpond and simpleaf!

Best, Dongze

AnnaAMonaco commented 1 month ago

Hi @DongzeHE, I mean the tsv file used for --tgMap:

salmon alevin -lISR -1 $input/${sample}_R1_001.fastq.gz -2 $input/${sample}_R2_001.fastq.gz \
--chromiumV3 -i $index -p 12 -o $outDir/${sample} --tgMap $tsv --rad --sketch --dumpFeatures

When I first say this was the case, it was by opening the quants_mat_cols.txt output file. I made a "decoy" file for it where I removed the empty line, but it had obviously been embedded somehow in the quants_mat.mtx. Maybe there is a smarter way to get loadFry() to skip or remove any empty lines in the columns and/or matrix files?