thelovelab / fishpond

Differential expression and allelic analysis, nonparametric statistics
https://thelovelab.github.io/fishpond

Support for sparse Matrix output of alevin #1

Closed (DrHogart closed 4 years ago)

DrHogart commented 4 years ago

Hi, currently the default output of salmon alevin is a sparse Matrix, but fishpond can only handle regular (dense) matrices. For example, applying scaleInfReps (v1.3.10) to a salmon alevin (v1.1.0) output file results in this:

Error in colSums(counts) :
  'x' must be an array of at least two dimensions

mikelove commented 4 years ago

Thanks — I’ll address this tonight.

mikelove commented 4 years ago

Thanks again for posting the issue. I think I've addressed it in this commit:

c8d02c27c5af5f9a32951443c2f43b8e9fd2a068

In particular, I changed labelKeep so it can be run on sparse matrices, and then added code to the vignette to clarify how and when to make the matrices dense.

https://github.com/mikelove/fishpond/blob/master/vignettes/swish.Rmd#L742-L748

swish needs to compute ranks, which currently makes the matrices dense, but the above should work for testing two groups of cells across a subset of expressed genes, which shouldn't be too large a "cube" of data.

DrHogart commented 4 years ago

Thank you for the quick commit, it works on my data.

mikelove commented 4 years ago

Out of curiosity, how large a dataset are you interested in testing, e.g. how many genes pass filtering (minimal expression) and how many cells at a time?

Also make sure to update tximport to >= 1.14.2.

We had a bug in importing inf reps but we pushed a fix.

DrHogart commented 4 years ago

My data are actually sample- and UMI-barcoded bulk RNA-seq, so the number of samples (aka cells) is not huge, fewer than 50; 6000-8000 genes pass filters in each sample. My tximport version is 1.14.2.

mikelove commented 4 years ago

Got it, thanks. Converting to dense shouldn't be a problem. It can get slow with ~1,000 cells and all ~50,000 genes.

Oops, I meant 1.14.2.
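
To put rough numbers on the dense "cube" sizes discussed above, here is a back-of-the-envelope sketch in Python. It only multiplies genes by cells by inferential replicates by bytes per value. The replicate count (20) and the 8-byte double storage are assumptions chosen for illustration, not values stated in this thread, and `dense_bytes` is a hypothetical helper, not part of fishpond.

```python
def dense_bytes(n_cells, n_genes, n_inf_reps, bytes_per_value=8):
    """Bytes needed to hold a dense genes x cells x replicates cube.

    bytes_per_value=8 assumes double-precision storage (an assumption).
    """
    return n_genes * n_cells * n_inf_reps * bytes_per_value


# Assumption: 20 inferential replicates per sample/cell (hypothetical value).
N_INF_REPS = 20

scenarios = [
    # (label, cells, genes) taken from the sizes mentioned in the thread
    ("barcoded bulk, <50 samples, ~8,000 genes", 50, 8_000),
    ("~1,000 cells, all ~50,000 genes", 1_000, 50_000),
]

for label, n_cells, n_genes in scenarios:
    gb = dense_bytes(n_cells, n_genes, N_INF_REPS) / 1e9
    print(f"{label}: about {gb:.2f} GB for the replicate cube")
```

Under these assumptions the first case needs well under 0.1 GB, while the second needs about 8 GB before any ranking is done, which is consistent with the first being unproblematic and the second getting slow.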