thelovelab / fishpond

Differential expression and allelic analysis, nonparametric statistics
https://thelovelab.github.io/fishpond
27 stars 9 forks source link

Adds ability to include an "unspliced" assay in return from loadFry() #9

Closed lianos closed 3 years ago

lianos commented 3 years ago

I've been playing with the aleven-fry/2021 tutorials here, and wanted to enable a "one shot" loading of the alevin-fry USA quantitation that included an "unspliced" matrix next to "counts" in case the user wanted to pass these data into a velocity analysis.

When you call loadFry(..., velocity = TRUE) that's what you'll get.

As I was adding a unit test for this functionality, I noticed that the example test data provided returned a logical sparse matrix, and couldn't work for these test case.

As such, I changed the testdata for loadFry() as well added a bit of infrastructure to make it easier to create new alevin-fry quant test examples, which is described in the R/test-helpers.R file.

Sorry, but this made this PR a bit larger/featureful than you might be comfortable. I'd be happy to try to tweak it if you want to pare it down a bit.

mikelove commented 3 years ago

Steve,

Thanks for the PR! Let me loop in @DongzeHE who built out the loadFry functionality. Once Dongzhe has reviewed, I’ll make sure it’s passing tests on my side and push to Bioc devel.

mikelove commented 3 years ago

I checked this out and it works on my side, @DongzeHE I'm going to merge and we can address things in the devel branch if needed.

mikelove commented 3 years ago

Pushed v1.99.16 to Bioc devel branch. Thanks @lianos

https://github.com/mikelove/fishpond/commit/e5291f062bb90c235900465702ef36823f75dda2

DongzeHE commented 3 years ago

Okay! I went through the code, and I want to say the implementation is excellent, thank you so much @lianos. I will test it using more datasets and let you know if I have any questions.

lianos commented 3 years ago

Thanks for the kind words, @DongzeHE, your clear implementation of this function made it easy to just add a few tweaks here and there ;-)

I know you just merged this, but I think I already have two issues with the implementation I PR'd here.

Incorrect 'which_counts' parameter error check

This which_counts parameter check:

https://github.com/mikelove/fishpond/blob/46b58a253f217cd4aadae7795b7f81a580c26651/R/loadFry.R#L75-L77

Providing a vector of only length one should be allowed, so we should rather change it to:

        "`which_counts` must be a character vector with length() >= 1" = {
          is.character(which_counts) && length(which_counts) >= 1
        },

The 'velocity' parameter name

My brain was fixated on using the fry-alevin output to include the unspliced counts so I could do a velocity analysis, but perhaps that too myopic.

I was curious what you thought about changing that slightly and removing the velocity parameter, but rather overloading the utility of the which_counts parameter itself.

If you just pass it a character vector, the function would work as your original implementation and return the summed counts from the combo of U,S,A that the user provides.

Alternatively, you can accept a named list of character vectors, and the function will return a list of similarly named assay matrices in the SingleCellExperiment assay slot, perhaps with the assertion/requirement that position one is always named 'counts'.

So, for instance, this works as originally implemented

sce <- loadFry(frydir, c('S', 'A')) # sum spliced and ambiguous counts
sce <- loadFry(frydir, c('S')) # just use spliced counts

But this incantation would now return the same counts and unspliced matrix as velocity = TRUE does:

sce <- loadFry(frydir, list(counts = c("S", "A"), unspliced = "U"))
sce <- loadFry(frydir, list(c("S", "A"), unspliced = "U"))

Or for some reason the user wants to just use the spliced counts as the normal counts and sum the ambiguous and unspliced counts and put it into an assay matrix named foobar:

sce <- loadFry(frydir, list(counts = "S", foobar = c("A", "U"))

How would you feel about that? I'd be happy to provide a patch to tweak this a bit to make it work that way .. oorrrr, we can just leave it as it is. Thanks!

mikelove commented 3 years ago

I'll leave this to @DongzeHE :)

DongzeHE commented 3 years ago

Hi @lianos,

You are not myopic at all. Although my current implementation fits some user cases, the proposal you made will extend the flexibility of this function extremely. So please provide a patch to tweak this! I am very excited to see it.

I sincerely appreciate your help. Thank you. :)

mikelove commented 3 years ago

I addressed @lianos first point with this commit. Release coming up in October, so if there's anything else relevant to loadFry we can try to get it in before release.

5428de7

Thanks again for both of your great contributions to fishpond.