Closed lianos closed 3 years ago
Steve,
Thanks for the PR! Let me loop in @DongzeHE who built out the loadFry functionality. Once Dongzhe has reviewed, I’ll make sure it’s passing tests on my side and push to Bioc devel.
I checked this out and it works on my side, @DongzeHE I'm going to merge and we can address things in the devel branch if needed.
Pushed v1.99.16 to Bioc devel branch. Thanks @lianos
https://github.com/mikelove/fishpond/commit/e5291f062bb90c235900465702ef36823f75dda2
Okay! I went through the code, and I want to say the implementation is excellent, thank you so much @lianos. I will test it using more datasets and let you know if I have any questions.
Thanks for the kind words, @DongzeHE, your clear implementation of this function made it easy to just add a few tweaks here and there ;-)
I know you just merged this, but I think I already have two issues with the implementation I PR'd here.
This which_counts
parameter check:
Providing a vector of only length one should be allowed, so we should rather change it to:
"`which_counts` must be a character vector with length() >= 1" = {
is.character(which_counts) && length(which_counts) >= 1
},
My brain was fixated on using the fry-alevin output to include the unspliced counts so I could do a velocity analysis, but perhaps that too myopic.
I was curious what you thought about changing that slightly and removing the velocity
parameter, but rather overloading the utility of the which_counts
parameter itself.
If you just pass it a character vector, the function would work as your original implementation and return the summed counts from the combo of U,S,A
that the user provides.
Alternatively, you can accept a named list of character vectors, and the function will return a list of similarly named assay matrices in the SingleCellExperiment
assay slot, perhaps with the assertion/requirement that position one is always named 'counts'.
So, for instance, this works as originally implemented
sce <- loadFry(frydir, c('S', 'A')) # sum spliced and ambiguous counts
sce <- loadFry(frydir, c('S')) # just use spliced counts
But this incantation would now return the same counts
and unspliced
matrix as velocity = TRUE
does:
sce <- loadFry(frydir, list(counts = c("S", "A"), unspliced = "U"))
sce <- loadFry(frydir, list(c("S", "A"), unspliced = "U"))
Or for some reason the user wants to just use the spliced counts as the normal counts
and sum the ambiguous and unspliced counts and put it into an assay matrix named foobar
:
sce <- loadFry(frydir, list(counts = "S", foobar = c("A", "U"))
How would you feel about that? I'd be happy to provide a patch to tweak this a bit to make it work that way .. oorrrr, we can just leave it as it is. Thanks!
I'll leave this to @DongzeHE :)
Hi @lianos,
You are not myopic at all. Although my current implementation fits some user cases, the proposal you made will extend the flexibility of this function extremely. So please provide a patch to tweak this! I am very excited to see it.
I sincerely appreciate your help. Thank you. :)
I've been playing with the aleven-fry/2021 tutorials here, and wanted to enable a "one shot" loading of the alevin-fry USA quantitation that included an "unspliced" matrix next to "counts" in case the user wanted to pass these data into a velocity analysis.
When you call
loadFry(..., velocity = TRUE)
that's what you'll get.As I was adding a unit test for this functionality, I noticed that the example test data provided returned a logical sparse matrix, and couldn't work for these test case.
As such, I changed the testdata for loadFry() as well added a bit of infrastructure to make it easier to create new
alevin-fry quant
test examples, which is described in theR/test-helpers.R
file.Sorry, but this made this PR a bit larger/featureful than you might be comfortable. I'd be happy to try to tweak it if you want to pare it down a bit.