stemangiola / tidySingleCellExperiment

Brings SingleCellExperiment objects to the tidyverse
https://stemangiola.github.io/tidySingleCellExperiment/index.html

`tidySCE` Functionality to see and access the altExp slot #81

Open Biomiha opened 1 year ago

Biomiha commented 1 year ago

Hi all,

I am really liking your package, it has made many operations infinitely easier. We tend to prefer the tidySingleCellExperiment to Seurat, however, the one thing we have noticed is that there is no functionality to access the altExp slot, where our CITE-seq data are stored. In the standard SingleCellExperiment print method the altExpNames are listed at the bottom, whereas this is completely hidden in the tibble abstraction. Could this possibly be added?

Many thanks in advance.

stemangiola commented 1 year ago

You are very right. I thought about it many times. We can add it to the tidyomics challenges. Are you aware of the page?

Biomiha commented 1 year ago

I was not before but I am now :) Thanks!

stemangiola commented 1 year ago

> I was not before but I am now :) Thanks!

Great, it is here https://github.com/orgs/tidyomics/projects/1/views/1

Let's see if someone wants to commit to that. If you like the challenge and want to be part of our almost-ready-to-submit paper on tidyomics, feel free to propose yourself!

Biomiha commented 1 year ago

Sure, I'd be happy to contribute.

stemangiola commented 1 year ago

Amazing. I think two aspects are surely the header of the tibble representation and join_features. Maybe some care should be given to how the front end would look when there are assays with repeated names.

As a user, what do you feel is missing? What tidy operations would you like to do with the alternative experiment?

Biomiha commented 1 year ago

I think for starters there needs to be at least a mention in the print method that the altExp is not empty. As it is currently in tidySCE you have no way of knowing and have to go specifically looking for it. In terms of operations it depends on what technology was used for the altExp. If this was an antibody-tag experiment (as is most often the case) as far as I am concerned once the counts have been denoised and normalised the same operations can be used as for the main exp slot. If scaling were not an issue they could arguably even be used as additional features in the standard Experiment slot. Most people use them to plot (UMAP, ridgeplots, etc...) or to subcluster and refine existing clusters. I can't say I know enough about scATAC-seq and other platforms to judge but they seem pretty different so having a separate slot is advantageous. The main benefit of the sce object class in that regard is that you can subset individual or groups of cells and keep the underlying structures intact.
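The two points above (knowing that the altExp is populated, and subsetting cells while keeping the structure intact) can be sketched with the standard SingleCellExperiment accessors. This is a minimal illustration on simulated counts; the object and feature names are made up for the example.

```r
library(SingleCellExperiment)

set.seed(1)
cells <- paste0("cell", 1:6)

# Simulated main (RNA) and alternative (antibody tag) count matrices.
rna <- matrix(rpois(24, 5), nrow = 4,
              dimnames = list(paste0("gene", 1:4), cells))
adt <- matrix(rpois(12, 20), nrow = 2,
              dimnames = list(c("CD3", "CD4"), cells))

sce <- SingleCellExperiment(assays = list(counts = rna))
altExp(sce, "Antibody Capture") <- SingleCellExperiment(assays = list(counts = adt))

# Is there anything in the altExp slot worth mentioning in a header?
has_alt <- length(altExpNames(sce)) > 0

# Subsetting cells on the main object subsets the alternative experiment too.
sub <- sce[, 1:3]
sub_alt_dim <- dim(altExp(sub, "Antibody Capture"))
```

After the subset, `sub_alt_dim` holds the dimensions of the antibody data for the same three cells, which is the behaviour described as the main benefit of the class.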

stemangiola commented 1 year ago

OK, let's start by adding altexp:assay_name to the header. What if there are multiple altExps? We could do altexp[[1]]:assay_name. Do altExps usually have names?
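A rough sketch of how such `altexp:assay_name` labels could be built when there are several named alternative experiments. The helper `alt_assay_labels()` and the experiment name "Hashtag" are hypothetical and not part of tidySingleCellExperiment; only `altExpNames()`, `altExp()` and `assayNames()` are existing accessors.

```r
library(SingleCellExperiment)

set.seed(1)
cells <- paste0("cell", 1:6)
rna <- matrix(rpois(24, 5), nrow = 4,
              dimnames = list(paste0("gene", 1:4), cells))
adt <- matrix(rpois(12, 20), nrow = 2,
              dimnames = list(c("CD3", "CD4"), cells))
hto <- matrix(rpois(12, 50), nrow = 2,
              dimnames = list(c("HTO1", "HTO2"), cells))

sce <- SingleCellExperiment(assays = list(counts = rna))
altExp(sce, "Antibody Capture") <- SingleCellExperiment(assays = list(counts = adt))
altExp(sce, "Hashtag") <- SingleCellExperiment(
  assays = list(counts = hto, logcounts = log1p(hto))
)

# Hypothetical helper: one "altExpName:assayName" label per assay of each
# alternative experiment; returns character(0) when the slot is empty.
alt_assay_labels <- function(x) {
  labels <- lapply(altExpNames(x), function(nm) {
    paste(nm, assayNames(altExp(x, nm)), sep = ":")
  })
  as.character(unlist(labels))
}

labels <- alt_assay_labels(sce)
header_line <- paste0("altExp=", paste(labels, collapse = ", "))
```

Using the names rather than positional indices such as `altexp[[1]]` keeps the labels stable if the order of the alternative experiments changes.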

Biomiha commented 1 year ago

Yes, the altExp slot will have a name if it is populated and the structure within the altExp slot is the same as a normal SCE object (which in fairness is a bit confusing at times). Using the example from the OSCA book: (http://bioconductor.org/books/3.14/OSCA.advanced/integrating-with-protein-abundance.html), this is what the standard output looks like:

library(DropletTestFiles)
path <- getTestFile("tenx-3.0.0-pbmc_10k_protein_v3/1.0.0/filtered.tar.gz")
dir <- tempfile()
untar(path, exdir=dir)

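# Loading it in as a SingleCellExperiment object.
library(DropletUtils)
sce <- read10xCounts(file.path(dir, "filtered_feature_bc_matrix"))
# Move the non-gene features (here the antibody tags) into the altExp slot,
# as in the OSCA book; the output below reflects this step.
sce <- splitAltExps(sce, rowData(sce)$Type)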
sce
>> class: SingleCellExperiment 
>> dim: 33538 7865 
>> metadata(1): Samples
>> assays(1): counts
>> rownames(33538): ENSG00000243485 ENSG00000237613 ... ENSG00000277475 ENSG00000268674
>> rowData names(3): ID Symbol Type
>> colnames: NULL
>> colData names(2): Sample Barcode
>> reducedDimNames(0):
>> mainExpName: Gene Expression
>> altExpNames(1): Antibody Capture

altExp(sce)
>> class: SingleCellExperiment 
>> dim: 17 7865 
>> metadata(1): Samples
>> assays(1): counts
>> rownames(17): CD3 CD4 ... IgG1 IgG2b
>> rowData names(3): ID Symbol Type
>> colnames: NULL
>> colData names(0):
>> reducedDimNames(0):
>> mainExpName: NULL
>> altExpNames(0):

stemangiola commented 1 year ago

Cool. Of course, the philosophy of our interface is to be modular rather than recursive (such as an SCE inside an SCE).

Great, so if the header and join_features are the only additions, I think it is pretty straightforward. When we choose the assay to join the feature from, we look in both the regular and the alternative experiments.

After this, maybe we can think about multiple PCA, UMAP, etc.
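The lookup described here, searching the regular experiment first and then each alternative experiment, might look roughly like the sketch below. `lookup_feature()` is a hypothetical helper written for illustration; it is not how join_features is implemented in the package.

```r
library(SingleCellExperiment)

set.seed(1)
cells <- paste0("cell", 1:6)
rna <- matrix(rpois(24, 5), nrow = 4,
              dimnames = list(paste0("gene", 1:4), cells))
adt <- matrix(rpois(12, 20), nrow = 2,
              dimnames = list(c("CD3", "CD4"), cells))

sce <- SingleCellExperiment(assays = list(counts = rna))
altExp(sce, "Antibody Capture") <- SingleCellExperiment(assays = list(counts = adt))

# Hypothetical helper: return the per-cell values of a feature together with
# the experiment it was found in. The main experiment is searched first, so
# it wins if a feature name is repeated across experiments.
lookup_feature <- function(x, feature, assay_name = "counts") {
  if (feature %in% rownames(x)) {
    return(list(experiment = "main", values = assay(x, assay_name)[feature, ]))
  }
  for (nm in altExpNames(x)) {
    alt <- altExp(x, nm)
    if (feature %in% rownames(alt)) {
      return(list(experiment = nm, values = assay(alt, assay_name)[feature, ]))
    }
  }
  stop("Feature not found in the main or alternative experiments: ", feature)
}

cd3 <- lookup_feature(sce, "CD3")
gene1 <- lookup_feature(sce, "gene1")
```

Returning the experiment name alongside the values is one way to keep repeated feature or assay names distinguishable in the front end.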

stemangiola commented 1 year ago

@Biomiha please add your authorship details here https://docs.google.com/spreadsheets/d/19XqhN3xAMekCJ-esAolzoWT6fttruSEermjIsrOFcoo/edit?usp=sharing

Biomiha commented 1 year ago

I suppose I should actually contribute first, no :)?

stemangiola commented 1 year ago

> I suppose I should actually contribute first, no :)?

Yes! As soon as you manage to open a PR, feel free to add yourself.

Biomiha commented 1 year ago

Hi Stefano,

Quick question if I may? I am new to pillar and have found the print method and the utilities files in the repo but can't seem to find where setup lives to change tbl_sum. Am I just being thick?

Thanks, Miha

stemangiola commented 1 year ago

Did you try to look for "setup" in all .R files?

I have to say that I am not an expert in pillar either, and I mostly reverse-engineered it. pillar has improved recently, so we can use a lot of low-level functions directly.

Were you able to orient yourself in the print method, where I modify the header of the "tibble"?

Biomiha commented 1 year ago

I've looked a bit yes but again I am new to modifying printing methods for tibbles so could very well be looking in the wrong place. I have been able to find the tbl_format_header.tidySingleCellExperiment and the print.SingleCellExperiment functions. I am able to tweak them but it seems I can only change the values and not the names, e.g. I can change the number of rows that are printed for the Features specification but not the word Features. I've tried changing it to Creatures but no joy :). As far as I can tell from the very nice and detailed description on the pillar website (https://pillar.r-lib.org/articles/printing.html) I would need to change tbl_sum that lives in tbl_format_setup. I'll do some more digging when I get a bit of time.
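For orientation, the pillar documentation describes `tbl_sum()` as the method that supplies the header entries: the names of the returned character vector become the labels and the values become the text. The sketch below shows this on a plain tibble subclass; the class name `alt_header_tbl` and the header text are made up, and this is not the code in the package's print_method.R.

```r
library(tibble)
library(pillar)

# Hypothetical tibble subclass used only for this illustration.
tbl <- new_tibble(
  list(cell = c("cell1", "cell2", "cell3"), n_counts = c(120, 98, 143)),
  nrow = 3,
  class = "alt_header_tbl"
)

# Keep the default header entries and append one more.
# Changing the *name* of an entry changes the label that is printed.
tbl_sum.alt_header_tbl <- function(x, ...) {
  c(NextMethod(), "altExp" = "Antibody Capture:counts")
}

header <- tbl_sum(tbl)
print(tbl)
```

Renaming `"altExp"` in the method above is the kind of change that alters a header label, as opposed to editing only the values.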

stemangiola commented 1 year ago

The fact that I could add "feature" means that you can change that :)

stemangiola commented 1 year ago

Please look at

https://github.com/stemangiola/tidySingleCellExperiment/blob/797cf25d795a1fba7173ad30e7693c555eccd17e/R/print_method.R#L34

Biomiha commented 1 year ago

Yes, that was the one I was looking at. I think I have been able to figure it out. Should submit a PR in the next couple of days.