Closed kmakino14 closed 8 months ago
Dear KM,
thank you for your interest in TACCO! And sorry for the slow response.
Yes, you can input a gene-by-program matrix.
TACCO's annotate function operates on reference profiles, which are stored in a .varm slot of the reference AnnData. If they are not there from the beginning, they are generated from the observations, e.g., in .X. But it does not care where they come from: if they are there already, they are used. This makes it possible to supply any reference profiles, such as gene programs instead of cell types. To supply profiles directly in the reference, one can, for example, create a fresh, clean AnnData of the correct shape and populate a .varm slot:
import anndata as ad
import pandas as pd
# Assuming you have a profiles_dataframe with your profiles in the columns and genes in the rows.
# Create an AnnData of shape (0, n_genes) with the genes as .var.index, but otherwise empty.
profile_reference = ad.AnnData(var=profiles_dataframe[[]])
# Store the profiles in the .varm slot under the key 'profiles'.
profile_reference.varm['profiles'] = profiles_dataframe
This profile_reference can be used as the reference in calls to the annotate function.
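Regarding the "programs x genes" question: the snippet above expects genes in the rows and profiles in the columns, so a programs-by-genes table needs to be transposed first. A minimal sketch, assuming the programs live in a pandas DataFrame; the toy values and the names programs_by_genes, program_A, GeneX, etc. are made up for illustration and are not part of TACCO:

```python
import pandas as pd

# Hypothetical toy input: gene programs in the rows, genes in the columns.
programs_by_genes = pd.DataFrame(
    [[5.0, 1.0, 0.0],
     [0.0, 2.0, 8.0]],
    index=["program_A", "program_B"],
    columns=["GeneX", "GeneY", "GeneZ"],
)

# Transpose so that genes are in the rows and programs in the columns,
# which is the orientation the snippet above assumes for profiles_dataframe.
profiles_dataframe = programs_by_genes.T

# The gene names become .var.index of the reference, so they should be unique.
if not profiles_dataframe.index.is_unique:
    raise ValueError("gene names must be unique")

print(profiles_dataframe.shape)  # (n_genes, n_programs)
```

The resulting profiles_dataframe can then be passed through the two AnnData lines above unchanged. The gene names presumably also need to match those used in the data to be annotated; that is an assumption of this sketch rather than something stated in this thread.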
In the docs of the annotate function, this feature is a little hidden in the description of the annotation_key parameter. This is because the "side-loading of profiles" is a less standard use case and can lead to hiccups later in the processing. E.g., some annotation methods might require, or work better with, the expression itself in .X, for calculating the prior for the frequencies of the profiles in the data or for calculating a standard deviation of the profiles. This could also be achieved if, instead of populating a .varm key, one populates .X with "cells" sampled per profile according to the expected standard deviation and the expected relative frequencies. But this becomes increasingly hacky and is therefore not recommended.
I hope this helps!
Dear all,
First of all, thanks for developing useful tools for analysis.
I would like to map a specific gene program with TACCO, as in your paper (https://www.biorxiv.org/content/10.1101/2022.10.02.508492v1.full).
Could you please let me know how to input the gene program as a reference? Do I input a matrix consisting of programs x genes?
Thanks!
Best, KM