Accessing Metadata from AnnData

icbi-lab / infercnvpy

Infer copy number variation (CNV) from scRNA-seq data. Plays nicely with Scanpy.

https://infercnvpy.readthedocs.io/en/latest/

BSD 3-Clause "New" or "Revised" License


Accessing Metadata from AnnData #50

Status: open. Opened by shahrozeabbas 2 years ago.

shahrozeabbas commented 2 years ago

Hello,

I am attempting to export metadata after running inferCNV. I am able to export the CNV score and the Leiden clusters; however, I would like to access everything that is also available in the R version, such as the loss or gain of each chromosome for each cell. The R version seems to output a large table (~200 columns) with data for each chromosome. Is it possible to access this somehow using the Python version?

grst commented 2 years ago

Hi,

the matrix with CNV scores is stored in

> adata.obsm["X_cnv"]
<184x9111 sparse matrix of type '<class 'numpy.float64'>'
    with 214913 stored elements in Compressed Sparse Row format>

where each row is a cell and each column a genomic region.
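If the goal is to export this matrix, a minimal sketch (assuming pandas and scipy are installed; the helper cnv_matrix_to_frame and the bin_<i> column names are hypothetical, not part of infercnvpy) wraps it in a DataFrame indexed by cell name. The row order of an obsm entry follows adata.obs, so adata.obs_names can serve as the index:

```python
import numpy as np
import pandas as pd
import scipy.sparse as sp


def cnv_matrix_to_frame(cnv_mat, cell_names):
    """Return the cells x bins CNV matrix as a DataFrame indexed by cell name.

    Hypothetical helper. Assumes the rows of cnv_mat are in the same order as
    cell_names (true for adata.obsm entries vs. adata.obs_names).
    Densifying can need a lot of memory for large datasets.
    """
    dense = cnv_mat.toarray() if sp.issparse(cnv_mat) else np.asarray(cnv_mat)
    # Hypothetical column names: one per genomic bin
    columns = [f"bin_{i}" for i in range(dense.shape[1])]
    return pd.DataFrame(dense, index=pd.Index(cell_names, name="cell"), columns=columns)


# Toy stand-in for adata.obsm["X_cnv"]: 3 cells x 4 bins
toy = sp.csr_matrix(np.array([[0.0, 0.1, 0.0, -0.2],
                              [0.0, 0.0, 0.3, 0.0],
                              [0.05, 0.0, 0.0, 0.0]]))
df = cnv_matrix_to_frame(toy, ["cell_a", "cell_b", "cell_c"])
print(df.shape)  # (3, 4)
```

With real data, the call would be cnv_matrix_to_frame(adata.obsm["X_cnv"], adata.obs_names), followed by .to_csv(...) to write it out.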

Additionally, the information about which columns of this matrix belong to which chromosome is stored in

> adata.uns["cnv"]["chr_pos"]
{'chr1': 0,
 'chr2': 915,
 'chr3': 1574,
 'chr4': 2141,
 'chr5': 2454,
 'chr6': 2902,
 'chr7': 3394,
 'chr8': 3874,
 'chr9': 4195,
 'chr10': 4564,
 'chr11': 4955,
 'chr12': 5494,
 'chr13': 6009,
 'chr14': 6179,
 'chr15': 6499,
 'chr16': 6791,
 'chr17': 7209,
 'chr18': 7787,
 'chr19': 7928,
 'chr20': 8523,
 'chr21': 8781,
 'chr22': 8880}

That is, in this example,

adata.obsm["X_cnv"][:, 0:915]

contains the scores for chr1.
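Since chr_pos only records the first column of each chromosome, each block ends where the next one starts, and the last block ends at the total number of columns. A small sketch of that lookup (chromosome_ranges is a hypothetical helper; it assumes the columns are laid out in the order of the offsets, as in the example above):

```python
def chromosome_ranges(chr_pos, n_cols):
    """Turn {chromosome: first column} offsets into {chromosome: (start, end)}.

    Hypothetical helper. Each block ends at the next offset; the last block
    ends at n_cols, the total number of columns of the CNV matrix.
    """
    ordered = sorted(chr_pos.items(), key=lambda item: item[1])
    ranges = {}
    for i, (chrom, start) in enumerate(ordered):
        end = ordered[i + 1][1] if i + 1 < len(ordered) else n_cols
        ranges[chrom] = (start, end)
    return ranges


# Offsets copied from the adata.uns["cnv"]["chr_pos"] output above
chr_pos = {
    "chr1": 0, "chr2": 915, "chr3": 1574, "chr4": 2141, "chr5": 2454,
    "chr6": 2902, "chr7": 3394, "chr8": 3874, "chr9": 4195, "chr10": 4564,
    "chr11": 4955, "chr12": 5494, "chr13": 6009, "chr14": 6179, "chr15": 6499,
    "chr16": 6791, "chr17": 7209, "chr18": 7787, "chr19": 7928, "chr20": 8523,
    "chr21": 8781, "chr22": 8880,
}
ranges = chromosome_ranges(chr_pos, n_cols=9111)  # 9111 columns, as in the 184x9111 matrix
print(ranges["chr8"])  # (3874, 4195)
```

With real data, n_cols would be adata.obsm["X_cnv"].shape[1], and adata.obsm["X_cnv"][:, slice(*ranges["chr8"])] would select the chr8 block.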

Hope that helps, Gregor

shahrozeabbas commented 2 years ago

Hello,

Yes, this is helpful, thanks! However, it looks like this info is a superset of the table described here. Is there a way to obtain the 'map_metadata_from_infercnv.txt' described in that link directly from the infercnv object? Alternatively, is there a way to calculate these data from what's available in adata.obsm["X_cnv"]?

Thank you for your help, Shahroze

grst commented 2 years ago

Unfortunately, segmentation (e.g., using an HMM) is currently not implemented in infercnvpy (see also #1). In principle, you can aggregate the CNV matrix over a region you are interested in. For example, indices 915:1200 correspond roughly to the first half of chromosome 2. If you are interested in this region, you could do

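import numpy as np
cnv_mat = adata.obsm["X_cnv"]  # cells x genomic bins, usually a sparse CSR matrix
# .mean(axis=1) on a sparse matrix returns an (n_cells, 1) matrix, so flatten it to 1-D
chr2_score = np.asarray(cnv_mat[:, 915:1200].mean(axis=1)).ravel()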

to get a score for each cell.
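Repeating this aggregation for every chromosome gives a cells x chromosomes table, a simple stand-in for a per-chromosome summary. This is a sketch under stated assumptions: it averages all bins of a chromosome (zeros included), so it is not equivalent to the HMM-based gain/loss calls of the R version, and mean_cnv_per_chromosome is a hypothetical helper, not an infercnvpy function:

```python
import numpy as np
import pandas as pd
import scipy.sparse as sp


def mean_cnv_per_chromosome(cnv_mat, chr_pos, cell_names):
    """Average the CNV scores of all bins of each chromosome, per cell.

    Hypothetical helper. Returns a cells x chromosomes DataFrame. This is a
    plain average, so partial gains/losses within a chromosome get diluted.
    """
    ordered = sorted(chr_pos.items(), key=lambda item: item[1])
    # Block boundaries: each chromosome's first column, plus the total width
    bounds = [start for _, start in ordered] + [cnv_mat.shape[1]]
    columns = {}
    for i, (chrom, _) in enumerate(ordered):
        block = cnv_mat[:, bounds[i]:bounds[i + 1]]
        # Works for sparse (returns an (n, 1) matrix) and dense input alike
        columns[chrom] = np.asarray(block.mean(axis=1)).ravel()
    return pd.DataFrame(columns, index=pd.Index(cell_names, name="cell"))


# Toy stand-in: 2 cells, chr1 = bins 0-1, chr2 = bins 2-3
toy = sp.csr_matrix(np.array([[0.2, 0.4, 0.0, -0.3],
                              [0.0, 0.0, 0.1, 0.1]]))
table = mean_cnv_per_chromosome(toy, {"chr1": 0, "chr2": 2}, ["cell_a", "cell_b"])
print(table)
```

With real data this would be called as mean_cnv_per_chromosome(adata.obsm["X_cnv"], adata.uns["cnv"]["chr_pos"], adata.obs_names), and the result can be exported with .to_csv(...).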

zhangpebbels commented 1 year ago

Hello! I ran into some trouble. I want to get the CNV region of a deletion on chr8, i.e., I want to know which genes are deleted on chr8. But through 'X_cnv' I can only get numbers, not the corresponding gene IDs. Also, the 'chromosome' metadata is not paired with 'X_cnv'. I ran infercnvpy excluding "X,Y,MT,nan", but the numbers are wrong: chr14 has 170 genes, but in ['chr_pos'] it is chr14: 180.

>>adata.var['chromosome'].value_counts()
chr1      525
chr2      347
chr17     308
chr19     307
chr11     301
chr12     287
chr3      281
chr6      277
chr5      240
chr7      237
chr16     207
chr10     192
chr4      188
chr9      170
chr14     170
chr8      169
chrX      152
chr15     147
chr20     135
chr22     128
chr13      90
chr18      68
chr21      49
chrMT      18
chrnan      3

{'chr1': 0,
 'chr2': 525,
 'chr3': 872,
 'chr4': 1153,
 'chr5': 1341,
 'chr6': 1581,
 'chr7': 1858,
 'chr8': 2095,
 'chr9': 2264,
 'chr10': 2434,
 'chr11': 2626,
 'chr12': 2927,
 'chr13': 3214,
 'chr14': 3313,
 'chr15': 3483,
 'chr16': 3630,
 'chr17': 3837,
 'chr18': 4145,
 'chr19': 4244,
 'chr20': 4551,
 'chr21': 4686,
 'chr22': 4785}

So could you add the gene IDs to ['X_cnv'], or suggest another way to get the CNV regions? Thanks so much, looking forward to your reply.

grst commented 1 year ago

Hi @zhangpebbels,

Yes, the metadata in var does not correspond to the data in X_cnv: var has one row per gene, whereas the columns of X_cnv are bins that may span multiple genes.
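One partial workaround is to list the genes of a chromosome in genomic order from adata.var, which holds the chromosome, start and end annotations infercnvpy uses to order genes. This sketch (genes_on_chromosome is a hypothetical helper, and the gene names are made up) does not tell you which genes contribute to a given bin; that mapping depends on the smoothing settings:

```python
import pandas as pd


def genes_on_chromosome(var, chromosome):
    """Return the genes of one chromosome ordered by genomic start position.

    Hypothetical helper. Expects var to look like adata.var, with
    "chromosome", "start" and "end" columns.
    """
    subset = var.loc[var["chromosome"] == chromosome, ["chromosome", "start", "end"]]
    return subset.sort_values("start")


# Toy stand-in for adata.var (hypothetical gene names and positions)
var = pd.DataFrame(
    {"chromosome": ["chr8", "chr1", "chr8", "chr8"],
     "start": [5_000, 100, 1_000, 20_000],
     "end": [6_000, 900, 2_000, 21_000]},
    index=["GENE_C", "GENE_A", "GENE_B", "GENE_D"],
)
chr8_genes = genes_on_chromosome(var, "chr8")
print(list(chr8_genes.index))  # ['GENE_B', 'GENE_C', 'GENE_D']
```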

@redst4r has been working on a feature to retrieve genes for each bin in #58. But there are still some tests failing and I'm not entirely sure what the status of that PR is.

jpark27 commented 11 months ago

Hi, @grst @redst4r

Thank you for sharing this great wrapper for infercnvpy. I've been trying to annotate the matching genes on the heatmap plot (e.g., all or a subset of the matching gene symbols along the bottom), but turning on show_gene_labels=True only shows the relevant segment. Is there any workaround I could try? Possibly @redst4r already found a solution but forgot to update the repo? Any tips would be much appreciated :-)

best, Jun