Closed s2hui closed 3 years ago
Hi,
It's important to remember that CoNGA is operating on the level of clonotypes, not cells, and that each clonotype is represented by one cell even if the clone size is not equal to 1. This means each row in the results table is a clonotype so the number of rows will not equal the number of cells, which would be the sum of the clone sizes for all clonotypes in the cluster. Clone size information is stored in adata.obs. You could count the number of clonotypes and cells in a conga cluster using something like this.
import scanpy as sc
import pandas as pd
import numpy as np
adata = sc.read_h5ad('vdj_v1_hs_pbmc2_5gex_protein_final.h5ad')
gvg_hits = pd.read_csv('vdj_v1_hs_pbmc2_5gex_protein_test_graph_vs_graph.tsv', sep = '\t')
gvg_7_6 = gvg_hits[(gvg_hits.gex_cluster == 7) & (gvg_hits.tcr_cluster == 6) & (gvg_hits.nbr_frac == 0.1) ]
clonotypes = np.unique(gvg_7_6.clone_index)
print(clonotypes.shape[0])
gvg_7_6_obs = adata.obs.iloc[clonotypes]
cells = np.sum(gvg_7_6_obs.clone_sizes)
print(cells)
Hi,
Thanks for the detailed steps. I still cannot get the number of cells to match what is in the figure for some reason. The nbr_frac = 0.01 which I double checked in the log.txt and summary.html files.
I loaded the h5ad file and the gvg file.
The ran the following:
>>> g = gvg_hits[(gvg_hits.gex_cluster == 5) & (gvg_hits.tcr_cluster == 2) & (gvg_hits.nbr_frac == 0.01) ]
>>> clonotypes = np.unique(g.clone_index)
>>> print(clonotypes.shape[0])
11
But the number of clonotypes should be 25 according to the gvg logos.png file.
Just out of curiosity, I tried nbr_frac = 0.1 and get the following:
>>> g = gvg_hits[(gvg_hits.gex_cluster == 5) & (gvg_hits.tcr_cluster == 2) & (gvg_hits.nbr_frac == 0.1) ]
>>> clonotypes = np.unique(g.clone_index)
>>> print(clonotypes.shape[0])
22
Strangely, if I try another conga cluster, but use nbr_frac = 0.1, I am able to get the correct number of clonotypes for some of the clusters but most do not match. The number of cells also does not match.
I double checked that the final.h5ad object is the in the same directory as the gvg_logos.png file and gvg hits file that I am using.
Thanks for your help, @s2hui
Hi there, thanks for using CoNGA! The conga hits include results from both 0.01 and 0.1 nbr_frac analyses. So it should be the case that this would give the expected number of clonotypes:
>>> g = gvg_hits[(gvg_hits.gex_cluster == 5) & (gvg_hits.tcr_cluster == 2)]
>>> clonotypes = np.unique(g.clone_index)
>>> print(clonotypes.shape[0])
Let me know if that's not the case!
For the cells, the prediction is that if you sum the clone sizes for those clonotypes, you get the cell number (e.g. using the clone_sizes
column in the *_final_obs.tsv
output or the adata.obs.clone_sizes array). If you want the cell barcodes, you could look for the corresponding tcrs in the 10x filtered_contigs file, or use the *_clones.tsv.barcode_mapping.tsv
file to map from clone_id to barcodes. We're planning on adding another output to make it easier to trace back to cell barcodes; will update the README when that's done.
Hi That worked! I executed the following code:
>>> import scanpy as sc
>>> import pandas as pd
>>> import numpy as np
>>> adata = sc.read_h5ad('out_final.h5ad')
>>> gvg_hits = pd.read_csv('out_graph_vs_graph.tsv',sep = '\t')
>>> g = gvg_hits[(gvg_hits.gex_cluster == 5) & (gvg_hits.tcr_cluster == 2)]
>>> clonotypes = np.unique(g.clone_index)
>>> print(clonotypes.shape[0])
25
>>> gg = adata.obs.iloc[clonotypes]
>>> cells = np.sum(gg.clone_sizes)
>>> cells
52
Thanks for your help! @s2hui
Hi phbradley,
May I ask that do you have any updates about this?
If you want the cell barcodes, you could look for the corresponding tcrs in the 10x filtered_contigs file, or use the *_clones.tsv.barcode_mapping.tsv file to map from clone_id to barcodes. We're planning on adding another output to make it easier to trace back to cell barcodes; will update the README when that's done.
Sorry if it was a simple question. I am not very familiar with python so I prefer to convert the output h5ad file and do the analysis in R. The problem is that it looks like there is no clone_id
in the meta data (obs). So how can I extract the cellbarcode information from *clones.tsv.barcode_mapping.tsv
file? I found one very interesting TCR cluster and would like to check all the cells in my scRNA-seq data. Thanks a lot for your help!
Best, Alice
Hello, After running conga, I have the following conga cluster:
When I look at the file merged_remedy_graph_vs_graph.tsv, and I look at only the subset of the rows where gex_cluster = 5 and tcr_cluster = 2, I am able to count the proper number of unique clonotypes (=25). I thought that the number of rows in this subset should equal the number of cells (=52), however I only count 44 rows.
I did the same for other conga clusters and while the number of clonotypes matches what is on the figure, I can't get the number of cells to match? I wonder where my error might be?
Thanks for your help, @s2hui