phbradley / conga

Clonotype Neighbor Graph Analysis
MIT License

Question: barcodes for cells #26

Closed s2hui closed 3 years ago

s2hui commented 3 years ago

Hello, I would like to get the barcodes associated with the Conga clusters (in the image below, my understanding is that there are 41 cells and 26 clonotypes associated with the conga cluster 2/10). Is there an output file I could look at to get this info OR a place in a python data structure that points to this info? Thanks for your help, @s2hui

image

phbradley commented 3 years ago

Great question! The graph-vs-graph analysis should produce a tab-separated text file whose name ends in _graph_vs_graph.tsv. That file has columns gex_cluster, tcr_cluster, and clone_index (and others). Look for the lines with 2 in the gex_cluster column and 10 in the tcr_cluster column, and take the clone_index values from those lines.

Those clone_index numbers are (0-indexed) row indices into the final anndata object (if you still have it in a jupyter notebook) or, if you ran conga from the command line, into the saved anndata object and the saved tab-separated text file whose name ends in _final_obs.tsv. Looking at those rows in either anndata (adata.obs) or the _final_obs.tsv spreadsheet will give you the barcodes for the representative cells, along with their TCR sequences.

Then, to get all the cell barcodes, you could look for those TCR sequences in the input TCR information, for example the filtered_contigs file (or the conga clones file and barcode mapping files). As a cross-check, note that the _graph_vs_graph.tsv file also has TCR amino acid information, so you can double-check that you are getting the right cells. Let me know if that's not clear!
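The first step described above (picking the clone_index values for one GEX/TCR cluster pair) could look roughly like the sketch below. It uses a small made-up table in place of the real _graph_vs_graph.tsv file; only the three column names mentioned above come from this thread, and the helper name clone_indices_for_cluster is hypothetical. Duplicates are dropped in case a clone is listed on more than one line.

```python
import pandas as pd

# Toy stand-in for the *_graph_vs_graph.tsv table. Only the three columns
# named in this thread are included, and all values are made up.
gvg_hits = pd.DataFrame({
    "gex_cluster": [2, 2, 5, 2],
    "tcr_cluster": [10, 10, 10, 3],
    "clone_index": [7, 12, 40, 7],
})

def clone_indices_for_cluster(gvg, gex_cluster, tcr_cluster):
    """Return the sorted, unique clone_index values for one (GEX, TCR) cluster pair."""
    mask = (gvg["gex_cluster"] == gex_cluster) & (gvg["tcr_cluster"] == tcr_cluster)
    return sorted(gvg.loc[mask, "clone_index"].unique().tolist())

# Conga cluster 2/10 from the question: gex_cluster 2, tcr_cluster 10.
selected = clone_indices_for_cluster(gvg_hits, gex_cluster=2, tcr_cluster=10)
print(selected)
```

With a real run, the toy table would be replaced by pd.read_csv(path, sep='\t') on the graph-vs-graph file, and the returned positions could be passed to adata.obs.iloc[...] to see the representative cells.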

sschattgen commented 3 years ago

Here's a bit of code that does what Phil suggested above:

Load the libraries, then open the AnnData object with the conga results:

import pandas as pd
import scanpy as sc

adata_file = 'your_path/some_prefix_conga.h5ad'
adata = sc.read_h5ad(adata_file)

Open the gvg results:

gvg_hits_file = 'your_path/some_prefix_graph_vs_graph.tsv'
gvg_hits = pd.read_csv(gvg_hits_file, sep='\t')

clone_index in the gvg hits df is the (0-indexed) row position in adata.obs, whose index contains the cell barcodes. We can append these to the gvg results with this:

gvg_hits['barcode'] = adata.obs.iloc[gvg_hits['clone_index']].index

Resave:

gvg_hits.to_csv(gvg_hits_file, sep='\t', index=False)
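The barcodes added this way are for the representative cell of each clonotype. As a rough sketch of the last step Phil described (finding every cell that carries the same TCR), the example below matches CDR3 amino acid sequences against a contig table. Everything here is an assumption for illustration: the data are made up, the cdr3a/cdr3b column names are assumed for adata.obs, the barcode/chain/cdr3 columns are assumed for the filtered contig file, and the helper barcodes_matching_clones is hypothetical. It matches on CDR3 amino acids only and keeps one TRA and one TRB contig per cell, which is cruder than a full clonotype definition.

```python
import pandas as pd

# Toy stand-in for the representative cells (rows of adata.obs selected via
# clone_index). Column names cdr3a / cdr3b are assumed; sequences are made up.
reps = pd.DataFrame(
    {"cdr3a": ["CAVSNYQLIW", "CAASGGSYIPTF"],
     "cdr3b": ["CASSLGQGYEQYF", "CASSPDRGNTEAFF"]},
    index=["AAAC-1", "CCCG-1"],
)

# Toy stand-in for a filtered contig table: one row per contig.
contigs = pd.DataFrame({
    "barcode": ["AAAC-1", "AAAC-1", "TTTG-1", "TTTG-1",
                "GGGA-1", "GGGA-1", "CCCG-1", "CCCG-1"],
    "chain": ["TRA", "TRB"] * 4,
    "cdr3": ["CAVSNYQLIW", "CASSLGQGYEQYF",    # same TCR as AAAC-1
             "CAVSNYQLIW", "CASSLGQGYEQYF",    # second cell of that clonotype
             "CAVRDSNYQLIW", "CASSQETQYF",     # unrelated TCR
             "CAASGGSYIPTF", "CASSPDRGNTEAFF"],
})

def barcodes_matching_clones(reps, contigs):
    """Return sorted barcodes whose (alpha, beta) CDR3 pair matches a representative."""
    tra = contigs.loc[contigs["chain"] == "TRA", ["barcode", "cdr3"]]
    trb = contigs.loc[contigs["chain"] == "TRB", ["barcode", "cdr3"]]
    # Keep the first contig per chain per cell (a simplification).
    tra = tra.drop_duplicates("barcode").rename(columns={"cdr3": "cdr3a"})
    trb = trb.drop_duplicates("barcode").rename(columns={"cdr3": "cdr3b"})
    per_cell = tra.merge(trb, on="barcode")
    wanted = reps[["cdr3a", "cdr3b"]].drop_duplicates()
    matched = per_cell.merge(wanted, on=["cdr3a", "cdr3b"])
    return sorted(matched["barcode"].tolist())

all_barcodes = barcodes_matching_clones(reps, contigs)
print(all_barcodes)
```

If the conga clones file and barcode mapping file are available, looking the clonotypes up there is likely more direct than re-matching sequences.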

s2hui commented 3 years ago

Hi, thank you for your help! I ran the code provided by @sschattgen and, with the following modification (the saved AnnData file ends in _final.h5ad rather than _conga.h5ad), it appears to have worked.

adata_file = 'your_path/some_prefix_final.h5ad'
adata = sc.read_h5ad(adata_file)

@s2hui

sschattgen commented 3 years ago

Sorry for the typo but glad it worked!