phbradley / conga

Clonotype Neighbor Graph Analysis
MIT License

Compare multiple sample groups #34

Open cr2106 opened 2 years ago

cr2106 commented 2 years ago

Hello,

I love CoNGA! Thank you for developing and making available to everyone such a great package!

I am exploring CoNGA with multiple samples, and I was wondering if it is possible to compare two or more groups (e.g., patients vs. control group).

Thank you!

phbradley commented 2 years ago

Hi there, Thanks for the positive feedback, and apologies for my slowness replying (travel). I added a new answer to the 'Frequently Asked Questions' section of the CoNGA README. Let me know if that answers your query. Thanks, Phil

cr2106 commented 2 years ago

Hi,

Thank you for your answer!

I manually added new columns in adata.obs containing other batch information (patient groups), as you suggested. Now, I would like to include the patient groups in the graph vs. graph analysis (and later in graph vs. feature) to compare the conga clusters in the different patient groups, but I'm not sure when/in which function I should include the batch argument (I'm using jupyter notebook).

phbradley commented 2 years ago

Right now the main thing is visualizing the batch composition of the conga clusters, tcr clumping groups, and g-v-f and hotspot clustermaps. If the list of adata.obs batch column names is saved in adata.uns['batch_keys'] at the start of the analysis, then hopefully the batch assignments will appear in the corresponding figures. So for starters, you could see whether there is batch annotation being added to those outputs. In the logo plots, it would appear just to the right of the dendrograms. It might be worth pulling the latest code from GitHub because I added a few new features to the graphs (like labeling the batch assignments).

Let me know if that all works, and if there's more you want to do with the batch assignments we can talk about how to implement that. There is some code still in development that might be of interest...

cr2106 commented 2 years ago

I added the batch column names in adata.obs and in adata.uns['batch_keys'] before computing GEX and TCR neighbor sets (if I do it earlier, I cannot reduce to a single cell per TCR). adata.uns['batch_keys'] is ['diagnostic_group', 'patient']. When I make the logo plot, I get the error: KeyError: 'diagnostic_group'.

phbradley commented 2 years ago

Thanks for the update! The batch column names do need to be set in adata.uns['batch_keys'] before reducing to a single cell per TCR, so that the batch composition of clonotypes can be calculated (clonotypes span multiple cells so they can conceivably be in multiple batches). That's probably why the logo plot is failing. What error do you get if you try to reduce to a single cell per clone with them set? Note that the values for 'diagnostic_group' and 'patient' in adata.obs need to be integers...
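
Since the batch columns need to hold integers, here is a minimal sketch of one way to convert string labels to integer ids. It is an illustration only, not part of CoNGA: `encode_batch_labels` is a hypothetical helper, the example labels are made up, and the commented AnnData lines at the end are assumed usage rather than tested code.

```python
def encode_batch_labels(labels):
    """Map arbitrary batch labels to integer ids 0..n-1 (sorted label order).

    Hypothetical helper for illustration; not a CoNGA function.
    Returns (codes, mapping) so the mapping can be kept to decode the plots later.
    """
    mapping = {label: i for i, label in enumerate(sorted(set(labels)))}
    codes = [mapping[label] for label in labels]
    return codes, mapping


# Made-up example: one diagnostic group label per cell.
groups = ['patient', 'control', 'patient', 'control', 'control']
codes, mapping = encode_batch_labels(groups)
print(codes)    # integer id for each cell
print(mapping)  # label -> integer id

# Assumed usage with an AnnData object, done before reducing to a single
# cell per clone:
#   adata.obs['diagnostic_group'] = codes
#   adata.uns['batch_keys'] = ['diagnostic_group', 'patient']
```

Keeping the returned mapping around is useful because the figures only show the integer ids.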

cr2106 commented 2 years ago

They were not integers; I corrected this and it worked! I can see the patient groups next to the dendrograms, but I cannot tell which color corresponds to which group. Is there a way to add a legend, similar to the one for the clusters?

phbradley commented 2 years ago

Glad it worked! Right now the batch colors are arranged in increasing order from bottom to top, 0 = blue at the bottom, then 1 above that, etc. The 'tab10' colormap is used if there are fewer than 11 batches, otherwise the 'tab20' colormap is used: https://matplotlib.org/stable/gallery/color/colormap_reference.html
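
Until a legend is available in the plots, the color of each batch can be looked up by hand. The sketch below is a rough illustration, not CoNGA code: `batch_colors` is a hypothetical helper, and it assumes that batch id i is drawn with the i-th color of the palette. The hex values are those of matplotlib's 'tab10' and 'tab20' palettes.

```python
# Hex values of matplotlib's qualitative 'tab10' and 'tab20' palettes.
TAB10 = ['#1f77b4', '#ff7f0e', '#2ca02c', '#d62728', '#9467bd',
         '#8c564b', '#e377c2', '#7f7f7f', '#bcbd22', '#17becf']
TAB20 = ['#1f77b4', '#aec7e8', '#ff7f0e', '#ffbb78', '#2ca02c',
         '#98df8a', '#d62728', '#ff9896', '#9467bd', '#c5b0d5',
         '#8c564b', '#c49c94', '#e377c2', '#f7b6d2', '#7f7f7f',
         '#c7c7c7', '#bcbd22', '#dbdb8d', '#17becf', '#9edae5']


def batch_colors(labels_by_id):
    """Return {label: hex color} for batches with integer ids 0..n-1.

    labels_by_id: list whose index is the integer batch id.
    Assumption: batch id i uses color i of the chosen palette.
    """
    n = len(labels_by_id)
    # 'tab10' for fewer than 11 batches, otherwise 'tab20'.
    palette = TAB10 if n < 11 else TAB20
    if n > len(palette):
        raise ValueError('more batches than colors in the palette')
    return {label: palette[i] for i, label in enumerate(labels_by_id)}


# Made-up labels: id 0 = control, id 1 = patient.
legend = batch_colors(['control', 'patient'])
for label, color in legend.items():
    print(label, color)
```

The resulting label/color pairs could be passed to `matplotlib.patches.Patch(color=..., label=...)` to draw a separate legend figure.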

Each batch_key (e.g. 'patient' or 'diagnostic_group') gets a bar that is divided left/right with the left, thicker part showing the batch composition of the corresponding cluster and the right, thinner part showing the batch composition of the full dataset (so you can see enrichment/depletion).
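
The cluster-versus-dataset comparison that the two halves of the bar show can also be computed as numbers. This is a simplified sketch with made-up data, not CoNGA's implementation: `batch_fractions` is a hypothetical helper, and it assumes each clonotype belongs to exactly one batch, whereas real clonotypes can span several.

```python
from collections import Counter


def batch_fractions(batch_ids, num_batches):
    """Fraction of entries in each batch, indexed by integer batch id."""
    counts = Counter(batch_ids)
    total = len(batch_ids)
    return [counts.get(b, 0) / total for b in range(num_batches)]


# Made-up data: integer batch id per clonotype.
all_batches = [0, 0, 0, 0, 1, 1, 1, 1]   # full dataset
cluster_batches = [1, 1, 1, 0]           # clonotypes in one cluster

overall = batch_fractions(all_batches, 2)
in_cluster = batch_fractions(cluster_batches, 2)

for b in range(2):
    if overall[b] == 0:
        continue  # batch absent from the dataset, ratio undefined
    ratio = in_cluster[b] / overall[b]
    status = 'enriched' if ratio > 1 else 'depleted' if ratio < 1 else 'unchanged'
    print(f'batch {b}: cluster {in_cluster[b]:.2f} vs dataset {overall[b]:.2f} ({status})')
```

A ratio above 1 corresponds to a batch taking up more of the left (cluster) part of the bar than of the right (dataset) part.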

Yeah, I need to figure out how to squeeze a legend for the batch colors in there. And also allow non-integer batch ids...

Let me know if you have any questions!