phbradley / conga

Clonotype Neighbor Graph Analysis
MIT License

Question about --nbr_fracs #64

Open anchormok opened 10 months ago

anchormok commented 10 months ago

Hi everyone, this is a wonderful package for linking scRNA and scTCR data! I recently tested the run_conga.py script and found that the CoNGA scores in TCR2D are strongly influenced by the 'nbr_fracs' argument. With the default option (0.01) (above), there seems to be less association between GEX and TCR. If I set it to 0.5 (below), an obvious association appears. You also emphasize smallish nbr_fracs in your script. Do you have any suggestions for selecting a suitable value for nbr_fracs?

By the way, is the "length" column in TCR.csv necessary for the analyses? I have a TCR.csv file without "length" (it is not standard 10x output). I would like to use this file as CoNGA input.

Looking forward to your reply. Thanks!

[Images: merge_graph_vs_graph_logos, test1_graph_vs_graph_logos]

phbradley commented 9 months ago

Hi there,

Thanks for trying conga! This is a really interesting question. I can see that using larger neighbor fractions to build the neighbor lists might lead to more significant P values in some cases, but I wonder whether one might lose some "specificity" and see more generic associations, like the tendency for CD4+ (or CD8+) TCRs to be more similar to one another, or for memory TCRs to share sequence features, rather than TCR/GEX associations in smaller groups of cells. I don't think there's any one correct answer. Note that the default suggestion is to combine both the 0.01 and 0.1 neighbor fractions.
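To make the trade-off concrete, here is a minimal sketch of how neighborhood size grows with the neighbor fraction. It assumes each clonotype's neighborhood holds roughly `nbr_frac` times the total number of clonotypes; the helper `neighborhood_sizes` and the dataset size of 2000 are hypothetical and are not part of conga.

```python
def neighborhood_sizes(num_clonotypes, nbr_fracs=(0.01, 0.1)):
    """Map each neighbor fraction to an approximate neighbors-per-clonotype count.

    Assumption: neighborhood size is about nbr_frac * num_clonotypes,
    with a floor of one neighbor.
    """
    return {frac: max(1, int(frac * num_clonotypes)) for frac in nbr_fracs}


# Hypothetical dataset of 2000 clonotypes.
sizes = neighborhood_sizes(2000, nbr_fracs=(0.01, 0.1, 0.5))
for frac, k in sizes.items():
    print(f"nbr_frac={frac}: about {k} neighbors per clonotype")
```

Under this assumption, a fraction of 0.5 makes every neighborhood span half of the dataset, so broad lineage-level structure (for example CD4+ versus CD8+) could dominate the overlap between GEX and TCR neighborhoods.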

Take care, Phil



anchormok commented 9 months ago

Thanks for your suggestions! After confirming the special cell cluster identified by the CoNGA score, I used Seurat to analyze its transcriptional characteristics and ran into some questions. For example, CoNGA identified a total of 1400 cells of interest with unique TCR sequences (because CoNGA chooses a representative cell for each clone), but in my Seurat object I found 1585 cells of interest sharing those TCR sequences. Should I use the 1400 cells or the 1585 cells for the analyses? If the answer is 1585 cells, does that mean that cells with similar TCR sequences have similar transcriptional characteristics?

Thanks again for your reply

anchormok commented 9 months ago

By the way, I found that the top 9 DE gene logos in "graph_vs_graph_logos.png" are too crowded to read. Are there any parameters to fix this? Thanks a lot! [Image: Snipaste_2023-10-08_16-50-40]

phbradley commented 9 months ago

Re: 1400 vs 1585, it probably depends on your dataset, and there may not be much difference. Generally speaking, cells with the same clonotype do tend to have similar GEX profiles, which supports conga's reduction to clonotypes. But in a dynamic setting there can be substantial heterogeneity within a clonotype.
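For readers comparing the two cell sets, the relationship between representative cells and the full clonal expansion can be sketched as follows. The barcodes, clonotype IDs, and the "first cell is the representative" rule are all hypothetical; how conga actually picks a representative may differ.

```python
from collections import defaultdict

# Hypothetical cell-level table: (cell barcode, clonotype ID).
cells = [
    ("AAAC-1", "clone1"),
    ("AAAG-1", "clone1"),
    ("AACT-1", "clone2"),
    ("AAGT-1", "clone3"),
    ("ACGT-1", "clone3"),
]

# Group barcodes by clonotype.
cells_by_clone = defaultdict(list)
for barcode, clone_id in cells:
    cells_by_clone[clone_id].append(barcode)

# Clonotypes flagged as interesting by some upstream analysis (hypothetical).
hit_clones = ["clone1", "clone2"]

# One representative per clonotype (assumption: simply the first cell seen).
representatives = [cells_by_clone[c][0] for c in hit_clones]

# Every cell that belongs to a flagged clonotype.
all_cells = [b for c in hit_clones for b in cells_by_clone[c]]

print(len(representatives), "representative cells;", len(all_cells), "cells in total")
```

The second list is the one to subset on in a cell-level object if the within-clonotype heterogeneity mentioned above is of interest.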

Regarding the logos, the genes themselves are probably getting written out somewhere to a TSV file. But the TCR logos also don't look good, so it might be worth fixing that first and then seeing whether the DEGs are more interpretable. I think the problem is with the conversion from SVG to PNG. If you attach the log file, maybe we could figure out which tool is being used and swap it out for something that performs better on your system. I find Inkscape to be the most reliable across Linux and Mac.
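As a standalone check of whether Inkscape renders a given SVG well, a conversion can be sketched like this. The helper names and file paths are hypothetical, and this is not how conga itself selects or invokes its converter; the two command forms are the Inkscape 1.x syntax and the older 0.9x syntax.

```python
import shutil
import subprocess


def inkscape_png_command(svg_path, png_path, legacy=False):
    """Build the Inkscape command line for an SVG-to-PNG export."""
    if legacy:
        # Inkscape 0.9x: -z runs without a GUI, -e sets the PNG output file.
        return ["inkscape", "-z", "-e", png_path, svg_path]
    # Inkscape 1.x export options.
    return ["inkscape", svg_path, "--export-type=png",
            f"--export-filename={png_path}"]


def svg_to_png(svg_path, png_path, legacy=False):
    """Convert one SVG file to PNG; raises if Inkscape is missing or fails."""
    if shutil.which("inkscape") is None:
        raise RuntimeError("inkscape was not found on PATH")
    subprocess.run(inkscape_png_command(svg_path, png_path, legacy), check=True)
```

Calling `svg_to_png("logo.svg", "logo.png")` on one of the intermediate SVG files (the file name here is a placeholder) shows quickly whether the crowded rendering comes from the converter or from the SVG itself.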

anchormok commented 9 months ago

An excellent suggestion! The DEG and TCR logo display problems were solved once I used Inkscape for the SVG-to-PNG conversion. Thanks for your reply!