mourisl / T1K

T1K is a versatile method to genotype highly polymorphic genes (e.g., KIR, HLA) from bulk or single-cell RNA-seq, WGS, or WES data.
MIT License

Problems with copynumber analysis #28

Closed: jshdydh123 closed this issue 4 months ago.

jshdydh123 commented 4 months ago

Hi, I wanted to express my appreciation for your incredible tool.

While using the T1K copy-number script, I encountered an issue. When running the command python3 t1k-copynumber.py -g XXX_genotype.tsv --nomissing KIR3DL3,KIR2DL4,KIR3DP1,KIR3DL2, I observed that in the copy-number output file several framework genes, such as KIR2DL4, have a log-ratio of 0.00, which appears to give a copy number of 0.

However, in the genotype.tsv file the abundance of KIR2DL4 is relatively high and its quality score is 60. How should I interpret this, and is there a way to address it? The sample genotype.tsv file and the copy-number output file are shown in the screenshots.

I appreciate your time and assistance in resolving this matter.

[Screenshots: KIR copy-number output and KIR genotype.tsv]

mourisl commented 4 months ago

The script finds two copies of 2DL4: in "2DL4*005 2 0.00", the number after the allele (the column that holds the abundance in the original genotype.tsv file) is the number of copies, and 0.00 is the quality score. That said, based on the abundances in your screenshot, I would expect the log-likelihood difference from the next most likely copy number to be larger; at the very least it should not be 0.00. Could you please paste the genotype.tsv as text so I can look into this issue? Thank you.
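To make the column meaning above concrete, here is a minimal Python sketch of reading one copy-number output row. It assumes (not confirmed in this thread) that the copy-number output keeps the genotype.tsv layout: tab-separated gene, number of distinct alleles, then allele / value / quality triples, with the value column holding a copy number instead of an abundance. The `AlleleCall` class, `parse_copynumber_row` function, and the example row are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class AlleleCall:
    allele: str
    copies: int     # holds the abundance in genotype.tsv, the copy number here
    quality: float  # gap to the next most likely copy number


def parse_copynumber_row(line: str) -> tuple[str, list[AlleleCall]]:
    """Parse one row, ASSUMING the genotype.tsv-style layout:
    gene, number of distinct alleles, then (allele, value, quality) triples."""
    fields = line.rstrip("\n").split("\t")
    gene, n_alleles = fields[0], int(fields[1])
    calls = []
    for i in range(n_alleles):
        # Each allele occupies three consecutive columns after the first two.
        allele, value, quality = fields[2 + 3 * i : 5 + 3 * i]
        calls.append(AlleleCall(allele, int(float(value)), float(quality)))
    return gene, calls


# Hypothetical row built around the "2DL4*005 2 0.00" call quoted above.
gene, calls = parse_copynumber_row("KIR2DL4\t1\t2DL4*005\t2\t0.00")
for call in calls:
    print(gene, call.allele, call.copies, call.quality)
```

Read this way, "2DL4*005 2 0.00" is two copies called with a near-zero quality gap, not a copy number of 0.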

jshdydh123 commented 4 months ago

Thank you for your reply. Here is the genotype.tsv file mentioned above: t1k_kir_genotype_tsv.txt

mourisl commented 4 months ago

Thank you for sharing the file. The "quality_score" column in the copy-number output is the difference between the likelihoods of the best and the next most likely copy number. In this case both likelihoods are very small, so their difference is less than 0.01 and is printed as 0.00. I have changed the value to the difference of the log-likelihoods (which I believe I used at some point), so the difference is now much more obvious.
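The numerical effect described above can be illustrated with a small Python sketch. The log-likelihood values are made up, natural logarithms are assumed, and the two functions are hypothetical stand-ins for the old and new quality definitions, not code from t1k-copynumber.py.

```python
import math


def raw_likelihood_gap(loglik_best: float, loglik_next: float) -> float:
    """Gap between the two likelihoods on the linear scale (old quality, as described)."""
    return math.exp(loglik_best) - math.exp(loglik_next)


def log_likelihood_gap(loglik_best: float, loglik_next: float) -> float:
    """Gap between the two log-likelihoods (new quality, as described)."""
    return loglik_best - loglik_next


# Hypothetical log-likelihoods for the best and runner-up copy numbers of one gene.
best, runner_up = -11.5, -16.1  # likelihoods of roughly 1e-5 and 1e-7

print(f"{raw_likelihood_gap(best, runner_up):.2f}")  # 0.00
print(f"{log_likelihood_gap(best, runner_up):.2f}")  # 4.60
```

The best hypothesis is about 100 times more likely than the runner-up, yet the linear-scale gap rounds to 0.00 at two decimals, while the log-scale gap keeps that separation visible.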

jshdydh123 commented 4 months ago

Thank you very much for your assistance. With the adjustments to the copy-number script, the issue I reported is now resolved.