fernandogs97BR closed this issue 6 days ago.
Could you explain in more detail how the ratio in t1k-copynumber is calculated? That value looks very helpful for discriminating between false positives and true positives in KIR2DL5. P.S. Congratulations on publishing the pipeline in Genome Biology!
By the way, is it possible to extract the KIR2DL5 A/B reference sequences that match my sequenced reads?
Thank you. The copy-number script applies a square-root transform to the abundance values (FPK) and then fits a normal distribution to model the abundances of single-copy alleles. Since a sum of independent normal variables is also normal, we can use the single-copy parameters to derive the distributions for two copies, three copies, and so on up to ten copies. Evaluating an allele's abundance under each of these ten distributions gives ten likelihood values, and the log-likelihood ratio is computed from the best and the second-best of them.
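To make the description above concrete, here is a minimal sketch, not the actual t1k-copynumber code. It assumes the caller already knows which alleles are single-copy (the `single_copy_fpk` input is hypothetical), and it assumes the k-copy distribution on the square-root scale is N(k·μ, k·σ²), which is one reading of "the normal distribution is additive". Natural logs are used throughout.

```python
import math
from statistics import mean, stdev

def log_normal_pdf(x, mu, sigma):
    """Log density of N(mu, sigma^2) at x, computed directly to avoid underflow."""
    return -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma) - 0.5 * math.log(2 * math.pi)

def copy_number_llr(abundance_fpk, single_copy_fpk, max_copy=10):
    """Return (best copy number, log-likelihood ratio of best vs. second best).

    single_copy_fpk: abundances of alleles assumed to be single-copy
    (hypothetical input; needs at least two distinct values).
    """
    # Square-root transform, then fit a normal to the single-copy alleles.
    sqrt_single = [math.sqrt(a) for a in single_copy_fpk]
    mu, sigma = mean(sqrt_single), stdev(sqrt_single)
    x = math.sqrt(abundance_fpk)
    # Assumption: k copies ~ N(k*mu, k*sigma^2), i.e. the sum of k independent
    # single-copy normals, evaluated on the same sqrt scale.
    scores = sorted(
        ((log_normal_pdf(x, k * mu, sigma * math.sqrt(k)), k)
         for k in range(1, max_copy + 1)),
        reverse=True,
    )
    (best_ll, best_k), (second_ll, _) = scores[0], scores[1]
    return best_k, best_ll - second_ll
```

For example, with single-copy abundances `[100, 121, 81, 100]` (square roots 9 to 11), `copy_number_llr(400, [100, 121, 81, 100])` picks two copies with a log-likelihood ratio of about 25.2 over the runner-up (three copies). A small ratio would mean the abundance sits between two copy-number distributions, which is the kind of ambiguous call worth inspecting for KIR2DL5.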
Do you mean you want to know which reads are assigned to 2DL5?
Thank you very much for answering the first question! Regarding the second one, yes: I would like to know which reference sequences my reads align to for KIR2DL5, and which reads those are. I believe my KIR2DL5 false positives are caused by nonspecific reads generated during sequencing. Therefore, I want to compare the regions of the reference sequences where reads from true-positive and true-negative KIR2DL5 samples align, and then modify the reference based on that comparison.
I just added the option "--outputReadAssignment" to the GitHub repo, which writes the allele assignments to the {prefix}_assign.tsv file. Each row is one assignment, with the columns read_id, allele_id, allele_start, allele_end. Will this help?
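As a rough sketch of how the assignment file could be used for the comparison described above, the helpers below collect the KIR2DL5 rows and compute per-position read depth for each allele. Several details are assumptions, not documented behavior: columns are tab-separated with no meaningful header, KIR2DL5 allele IDs start with "KIR2DL5", and allele_start/allele_end are 0-based inclusive coordinates. The function names are hypothetical.

```python
import csv
from collections import defaultdict

def parse_assignments(lines, gene_prefix="KIR2DL5"):
    """Map allele_id -> [(read_id, start, end)] for alleles of one gene.

    lines: any iterable of text lines, e.g. an open {prefix}_assign.tsv file.
    Assumes columns read_id, allele_id, allele_start, allele_end (tab-separated).
    """
    hits = defaultdict(list)
    for row in csv.reader(lines, delimiter="\t"):
        if len(row) < 4 or not row[2].isdigit():
            continue  # skip blank, malformed, or header lines
        read_id, allele_id, start, end = row[:4]
        if allele_id.startswith(gene_prefix):
            hits[allele_id].append((read_id, int(start), int(end)))
    return dict(hits)

def coverage(intervals):
    """Per-position depth, assuming 0-based inclusive [start, end] coordinates."""
    if not intervals:
        return []
    length = max(end for _, _, end in intervals) + 1
    diff = [0] * (length + 1)  # difference array: +1 at start, -1 after end
    for _, start, end in intervals:
        diff[start] += 1
        diff[end + 1] -= 1
    depth, running = [], 0
    for d in diff[:length]:
        running += d
        depth.append(running)
    return depth
```

Usage (file name hypothetical): `with open("sample_assign.tsv") as fh: hits = parse_assignments(fh)`, then compare `coverage(hits[allele])` between true-positive and true-negative samples to spot reference regions that attract reads only in the negatives. If the coordinates turn out to be half-open or 1-based, adjust the `end + 1` offset accordingly.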
Thank you very much, I will try this option now! I will keep you informed.
We don't remove duplicated reads. Duplicated reads contribute to the allele abundance estimation (or to other types of allele scores in other HLA genotypers), so deduplication is expected to affect the genotyping results. Hope this helps.
Originally posted by @mourisl in https://github.com/mourisl/T1K/issues/11#issuecomment-1552364903