Closed yingchen69 closed 12 months ago
The novel SNP detection is automatically included in the workflow/wrapper, and the vcf file is the SNP detection results where the coordinate is the concatenated exons for each allele.
For the t1k-copynumber.py script, the "-g" option takes the the XXX_genotype.tsv file generated by T1K. You may also add the gene names to option "--nomissing" as a comma-separated list, where the genes are expected to be present on every chromosome. For example, we know there are four KIR framework genes, so the copy number inference command should be:
python3 t1k-copynumber.py -g XXX_genotype.tsv --nomissing KIR3DL3,KIR2DL4,KIR3DP1,KIR3DL2
Hope this helps and please let me know if you have further questions.
Hi,
Thanks a lot for the quick reply!
I am still a bit confused by the SNP detection. Here is my command to get the genotype.tsv:
run-t1k -1 my_R1.fq.gz -2 my_R2.fq.gz --preset hla -f /hlaidx/hlaidx_dna_seq.fa --od outputdir -o mysampleid
I do not see any vcf in the output folder. Should I set more parameter to get SNP result in the output?
Best,
Ying
Can you show me the output on the screen?
Another way to call the variants is through the "./analyzer" command, which can be run as: "./analyzer -f /hlaidx/hlaidx_dna_seq.fa -a outputdir/mysampleid_allele.tsv -1 outputdir/mysampleid_aligned_1.fa -2 outputdir/mysampleid_aligned_1.fa.gz -s 0.97 -o outputdir/mysampleid" in your case. After that you shall see the outputdir/mysampleid_allele.tsv file.
Hi,
Our server is currently down for monthly maintenance. I will get the list of outputs of my T1K jobs tomorrow once it's back online.
Thanks a lot for the help!
Ying
Get Outlook for Androidhttps://aka.ms/AAb9ysg
From: Li Song @.> Sent: Wednesday, May 17, 2023 2:47:27 PM To: mourisl/T1K @.> Cc: yingchen69 @.>; Author @.> Subject: Re: [mourisl/T1K] How to run post analysis steps such as copynumber and novel SNP detection? (Issue #11)
Can you show me the output on the screen?
Another way to call the variants is through the "./analyzer" command, which can be run as: "./analyzer -f /hlaidx/hlaidx_dna_seq.fa -a outputdir/mysampleid_allele.tsv -1 outputdir/mysampleid_aligned_1.fa -2 outputdir/mysampleid_aligned_1.fa.gz -s 0.97 -o outputdir/mysampleid" in your case. After that you shall see the outputdir/mysampleid_allele.tsv file.
— Reply to this email directly, view it on GitHubhttps://github.com/mourisl/T1K/issues/11#issuecomment-1551892008, or unsubscribehttps://github.com/notifications/unsubscribe-auth/ABQ767T7VEUDMO4FTKWOWO3XGUML7ANCNFSM6AAAAAAYESQUS4. You are receiving this because you authored the thread.Message ID: @.***>
Hi,
I just got our server back and I checked that we do have the allele.vcf files. Sorry for the confusion :(
I got t1k-copynumber.py working, but I am not sure about the column names. From the code it seems the column names should be gene, number of alleles, allele 1, allele 1 cn, allele 1 ratio, allele 2, allele 2 cn, allele 2 ratio. What is the 'ratio'? Is it in log2?
Another question, how does T1K deal with duplicated reads? My data are all from whole exon sequencing and I see ~20% duplicates rate by flagstat. I tried several tools for HLA typing such as polysolver, OptiType, hisat-genotype,HLAHD, HLALA. I always got totally different HLA typing results if I removed duplicated reads from the bam files first.
Thanks a lot!
Ying
Ratio is the log-ratio of the likelihood between the most likely copy number and the second likely copy number. I'm still trying to optimize t1k-copynumber.py, so please interpret its result with caution.
We don't remove the duplicated reads. The duplicated reads will contribute to the allele abundance estimation (or other type of allele score in other HLA genotypers), therefore it is expected that the deduplication will affect the genotyping results. Hope this helps.
Hi,
Thanks a lot for the details!
Regarding the duplicate reads, is their a reason to keep them for analysis? Normally we think duplicate reads are artifacts due to PCR, optical...
Best,
Ying
In RNA-seq data, I saw some genes are very highly expressed and some reads can become duplicated by chance. I think the deduplication should be left to the user when preprocessing the fastq file if they feel the duplication has become a severe issue.
Hi,
The readme mentions that t1k post analysis steps can do copynumber and novel SNP detection. Is there any detail regarding how to do the tasks? There is a t1k-copynumber.py, but I am not sure if python t1k-copynumber.py -g T1K_genotyping_result_file can work.
Thanks a lot for the help!
Ying