Looking at the conversion results, LiftOver was not able to convert the hg38 coordinate 14:106719377 into an hg19 coordinate. I guess that in this case LiftOver probably returns several hg19 coordinates for 4:190175393, and FusionCatcher just picks one of them at random.
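One way to check this guess is to count how many output records each input coordinate produced: zero means unmapped, one means a unique hg19 coordinate, and more than one means several hg19 candidates. That last case can only happen if liftOver is allowed to report multiple regions, for example with its `-multiple` option. The Python sketch below assumes each coordinate was written as a single-base BED record with a unique name in column 4, and that the liftOver output keeps that name. The function name `classify_liftover` is hypothetical and is not part of FusionCatcher.

```python
from collections import Counter


def classify_liftover(input_names, output_bed_lines):
    """Classify each named input record as 'unmapped', 'unique' or 'multiple'.

    Assumes (hypothetically) that every input BED record had a unique name in
    column 4 and that the liftOver output BED carries that name through.
    """
    hits = Counter()
    for line in output_bed_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        fields = line.rstrip("\n").split("\t")
        hits[fields[3]] += 1  # column 4 holds the record name
    result = {}
    for name in input_names:
        n = hits[name]
        result[name] = "unmapped" if n == 0 else ("unique" if n == 1 else "multiple")
    return result
```

A name classified as "multiple" would be one where a tool that keeps only one output record has to choose among candidates.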
Hi @ndaniel, thank you for your quick response. The length of chr14 in hg19 is 107,349,540 bp, so the lifted position is within the chromosome. I can accept your explanation for 4:190175393, but I cannot understand why LiftOver was not able to convert the hg38 coordinate 14:106719377 into an hg19 coordinate. The UCSC LiftOver web tool does return a result for it (chr14:107174624-107174624).
Did I miss something? Could you explain a bit more?
Thank you very much.
xiucz
I think the reason is that FusionCatcher uses the standalone liftOver executable, which is a different version from the one behind the UCSC Genome Browser web tool.
Also, FusionCatcher converts one coordinate at a time (not intervals), which means something like:
liftOver chr4:190175393
liftOver chr14:106719377
and NOT
liftOver chr4:190175393-106719377
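In practice, converting "one coordinate at a time" with the standalone tool means writing each breakpoint as its own single-base BED record, not as one interval spanning both breakpoints. The Python sketch below does that and then calls liftOver with its standard positional arguments (`liftOver oldFile map.chain newFile unMapped`). It assumes that `liftOver` is on the PATH and that UCSC's `hg38ToHg19.over.chain.gz` has been downloaded. The helper names are hypothetical, and this is not FusionCatcher's actual code.

```python
import subprocess


def point_to_bed(point, name):
    """Turn a 1-based 'chrom:pos[:strand]' string into a single-base BED line.

    BED starts are 0-based and ends are exclusive, so position p -> [p-1, p).
    """
    chrom, pos = point.split(":")[:2]
    if not chrom.startswith("chr"):
        chrom = "chr" + chrom  # UCSC chain files use 'chr'-prefixed names
    pos = int(pos)
    return f"{chrom}\t{pos - 1}\t{pos}\t{name}\n"


def write_points(points, path):
    """Write one single-base record per breakpoint (never one joint interval)."""
    with open(path, "w") as handle:
        for i, point in enumerate(points):
            handle.write(point_to_bed(point, f"bp{i}"))


def run_liftover(bed_in, chain, bed_out, unmapped):
    """Invoke the standalone liftOver binary (assumed to be on PATH)."""
    subprocess.run(["liftOver", bed_in, chain, bed_out, unmapped], check=True)
```

For example, `write_points(["4:190175393", "14:106719377"], "hg38_points.bed")` followed by `run_liftover("hg38_points.bed", "hg38ToHg19.over.chain.gz", "hg19_points.bed", "unmapped.bed")` reproduces the per-coordinate conversion. Any coordinate that fails to lift ends up in `unmapped.bed`.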
@ndaniel Thank you, this may be the explanation.
Hi, @ndaniel
One more case:
The CLTC reference sequence "CTCTTCCTATGTTTTTGTTTTTTTTTGTTTTTTTTTTGTTTGTTTGTTTG" is consistent with the hg38 assembly (http://genome.ucsc.edu/cgi-bin/das/hg38/dna?segment=chr17:59644362,59644562), not with hg19 (http://genome.ucsc.edu/cgi-bin/das/hg19/dna?segment=17:59643562,59644562).
So I think this may be a bug.
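This comparison can be scripted. The Python sketch below makes several assumptions. It assumes the DAS `dna` response wraps the sequence in a `<DNA>` element, which is the layout UCSC's DAS server used. It also assumes the 5' partner is on the + strand, as CLTC is here (`17:59644562:+`), so the part of the fusion sequence before `*` should end exactly at the breakpoint. Fetching the response is left out. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def das_dna_to_seq(xml_text):
    """Extract the sequence from a DAS 'dna' XML response.

    Assumes a <DNA> element whose text is the sequence, possibly split
    across several lines.
    """
    root = ET.fromstring(xml_text)
    dna = root.find(".//DNA")
    if dna is None or dna.text is None:
        raise ValueError("no <DNA> element in response")
    return "".join(dna.text.split()).upper()


def left_flank_matches(fusion_sequence, reference_segment):
    """True if the part before '*' equals the end of the reference segment.

    Assumes a '+' strand 5' partner and a segment ending at the breakpoint.
    """
    left = fusion_sequence.split("*", 1)[0].upper()
    return reference_segment.upper().endswith(left)
```

Running `left_flank_matches` once against the hg38 segment and once against the corresponding hg19 segment shows which assembly the reported flank actually comes from.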
grep CLTC fusioncatcher/final-list_candidate-fusion-genes.txt fusioncatcher/final-list_candidate-fusion-genes.hg19.txt |grep ALK
fusioncatcher/final-list_candidate-fusion-genes.txt:CLTC ALK known,oncogene,cosmic,chimerdb2,cgp,ticdb,fragments,chimerdb3kb,chimerdb3pub,cancer,tumor 0 4 2 21 BOWTIE+STAR 17:59644562:+ 2:29209551:- ENSG00000141367 ENSG00000171094 CTCTTCCTATGTTTTTGTTTTTTTTTGTTTTTTTTTTGTTTGTTTGTTTG*TTTTTTTTTGAGACGGAGTTTCGCTCTTGTTGCCCAGGCTGGAGTGCCAT intronic/intronic
fusioncatcher/final-list_candidate-fusion-genes.hg19.txt:CLTC ALK known,oncogene,cosmic,chimerdb2,cgp,ticdb,fragments,chimerdb3kb,chimerdb3pub,cancer,tumor 0 4 2 21 BOWTIE+STAR 17:57721923:+ 2:29432417:- ENSG00000141367 ENSG00000171094 CTCTTCCTATGTTTTTGTTTTTTTTTGTTTTTTTTTTGTTTGTTTGTTTG*TTTTTTTTTGAGACGGAGTTTCGCTCTTGTTGCCCAGGCTGGAGTGCCAT intronic/intronic
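To cross-check rows like these against the UCSC web LiftOver, the breakpoints of one fusion can be paired up from the two files and printed in the `chrN:start-end` position format the web tool accepts. The Python sketch below assumes the files are tab-separated. It also assumes the two breakpoints sit in columns 9 and 10 (0-based indices 8 and 9) as `chrom:pos:strand`, which is where they appear in the rows shown above. The function name is hypothetical.

```python
def breakpoint_pairs(hg38_line, hg19_line):
    """Pair the hg38 and hg19 breakpoints reported for one fusion.

    Assumes tab-separated rows whose 0-based columns 8 and 9 hold
    'chrom:pos:strand' (as in the rows above).
    """
    a = hg38_line.rstrip("\n").split("\t")
    b = hg19_line.rstrip("\n").split("\t")
    if a[:2] != b[:2]:
        raise ValueError("rows describe different gene pairs")
    pairs = []
    for col in (8, 9):
        chrom38, pos38, _ = a[col].split(":")
        chrom19, pos19, _ = b[col].split(":")
        # 'chrN:start-end' is the single-position format used by the web tool
        pairs.append((f"chr{chrom38}:{pos38}-{pos38}",
                      f"chr{chrom19}:{pos19}-{pos19}"))
    return pairs
```

Pasting the hg38 member of each pair into the web tool and comparing its answer with the hg19 member shows directly whether the two liftOver versions agree.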
Hi @xiucz
Maybe, but I think there is also another bug in FC v1.10 (and earlier versions): reads containing low-entropy regions are not detected very well. This has been fixed in v1.20.
To me, fusion junctions with very low-entropy sequences, such as TTTTTGTTTTTTTTTGTTTTTTTTTTGTTTGTTTGTTTG*TTTTTTTTT
and *NNNNNNNNNNNNNNNNNNNNNGCCCCCCCCCCCCCCGCC,
are very likely false-positive fusions. Also, sequences like these are tricky to align to the genome.
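"Low entropy" can be made concrete with the Shannon entropy of the base composition. This is 2 bits per base for a perfectly balanced A/C/G/T mix and 0 for a single repeated base. The Python sketch below computes it for a junction sequence, ignoring `*` and `N`. It is a generic measure chosen for illustration. It is not necessarily the filter FusionCatcher v1.20 applies, and any cutoff would be the user's own choice.

```python
import math
from collections import Counter


def shannon_entropy(seq):
    """Shannon entropy in bits per base, counting only A, C, G and T."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        return 0.0
    n = len(bases)
    counts = Counter(bases)
    return -sum(c / n * math.log2(c / n) for c in counts.values())
```

The first junction above (42 T and 6 G) scores about 0.54 bits per base, far below the 2-bit maximum.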
Thank you for your advice; I will try the latest version.
Hi, @ndaniel
I found that FusionCatcher's hg38-to-hg19 conversion differs slightly from the result of https://genome.ucsc.edu/cgi-bin/hgLiftOver. Please see the example.
The hg38 coordinate of DUX4 is 4:190175393; however, the lifted result is a little strange. When I use the UCSC LiftOver tool to convert the hg38 coordinate of IGH@ (14:106719377) to hg19, it returns chr14:107174624-107174624, although I know that IGH breakpoints are numerous.
How does FusionCatcher handle the conversion? Do I need to recheck the hg19.txt file?
Thank you very much.