Open Huangyizhong opened 2 years ago
This is usually caused by read alignments that span (bridge) the two genes when they are very close to each other. If the "evidence" in the read alignment data really points that way, there is currently no easy solution. What organism is this? It is also useful to look at the read alignments track in IGV and check whether a lot of reads span that intergenic space.
Those genes seem to be very close to each other (hard to tell without seeing the annotation track). It's not clear whether the "fusion" happens because the terminal exons overlap (TSS of one gene too close to the TES of the other, or post-TES polymerase run-through?), or because spurious (spliced) read alignments create false "junctions" linking the two genes.
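One way to put a number on the "spurious junctions" hypothesis is to count the spliced alignments whose intron (the `N` operation in the CIGAR string) starts in one gene and ends in the other. The sketch below is only an illustration and assumes the alignments are available as SAM text (for example from `samtools view`); the function names and the gene coordinates in the example are hypothetical and not part of StringTie or HISAT2.

```python
import re
from collections import Counter

# CIGAR operations as defined in the SAM specification
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

def introns_from_alignment(pos, cigar):
    """Yield (start, end) intron intervals (1-based, inclusive) for each 'N' op."""
    ref = pos  # SAM POS: 1-based leftmost reference position
    for length, op in CIGAR_OP.findall(cigar):
        length = int(length)
        if op == "N":
            yield (ref, ref + length - 1)
        if op in "MDN=X":  # operations that consume the reference
            ref += length

def count_bridging_junctions(sam_lines, chrom, gene_a, gene_b):
    """Count junctions whose left flank lies in gene_a and right flank in gene_b.

    gene_a and gene_b are (start, end) tuples, 1-based inclusive, with gene_a
    located upstream of gene_b on the reference.
    """
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):  # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6 or fields[2] != chrom or fields[5] == "*":
            continue
        for start, end in introns_from_alignment(int(fields[3]), fields[5]):
            left_exon_end, right_exon_start = start - 1, end + 1
            if (gene_a[0] <= left_exon_end <= gene_a[1]
                    and gene_b[0] <= right_exon_start <= gene_b[1]):
                counts[(start, end)] += 1
    return counts

if __name__ == "__main__":
    # Hypothetical record: a read spliced from gene A (500-1000) into gene B (2000-3000)
    demo = ["r1\t0\tchr1\t950\t60\t50M1100N50M\t*\t0\t0\t*\t*"]
    print(count_bridging_junctions(demo, "chr1", (500, 1000), (2000, 3000)))
```

A handful of bridging junctions against a deep background of within-gene junctions would suggest alignment artifacts, while many well-supported ones would favor real read-through. Either way the counts are only a hint and are best checked against the IGV view.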
A script could be devised to split such "chimeric" transcripts, but that would be a band-aid solution covering for a possibly deeper issue. It would be interesting to look closer at WHY this happens when it does. There could be situations where such "fusion transcripts" across neighboring genes are "real" and not just alignment artifacts, e.g. genes sharing a transcriptional unit (polycistronic transcription, which has been shown to be possible in eukaryotes as well, not just in bacterial operons).
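As a starting point for such a script, the sketch below only flags candidates and does not attempt to split them. It assumes a reference annotation is available and that the assembled transcripts and reference genes have already been loaded into simple records; the `Gene` and `Transcript` types and the function name are hypothetical. A transcript is flagged when its exons overlap two or more reference genes on the same strand.

```python
from collections import namedtuple

# Hypothetical minimal records; coordinates are 1-based and inclusive, as in GTF
Gene = namedtuple("Gene", "name chrom strand start end")
Transcript = namedtuple("Transcript", "name chrom strand exons")  # exons: [(start, end), ...]

def exon_hits_gene(exon, gene):
    """True if the exon interval overlaps the gene interval by at least one base."""
    return exon[0] <= gene.end and gene.start <= exon[1]

def flag_fusion_candidates(transcripts, genes):
    """Map transcript name -> names of same-strand reference genes hit by its exons.

    Only transcripts touching two or more genes are returned.
    """
    flagged = {}
    for tx in transcripts:
        hits = [
            g.name
            for g in genes
            if g.chrom == tx.chrom
            and g.strand == tx.strand
            and any(exon_hits_gene(e, g) for e in tx.exons)
        ]
        if len(hits) >= 2:
            flagged[tx.name] = hits
    return flagged
```

Matching on exons rather than on the whole transcript span avoids flagging a transcript merely because another gene sits inside one of its introns. The flagged list is meant for manual review, since some of these transcripts may be genuine read-through or polycistronic products.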
Thanks for your quick reply. I agree with you. I have checked the mapped RNA-seq data in IGV, as shown below. Because there are so many mapped reads, I am only showing part of the alignments. What would you suggest? Thanks again for your kind help!
Hi, I had the same issue. I tried decreasing the maximum intron length in the alignment, which has solved the problem so far!
Sounds great! How do I set that parameter, and has it solved the problem for you? I filtered the transcripts by the number of exons in the UTR region (below 7). I also checked them in IGV, and almost all of the fusion transcripts can be identified this way.
I solved the problem by reducing the maximum intron length in the alignment step, not the assembly. Check your aligner documentation and change this parameter.
Thanks so much. I used HISAT2 to align the Illumina paired-end data, and I found its --max-intronlen option, but I am confused about how to set this parameter. Thanks so much.
On 11 March 2022 at 02:09, AmrSaadeldin @.***> wrote:
maximum intron length
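For the question of how to set `--max-intronlen` in HISAT2: it is a command-line option that takes a positive integer (the HISAT2 manual lists the default as 500000). The sketch below only assembles the command. The file names are placeholders, and the value 20000 is an illustrative assumption rather than a recommendation; a sensible limit depends on the intron length distribution of the organism.

```python
import shlex

def hisat2_command(index, reads1, reads2, out_sam, max_intronlen, threads=4):
    """Build a HISAT2 paired-end command line with a custom maximum intron length."""
    if max_intronlen <= 0:
        raise ValueError("max_intronlen must be a positive integer")
    return [
        "hisat2",
        "-p", str(threads),
        "--dta",                                # report alignments tailored for transcript assemblers
        "--max-intronlen", str(max_intronlen),  # assumption: pick a value suited to your genome
        "-x", index,
        "-1", reads1,
        "-2", reads2,
        "-S", out_sam,
    ]

if __name__ == "__main__":
    # Placeholder paths; replace them with your own index and FASTQ files
    cmd = hisat2_command("genome_index", "sample_R1.fastq.gz",
                         "sample_R2.fastq.gz", "sample.sam", max_intronlen=20000)
    print(shlex.join(cmd))
```

The printed line can be pasted into a shell, or the list can be passed to `subprocess.run(cmd, check=True)`. After re-aligning, the downstream steps (duplicate marking and StringTie assembly) need to be rerun on the new alignments.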
Hi there! I used StringTie2 for genome-guided transcript assembly. I aligned the RNA-seq data with HISAT2, removed PCR duplicates with Picard, and then assembled the transcripts with StringTie2. Finally, I checked some transcripts in IGV. Some of them are merged from two nearby genes, as shown in the following picture. Are there any parameters or scripts that can be used to filter them? Need help! Thanks so much! Yizhong Huang