rmhubley / RepeatMasker

RepeatMasker is a program that screens DNA sequences for interspersed repeats and low complexity DNA sequences.

Difference in chromosome naming between GTF and FASTA file #176

Closed mars188 closed 1 year ago

mars188 commented 1 year ago

Hello, I obtained a GFF file by running RepeatMasker with the -gff flag. I then converted this GFF2 file to GFF3 with the rmOutToGFF3.pl script, and converted the resulting GFF3 file to GTF with the AGAT tool.

I then used this GTF file for alignment with STAR, but I got the following error:

Fatal INPUT FILE error, no exon lines in the GTF file: datepalm_refGene.gtf Solution: check the formatting of the GTF file, it must contain some lines with exon in the 3rd column. Make sure the GTF file is unzipped. If exons are marked with a different word, use --sjdbGTFfeatureExon .

My GTF file actually contained only gene/transcript in the 3rd column. So, I replaced the "transcript" with "exon" in the 3rd column of this GTF file and ran the STAR alignment again. But this time I got a different error shown below:
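The relabeling step described above can be sketched in Python as follows. This is only an illustration of the edit, not a recommended fix: the function name `relabel_feature` is hypothetical, and the sketch assumes a tab-delimited GTF with nine columns. It changes only the third column, so attribute keys such as `transcript_id` in the ninth column are left untouched.

```python
def relabel_feature(gtf_lines, old="transcript", new="exon"):
    """Yield GTF lines with the feature type (column 3) changed from old to new.

    Comment lines and lines that do not have at least 9 tab-separated
    fields are passed through unchanged.
    """
    for line in gtf_lines:
        if line.startswith("#"):
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 9 and fields[2] == old:
            fields[2] = new
            yield "\t".join(fields) + "\n"
        else:
            yield line
```

A possible use, with hypothetical file names: open the input GTF for reading and an output file for writing, then call `out.writelines(relabel_feature(src))`.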

Fatal INPUT FILE error, no valid exon lines in the GTF file: datepalm_refGene.gtf Solution: check the formatting of the GTF file. Most likely cause is the difference in chromosome naming between GTF and FASTA file.

So it seems there is a difference between the FASTA and GTF chromosome names. In fact, my GTF file has no chromosome names at all: the first column just contains a number on each row (11, 11, 13, 13, 13, 14, 14, and so on). I used the same FASTA file to generate the GTF by following the RepeatModeler and RepeatMasker steps.
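One way to confirm the mismatch is to compare the sequence names in the FASTA headers with the values in the first column of the GTF. The Python sketch below does that; the function names are hypothetical, and it assumes that the sequence name is the header text up to the first whitespace and that the GTF is tab-delimited.

```python
def fasta_names(fasta_lines):
    """Collect sequence names from FASTA header lines.

    Assumption: the name is the text after '>' up to the first whitespace.
    """
    names = set()
    for line in fasta_lines:
        if line.startswith(">"):
            header = line[1:].strip()
            if header:
                names.add(header.split()[0])
    return names


def gtf_seqnames(gtf_lines):
    """Collect the distinct values found in column 1 of a GTF file."""
    names = set()
    for line in gtf_lines:
        # Skip comment lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        names.add(line.rstrip("\n").split("\t", 1)[0])
    return names


def names_missing_from_fasta(fasta_lines, gtf_lines):
    """Return, sorted, the GTF sequence names that have no FASTA counterpart."""
    return sorted(gtf_seqnames(gtf_lines) - fasta_names(fasta_lines))
```

Both arguments can be open file handles. If the returned list contains every name in the GTF (for example "11", "13", "14"), then the first column does not hold the FASTA sequence names at all, which matches the symptom described here.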

Any idea how it can be fixed?

Many thanks in advance!

rmhubley commented 1 year ago

Could you let me know which version of RepeatMasker the rmOutToGFF3.pl script you used came from?

rmhubley commented 1 year ago

I am closing this for now. Please let me know if this is still a problem and I will reopen it.