tshalev closed this issue 3 years ago.
Hi! I'm getting the same error!
Adjusting transcripts:
Adjusting genes:
Adjusting chromosomes lengths:
Ranking exons: ....................................................................................................
10000 ....................................................................................................
20000 ....................................................................................................
30000 ............................................
Create UTRs from CDS (if needed):
Correcting exons based on frame information.
java.lang.RuntimeException: Error: Cannot find first coding exon for transcript:
NIGP01000374:-3367-38263, strand: -, id:AAEL023102-RA
5'UTR : NIGP01000374 38195-38263 UTR_5_PRIME 'UTR5_NIGP01000374_38196_38264'
Exons: NIGP01000374:-3367--3191 'EXON_NIGP01000374_38088_38264', rank: 2, frame: .,sequence: tcgcctacaatgctcaactagaaacaattactctaaggcgaaatccatctcacgttccaacctacgaaaatgcaattgaatggcacggtaacgatggctgcctcatctgaaccacccgagcctccacctcgcaatccggacaagatcaatgcatcactcaagcagctagccgaatcg
NIGP01000374:11027-11653 'EXON_NIGP01000374_11028_11654', rank: 1, frame: 0, sequence: aaaacccgttcgctggatacggccaccgataagacaaccgctccggccaccggtgcccgaccattccggcctatcctgtcgctggacaatgcaaagccattaacgaagccattcgaatcatctggaacgcccacgtcggcaccagcctcgtcgtttgccaacagtaacagtaacaacaataacaatggcagcagtcacaacagcagcatggaatcgaattcgaccagcacaaccgggggtccaaactcgggcaccggaaccagtggaagcagcatcagtagttccggtggaggcggaggtggtgacaatggccctgctgctgctgctgctgaactggtgagaggtggttcctcaggtagcggagtaagtccaccgggtgaaggcggtggaatagctggtcaaattggtaacaaattgaactccggtcaacagcagatctcgcccacgcagagtgaaaagagcagcacaggtgggagcaaggagcagtccggtgataattcgggcggcgataacctgttcaagaacggtgtgacagatctaggtgagtcgatagtattgttggtttatttggtaacatgtggaggtggagaattccgtatgaatatgattcatttttcatgatcgtaa
3'UTR : NIGP01000374 11027-11032 UTR_3_PRIME 'UTR3_NIGP01000374_11028_11033'
at org.snpeff.interval.Transcript.getFirstCodingExon(Transcript.java:1136)
at org.snpeff.interval.Transcript.frameCorrectionFirstCodingExon(Transcript.java:909)
at org.snpeff.interval.Transcript.frameCorrection(Transcript.java:878)
at org.snpeff.snpEffect.factory.SnpEffPredictorFactory.frameCorrection(SnpEffPredictorFactory.java:596)
at org.snpeff.snpEffect.factory.SnpEffPredictorFactory.finishUp(SnpEffPredictorFactory.java:545)
at org.snpeff.snpEffect.factory.SnpEffPredictorFactoryGff.create(SnpEffPredictorFactoryGff.java:348)
at org.snpeff.snpEffect.commandLine.SnpEffCmdBuild.run(SnpEffCmdBuild.java:369)
at org.snpeff.SnpEff.run(SnpEff.java:1183)
at org.snpeff.SnpEff.main(SnpEff.java:162)
java.lang.RuntimeException: Error reading file '/home/group_AM/Venitha/installations/snpEff_latest_core/snpEff/./data/AaegL5/genes.gtf'
My solution was to drop SnpEff and use Variant Effect Predictor instead.
Hi there, I have solved this issue. If you get this error, it means some genes are in the GTF file but not in the FASTA file, so you just have to remove those genes from the GTF file. For example, sed -i "/ENSBGRT00000033763/d" genes.gtf deletes every line of genes.gtf that contains that ID.
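If many genes are affected, running one sed command per ID gets tedious. Below is a minimal Python sketch that filters them all in one pass, assuming that "genes in the GTF file but not in the FASTA file" means the sequence name in column 1 of the GTF line has no matching ">" header in the genome FASTA. The function names are hypothetical helpers, not part of SnpEff, and the demo data is made up.

```python
def fasta_sequence_names(fasta_lines):
    """Return the set of sequence names (first word after '>') in FASTA text."""
    names = set()
    for line in fasta_lines:
        if line.startswith(">"):
            parts = line[1:].split()
            if parts:  # skip malformed empty headers
                names.add(parts[0])
    return names


def filter_gtf(gtf_lines, known_sequences):
    """Split GTF lines into (kept, dropped) by whether column 1 names a known sequence.

    Comment and blank lines are always kept.
    """
    kept, dropped = [], []
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            kept.append(line)
            continue
        seqname = line.split("\t", 1)[0]
        if seqname in known_sequences:
            kept.append(line)
        else:
            dropped.append(line)
    return kept, dropped


if __name__ == "__main__":
    # Tiny in-memory example (hypothetical contig names).
    fasta = [">contigA description", "ACGT", ">contigB", "GGCC"]
    gtf = [
        "#!genome-build example",
        'contigA\tsrc\texon\t1\t4\t.\t+\t.\tgene_id "g1"; transcript_id "t1";',
        'contigC\tsrc\texon\t1\t4\t.\t+\t.\tgene_id "g2"; transcript_id "t2";',
    ]
    kept, dropped = filter_gtf(gtf, fasta_sequence_names(fasta))
    print(f"kept {len(kept)} lines, dropped {len(dropped)}")
    for line in dropped:
        print("dropped:", line.split("\t", 1)[0])
```

For real data, read genes.gtf and the genome FASTA with open(...).read().splitlines() and write kept to a new file rather than editing the original in place, so the dropped lines can be reviewed first.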
That worked for my data, and the bin file is now in my dataset folder.
Closing old issues.
I ran into a similar problem, and it was because in one gene my 5' UTR came after the start codon. FYI.
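A quick way to look for this before running the build is to compare each 5' UTR with the CDS features of the same transcript. Below is a minimal sketch, assuming the features of one transcript are already parsed into (type, start, end, strand) tuples with 1-based inclusive coordinates and GFF3 type names. On the + strand every 5' UTR piece should end before the first CDS base; on the - strand it should start after the last CDS base. The function name is hypothetical, and this is not the check SnpEff itself performs.

```python
from typing import Iterable, List, Tuple

Feature = Tuple[str, int, int, str]  # (type, start, end, strand), 1-based inclusive


def misplaced_five_prime_utrs(features: Iterable[Feature]) -> List[Feature]:
    """Return the 5' UTR features of one transcript that overlap or follow its CDS."""
    features = list(features)
    cds = [f for f in features if f[0] == "CDS"]
    if not cds:
        return []
    cds_min = min(f[1] for f in cds)  # lowest CDS coordinate
    cds_max = max(f[2] for f in cds)  # highest CDS coordinate
    bad = []
    for ftype, start, end, strand in features:
        if ftype != "five_prime_UTR":
            continue
        # '+': the UTR must lie entirely left of the CDS; '-': entirely right of it.
        # Features with any other strand value (e.g. '.') are not checked.
        if strand == "+" and end >= cds_min:
            bad.append((ftype, start, end, strand))
        elif strand == "-" and start <= cds_max:
            bad.append((ftype, start, end, strand))
    return bad


if __name__ == "__main__":
    # A '-' strand transcript laid out like the JGI record later in this thread.
    ok = [("five_prime_UTR", 2068, 2176, "-"), ("CDS", 1983, 2067, "-"),
          ("CDS", 1511, 1637, "-"), ("CDS", 38, 113, "-")]
    # A made-up '+' strand transcript whose 5' UTR sits after the CDS.
    wrong = [("CDS", 100, 180, "+"), ("five_prime_UTR", 200, 250, "+")]
    print(misplaced_five_prime_utrs(ok))     # []
    print(misplaced_five_prime_utrs(wrong))  # [('five_prime_UTR', 200, 250, '+')]
```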
Hello,
I am trying to build a database for a tree species (Western Redcedar). I have a draft genome and some annotations in GFF3 format. When I try to build the database, I get the following error:
Adjusting transcripts:
Adjusting genes:
Adjusting chromosomes lengths:
Ranking exons: ....................................................................................................
10000 ....................................................................................................
20000 ....................................................................................................
30000 ....................................................................................................
40000 ....................................................................................................
50000 ............................................................
Create UTRs from CDS (if needed):
Correcting exons based on frame information.
....java.lang.RuntimeException: Error: Cannot find first coding exon for transcript:
29184128:-672-2175, strand: -, id:PAC4GC:47054313, bioType:protein_coding, Protein
5'UTR : 29184128 2067-2175 UTR_5_PRIME 'PAC4GC:47054313.five_prime_UTR.1'
Exons: 29184128:-672--546 'PAC4GC:47054313.exon.2', rank: 3, frame: 2, sequence: cttctaccctgaatctgatgagcttgctgtgggaaaatacagtcccaacaagctggaacagtggtacagatccctgtgactttcactgggatggggtgaactgcacaaatggccgcataacgtcact
29184128:-200--7 'PAC4GC:47054313.exon.1', rank: 2, frame: ., sequence: tactagtgtaaccctcataatttgcaggctcttctttttcttcaattttagccactattactgtttgaactcttaacttattttggcatgacataagttcaaatagaatatgaggactagatgttttggtgggttatgcttgatttttcttttcatggcttccctcttctttggagtcacaaacagcgatgatg
29184128:37-112 'PAC4GC:47054313.exon.3', rank: 1, frame: 1, sequence: aaaattatcaagcgtggggcttaagggagctctctcaaataaaattggttctctgacagcacttcatactctgtaa
CDS : ctttttcttcaattttagccactattactgtttgaactcttaacttattttggcatgacataagttcaaatagaatatgaggactagatgttttggtgggttatgcttgatttttcttttcatggcttccctcttctttggagtcacaaacagcgatgatgcttctaccctgaatctgatgagcttgctgtgggaaaatacagtcccaacaagctggaacagtggtacagatccctgtgactttcactgggatggggtgaactgcacaaatggccgcataacgtcact
Protein : LFLQFPLLLFELLTYFGMTVQIEYEDMFWWVMLDFSFHGFPLLWSHKQRCFYPESDELAVGKYSPNKLEQWYRSL*LSLGWGELHKWPHNVT
java.lang.RuntimeException: Error reading file '/mnt/e/tal/Documents/UBC/GSAT/PhD/WRC/GS/wrc/snps/S_lines/filtering_for_pop_gen/new_analysis/snpEff/./data/tpli_3.1/genes.gff'
00:22:17 Logging
00:22:18 Checking for updates...
When I try deleting the offending transcript from the GFF file, SnpEff just reports an issue with another one. For reference, this is what the GFF file looks like for this transcript:
##gff-version 3
##annot-version v3.1
##species Thuja plicata
29184128	JGI_gene	mRNA	38	2176	.	-	.	ID=PAC4GC:47054313;Name=Thpliv31003279m;longest=1;Parent=Thpliv31003279m.g
29184128	JGI_gene	exon	1983	2176	.	-	.	ID=PAC4GC:47054313.exon.1;Parent=PAC4GC:47054313
29184128	JGI_gene	CDS	1983	2067	.	-	0	ID=PAC4GC:47054313.CDS.1;Parent=PAC4GC:47054313
29184128	JGI_gene	five_prime_UTR	2068	2176	.	-	.	ID=PAC4GC:47054313.five_prime_UTR.1;Parent=PAC4GC:47054313
29184128	JGI_gene	exon	1511	1637	.	-	.	ID=PAC4GC:47054313.exon.2;Parent=PAC4GC:47054313
29184128	JGI_gene	CDS	1511	1637	.	-	2	ID=PAC4GC:47054313.CDS.2;Parent=PAC4GC:47054313
29184128	JGI_gene	exon	38	113	.	-	.	ID=PAC4GC:47054313.exon.3;Parent=PAC4GC:47054313
29184128	JGI_gene	CDS	38	113	.	-	1	ID=PAC4GC:47054313.CDS.3;Parent=PAC4GC:47054313
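One thing worth ruling out for a record like this is inconsistent CDS phases. In GFF3, the phase (column 8) of a CDS is the number of bases to remove from its start to reach the next codon, so in transcription order (descending coordinates on the - strand) each phase should equal (previous phase - previous CDS length) mod 3. For this transcript the CDS lengths are 85, 127 and 76 bp, which gives expected phases 0, 2 and 1, matching the file. Below is a minimal Python sketch of the same check over a whole file, assuming a tab-separated GFF3 with a single Parent= per CDS. parse_cds and phase_mismatches are hypothetical helper names, and the check does not reproduce SnpEff's internal logic; it only flags phase inconsistencies.

```python
from collections import defaultdict


def parse_cds(gff_lines):
    """Group GFF3 CDS rows by Parent: {parent: [(start, end, strand, phase), ...]}."""
    cds = defaultdict(list)
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "CDS" or cols[7] not in ("0", "1", "2"):
            continue
        # Column 9 is "key=value;key=value"; assume a single Parent per CDS.
        attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
        parent = attrs.get("Parent")
        if parent is not None:
            cds[parent].append((int(cols[3]), int(cols[4]), cols[6], int(cols[7])))
    return cds


def phase_mismatches(gff_lines):
    """Return {parent: [(start, end, phase_in_file, expected_phase), ...]} for bad CDSs."""
    problems = {}
    for parent, parts in parse_cds(gff_lines).items():
        # Transcription order: ascending coordinates on '+', descending on '-'.
        parts.sort(key=lambda p: p[0], reverse=(parts[0][2] == "-"))
        expected = parts[0][3]  # take the first CDS phase as given
        bad = []
        for start, end, _strand, phase in parts:
            if phase != expected:
                bad.append((start, end, phase, expected))
            # Phase of the next CDS = (this phase - this CDS length) mod 3.
            expected = (phase - (end - start + 1)) % 3
        if bad:
            problems[parent] = bad
    return problems


if __name__ == "__main__":
    rows = [
        "29184128\tJGI_gene\tCDS\t1983\t2067\t.\t-\t0\tID=PAC4GC:47054313.CDS.1;Parent=PAC4GC:47054313",
        "29184128\tJGI_gene\tCDS\t1511\t1637\t.\t-\t2\tID=PAC4GC:47054313.CDS.2;Parent=PAC4GC:47054313",
        "29184128\tJGI_gene\tCDS\t38\t113\t.\t-\t1\tID=PAC4GC:47054313.CDS.3;Parent=PAC4GC:47054313",
    ]
    print(phase_mismatches(rows))  # prints {} because phases 0, 2, 1 are consistent
```

Because it scans every transcript in one pass, something like with open("genes.gff") as fh: print(phase_mismatches(fh)) lists all parents whose phases disagree at once, instead of surfacing them one build at a time.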
Sorry if this is kind of messy; I couldn't figure out how to make the table look better here.