jorvis / biocode

Bioinformatics code libraries and scripts
MIT License
504 stars 247 forks

write_fasta_from_gff.py error #44

Closed ncpalmateer closed 7 years ago

ncpalmateer commented 7 years ago

When running write_fasta_from_gff.py, I get an error, and the output file contains only a fraction of the proteins that should be present (the protein count varies from run to run).

The command I'm using is: python ~/git/biocode/gff/write_fasta_from_gff.py -i BV115/BV115.gff3 -f BV115/BV115.fasta -o test.txt

cwd: /local/scratch/ncpalmateer/silva_lab/p67

Error message:

Traceback (most recent call last):
  File "/home/Nicholas.Palmateer/git/biocode/gff/write_fasta_from_gff.py", line 126, in <module>
    main()
  File "/home/Nicholas.Palmateer/git/biocode/gff/write_fasta_from_gff.py", line 87, in main
    coding_seq = feat.get_CDS_residues(for_translation=True)
  File "/home/Nicholas.Palmateer/git/biocode/lib/biocode/things.py", line 1093, in get_CDS_residues
    chop = sorted_cds[0].phase
IndexError: list index out of range

Path to checkout: /home/Nicholas.Palmateer/git/biocode
$PYTHONPATH in .bashrc: /home/Nicholas.Palmateer/git/biocode/lib

jorvis commented 7 years ago

Is it possible for you to post your GFF anywhere? This is usually due to the GFF not following the specification (which also supports our need for a proper validator in biocode, already requested in issue #17)

ncpalmateer commented 7 years ago

The paths to the input files are:
/local/scratch/ncpalmateer/silva_lab/p67/BV115/BV115.gff3
/local/scratch/ncpalmateer/silva_lab/p67/BV115/BV115.fasta

Also, the gff file was made from GMAP.

jorvis commented 7 years ago

This is caused by features in your input file that have an mRNA but no associated CDS. I've committed a fix for this in commit 9870938, but the script will still write empty-length entries to the FASTA file (no good) until I also patch the write_fasta_from_gff.py script. Looking at that now.

jorvis commented 7 years ago

Example of one of the problematic features: F42B2A9795D365B6F68334D3D403D376.path1
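
To list every feature of this kind in a GFF3 file, something like the following could be used. It is only a minimal sketch, not part of biocode: the function name find_mrnas_without_cds is made up for illustration, and it assumes a standard 9-column GFF3 layout where mRNA rows carry an ID attribute and CDS rows point back to their mRNA through a Parent attribute.

```python
def find_mrnas_without_cds(gff3_lines):
    """Return the IDs of mRNA features that no CDS row names as its Parent.

    Hypothetical helper for illustration; it does not use the biocode API.
    """
    mrna_ids = []             # mRNA IDs in the order they appear
    parents_with_cds = set()  # every ID referenced as Parent by a CDS row

    for line in gff3_lines:
        line = line.rstrip("\n")
        if line.startswith("##FASTA"):
            break  # embedded sequence section: no more feature rows
        if not line or line.startswith("#"):
            continue  # skip blank lines, comments, and directives
        cols = line.split("\t")
        if len(cols) != 9:
            continue  # not a well-formed feature row

        # Column 9 holds semicolon-separated key=value attributes.
        attrs = {}
        for pair in cols[8].split(";"):
            if "=" in pair:
                key, value = pair.strip().split("=", 1)
                attrs[key] = value

        if cols[2] == "mRNA" and "ID" in attrs:
            mrna_ids.append(attrs["ID"])
        elif cols[2] == "CDS" and "Parent" in attrs:
            # A CDS may list several parents, separated by commas.
            parents_with_cds.update(attrs["Parent"].split(","))

    return [m for m in mrna_ids if m not in parents_with_cds]


# Possible usage (path taken from the report above):
# with open("BV115/BV115.gff3") as fh:
#     for mrna_id in find_mrnas_without_cds(fh):
#         print(mrna_id)
```

Any ID printed this way is an mRNA for which get_CDS_residues would have no CDS to work with.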

jorvis commented 7 years ago

As noted in commit ebb1688, the export script now prints a warning and skips any zero-length features.
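
The warn-and-skip behaviour can be sketched roughly as below. This is not the code from commit ebb1688, just an illustration of the idea under some assumptions: write_fasta_skipping_empty is a hypothetical name, records are plain (id, sequence) pairs, and the warning text is invented.

```python
import sys


def write_fasta_skipping_empty(records, out, warn=sys.stderr, width=60):
    """Write (seq_id, residues) pairs as FASTA, skipping empty sequences.

    Hypothetical helper for illustration only. Returns a tuple
    (number written, number skipped).
    """
    written = 0
    skipped = 0
    for seq_id, residues in records:
        if len(residues) == 0:
            # An mRNA without any CDS yields no residues: warn and move on
            # instead of emitting a header with no sequence under it.
            print(f"WARNING: skipping {seq_id}: zero-length sequence", file=warn)
            skipped += 1
            continue
        out.write(f">{seq_id}\n")
        # Wrap the sequence at a fixed line width.
        for start in range(0, len(residues), width):
            out.write(residues[start:start + width] + "\n")
        written += 1
    return written, skipped
```

Returning both counts makes it easy to check afterwards how many features were dropped from the output.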