nextgenusfs / funannotate

Eukaryotic Genome Annotation Pipeline
http://funannotate.readthedocs.io
BSD 2-Clause "Simplified" License

Issue in Funannotate sort #988

Closed dkumarsh closed 4 months ago

dkumarsh commented 6 months ago

Hi Jon,

When I ran funannotate sort, the output file had the contigs sorted from longest to shortest, but the contigs were also renamed sequentially, which changed the original contig numbers of the genome.

How can I retain the original contig numbers in the sorted and renamed fasta file?

Thank you in advance.

nextgenusfs commented 6 months ago

As I hope you could see in the help menu, this step isn't required but aims to simplify the fasta headers so they don't cause problems with Augustus.

Usage:       funannotate sort <arguments>
version:     1.8.16

Description: This script sorts the input contigs by size (longest->shortest) and then relabels
             the contigs with a simple name (e.g. scaffold_1).  Augustus can have problems with
             some complicated contig names.  Alternatively pass -s,--simplify in order
             to split fasta headers at first space.
...

So the simple answer is: just don't use funannotate sort. Be aware, though, that long or complicated fasta headers might then be problematic.
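For readers who want the simple names but still need to trace them back, the relabeling described in the help text can be paired with a lookup table. The following is only a minimal sketch of that idea, not funannotate's own code: the function `relabel_by_size`, its `prefix` parameter, and the example contig names are all hypothetical.

```python
def relabel_by_size(records, prefix="scaffold_"):
    """Sort (header, sequence) pairs longest-first and relabel them.

    Hypothetical helper, not part of funannotate. Returns the relabeled
    records plus a dict mapping each new name back to the original header,
    so the original contig numbers stay recoverable.
    """
    # sorted() is stable, so equal-length contigs keep their input order.
    ordered = sorted(records, key=lambda rec: len(rec[1]), reverse=True)
    renamed = []
    name_map = {}
    for index, (header, seq) in enumerate(ordered, start=1):
        new_name = f"{prefix}{index}"
        renamed.append((new_name, seq))
        name_map[new_name] = header
    return renamed, name_map


# Made-up example records: (original header, sequence).
records = [
    ("contig_7", "ACGT"),
    ("contig_2", "ACGTACGTAC"),
    ("contig_5", "ACGTAC"),
]
renamed, name_map = relabel_by_size(records)

# Print a two-column table: new name, original name.
for new_name, original in name_map.items():
    print(f"{new_name}\t{original}")
```

Saving the printed table next to the renamed fasta file would let the original contig numbers be looked up later.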

dkumarsh commented 5 months ago

Hi, thanks for the suggestion. I did the sorting and renaming separately :)

hyphaltip commented 4 months ago

Unclear what you would still like help on here?

Sorting is intended to reorder by size and rename.

If you want to develop a custom sorting strategy without renaming, it is quite straightforward to write a simple Biopython script to do so.
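A custom sort that keeps the original headers could look roughly like the sketch below. To stay dependency-free it uses a small hand-written FASTA reader from the standard library rather than Biopython; the same thing could be done with Biopython's `SeqIO.parse` and `SeqIO.write`. The function names, the `width` parameter, and the example sequences are assumptions made for illustration.

```python
import io


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA file handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            # Keep the full header line (minus '>') so nothing is renamed.
            header, chunks = line[1:], []
        elif line and header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def sort_fasta_by_length(in_handle, out_handle, width=60):
    """Write records longest-first with their original headers.

    Returns the headers in output order.
    """
    records = sorted(read_fasta(in_handle), key=lambda rec: len(rec[1]), reverse=True)
    for header, seq in records:
        out_handle.write(f">{header}\n")
        # Wrap sequence lines at a fixed width.
        for start in range(0, len(seq), width):
            out_handle.write(seq[start:start + width] + "\n")
    return [header for header, _ in records]


# Made-up input; in practice the handles would come from open(...).
example = io.StringIO(
    ">contig_1\nACGT\n"
    ">contig_2 len=10\nACGTA\nCGTAC\n"
    ">contig_3\nACGTAC\n"
)
result = io.StringIO()
order = sort_fasta_by_length(example, result)
print(order)
print(result.getvalue(), end="")
```

With real files, the two `io.StringIO` objects would be replaced by handles from `open()` on the input and output paths.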

dkumarsh commented 4 months ago

Yes, I used a Python script. Thanks.