soedinglab / hh-suite

Remote protein homology detection suite.
https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-019-3019-7
GNU General Public License v3.0

reformat.pl puts #=GC at top, should be at bottom #297

Open dstern opened 2 years ago

dstern commented 2 years ago

:exclamation: Make sure to check out our User Guide.

Expected Behavior

The #=GC line is recommended at the bottom of the alignment (https://en.wikipedia.org/wiki/Stockholm_format##=GC).

This does not seem like a big deal, but it causes AlphaFold2 to crash, because AF2 expects #=GC at the bottom (in their function remove_empty_columns_from_stockholm_msa, https://github.com/deepmind/alphafold/blob/0be2b30b98f0da7aecb973bde04758fae67eb913/alphafold/data/parsers.py). This is a problem when using custom multiple sequence alignments as input.
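A quick way to check whether a given Stockholm file is affected, sketched in Python. The function name is hypothetical (it is not part of HH-suite or AlphaFold), and the check only looks at line order: it reports whether any #=GC line appears before the first sequence row.

```python
def gc_precedes_sequences(stockholm_text: str) -> bool:
    """Return True if a '#=GC' line appears before the first sequence row.

    Hypothetical helper for illustration; not part of HH-suite or AlphaFold.
    """
    for line in stockholm_text.splitlines():
        if line.startswith("#=GC"):
            # Per-column annotation seen before any sequence row.
            return True
        if line.strip() and not line.startswith(("#", "//")):
            # First non-comment, non-terminator line is a sequence row.
            return False
    return False
```

Calling it on the contents of file.sto should return True for the output described in this issue, and False once #=GC sits below the sequences.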

Current Behavior

In the conversion to Stockholm format, the #=GC RF line is placed at the top of the alignment.

Steps to Reproduce (for bugs)

Please make sure to execute the reproduction steps.

reformat.pl a3m sto file.a3m file.sto
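Until this is changed in reformat.pl, a possible workaround is to post-process the resulting file.sto and move the #=GC lines to just before the closing // line. The Python sketch below is only an illustration: the function name is hypothetical, and it assumes the file contains a single, non-interleaved alignment block (for an interleaved file, each block would need its own #=GC line and this approach would not be correct).

```python
def move_gc_lines_to_bottom(stockholm_text: str) -> str:
    """Move '#=GC' lines so they sit directly above the '//' terminator.

    Hypothetical helper; assumes one non-interleaved alignment block.
    """
    out_lines = []
    gc_lines = []
    for line in stockholm_text.splitlines():
        if line.startswith("#=GC"):
            # Hold back per-column annotation until the end of the alignment.
            gc_lines.append(line)
        elif line.strip() == "//":
            # End of alignment: emit the held annotation, then the terminator.
            out_lines.extend(gc_lines)
            gc_lines = []
            out_lines.append(line)
        else:
            out_lines.append(line)
    # If no '//' terminator was found, append the annotation at the very end.
    out_lines.extend(gc_lines)
    return "\n".join(out_lines) + "\n"
```

For example, reading file.sto into a string, passing it through move_gc_lines_to_bottom, and writing the result back gives a file with the same content but with #=GC RF below the sequences.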

HH-suite Output (for bugs)

Please make sure to post the complete output of the tool you called. Please use gist.github.com.

Context

Providing context helps us come up with a solution and improve our documentation for the future.

Your Environment

Include as many relevant details about the environment you experienced the issue in.