inab / trimal

A tool for automated alignment trimming in large-scale phylogenetic analyses. Development version: 2.0
https://trimal.readthedocs.io/
GNU General Public License v3.0

remove columns with gaps in first line? #61

Closed dstern closed 9 months ago

dstern commented 3 years ago

I am wondering whether trimAl can remove the columns in which the first sequence of an alignment has a gap. This feature would simplify the use of user-defined multiple sequence alignments in AlphaFold2, which requires the first sequence to have no gaps.
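
To make the request concrete, here is a minimal Python sketch of the operation on an alignment that is already loaded in memory. It does not use trimAl; the function name drop_first_seq_gap_columns and the toy sequences are hypothetical, and "-" is assumed to be the only gap character.

```python
def drop_first_seq_gap_columns(seqs, gap="-"):
    """Return the alignment without the columns where the first sequence has a gap."""
    if not seqs:
        return []
    first = seqs[0]
    # An alignment is only meaningful if every row has the same length.
    if any(len(s) != len(first) for s in seqs):
        raise ValueError("all aligned sequences must have the same length")
    # Keep the indices of the columns where the first sequence is not a gap.
    keep = [i for i, ch in enumerate(first) if ch != gap]
    return ["".join(s[i] for i in keep) for s in seqs]


# Hypothetical toy alignment: columns 2 and 5 (0-based) are gaps in the first row.
toy = ["AC-GT-", "ACAG-T", "-CAGTT"]
print(drop_first_seq_gap_columns(toy))
```

Gaps in the other sequences are left untouched; only the first row decides which columns are dropped.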

nicodr97 commented 1 year ago

Hi @dstern, there is currently no feature that does what you suggest, but you can achieve it with the following workaround:

  1. Apply the -complementary and -colnumbering arguments to the first sequence only, either by placing it in a separate file (trimal -in first_seq.fasta -nogaps -complementary -colnumbering) or by running trimal -in msa.fasta -selectseqs { 1-numseqs } -complementary -colnumbering, where numseqs is the total number of sequences minus 1. The output lists the positions of the columns that are gaps in the first sequence (after #ColumnsMap).
  2. Pass this list to -selectcols: trimal -in msa.fasta -selectcols { list_of_gap_positions }. You will first have to remove the whitespace between the commas and the column numbers in the output of the previous step.
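
The whitespace clean-up in step 2 can be scripted. The sketch below is a hypothetical helper, not part of trimAl: it assumes the output of step 1 contains a line that starts with #ColumnsMap followed by comma-separated column numbers, and the example numbers are made up. It only builds the command text; it does not run trimAl.

```python
import re


def selectcols_argument(colnumbering_output):
    """Turn the #ColumnsMap line into a comma-separated list without whitespace."""
    for line in colnumbering_output.splitlines():
        if line.startswith("#ColumnsMap"):
            # Collect every integer after the tag, ignoring tabs and spaces.
            numbers = re.findall(r"\d+", line[len("#ColumnsMap"):])
            return ",".join(numbers)
    raise ValueError("no #ColumnsMap line found")


# Made-up example of what step 1 is assumed to print.
example_output = "#ColumnsMap\t2, 5, 6, 10"
cols = selectcols_argument(example_output)
print(f"trimal -in msa.fasta -selectcols {{ {cols} }}")
```

The printed line can then be pasted into a shell as the command from step 2.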

I hope this helps!