rcedgar / muscle

Multiple sequence and structure alignment with top benchmark scores scalable to thousands of sequences. Generates replicate alignments, enabling assessment of downstream analyses such as trees and predicted structures.
https://drive5.com/muscle
GNU General Public License v3.0

CLUSTALW and MSF format #32

Closed ErisonChen closed 1 year ago

ErisonChen commented 2 years ago

Hi @rcedgar, with muscle v5, how can I write the alignment in CLUSTALW or GCG MSF format?

Thanks a lot

ErisonChen commented 2 years ago

Another question: how can I manually set the number of CPU cores that muscle uses?

muscle 5.1.linux64 [] 65.6Gb RAM, 40 cores

That seems pretty large: 65.6Gb and 40 cores.

Thanks

rcedgar commented 2 years ago
  1. Muscle v5 only supports FASTA and EFA output formats, so you need to use a third-party tool to convert to other formats.

  2. That line reports the amount of RAM and the number of CPU cores available, not the amount actually used; actual usage is shown later as the command progresses. You can use the -threads N option to set a maximum number of cores.
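
As an illustration of the conversion step in point 1, here is a minimal Python sketch that reads an aligned FASTA file (such as muscle's output) and writes a CLUSTAL-style text block. The function names `read_aligned_fasta` and `format_clustal` are made up for this example and are not part of muscle. The sketch omits the conservation line that CLUSTALW normally prints under each block, so strict parsers may reject it. An established converter (for example Biopython's `AlignIO.convert`) is likely the safer choice in practice.

```python
def read_aligned_fasta(path):
    """Return a list of (name, aligned_sequence) pairs from an aligned FASTA file."""
    records = []
    name, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                if name is not None:
                    records.append((name, "".join(chunks)))
                # Keep only the first word of the header as the sequence label.
                name = (line[1:].split() or ["unnamed"])[0]
                chunks = []
            else:
                chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def format_clustal(records, width=60):
    """Format aligned (name, sequence) pairs as CLUSTAL-style text.

    Simplified: no conservation line is written under each block.
    """
    lengths = {len(seq) for _, seq in records}
    if len(lengths) != 1:
        raise ValueError("need at least one sequence, all of equal aligned length")
    total = lengths.pop()
    # Pad labels so that every sequence chunk starts in the same column.
    pad = max(len(name) for name, _ in records) + 6
    lines = ["CLUSTAL W multiple sequence alignment", ""]
    for start in range(0, total, width):
        for name, seq in records:
            lines.append(name.ljust(pad) + seq[start:start + width])
        lines.append("")  # blank line between blocks
    return "\n".join(lines) + "\n"
```

Typical use would be `open("aln.aln", "w").write(format_clustal(read_aligned_fasta("aln.afa")))`, where both file names are placeholders.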

ErisonChen commented 2 years ago

Hi @rcedgar, how can we limit the maximum amount of RAM used? Today a run spiked from 5 GB to 200 GB and was then killed. Our compute nodes can't handle such a high memory spike.

Thanks so much.

rcedgar commented 2 years ago

Currently, muscle v5 does not provide a command line option to specify the maximum amount of RAM. If you are using -align, note that -super5 generally uses less RAM. By trying some examples, you can estimate roughly how much RAM your input data will need as a function of sequence length and number of sequences, and then filter out datasets that are too big. Most cluster job managers and cloud services can also set a maximum memory limit for a process.
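
One way to impose such a limit from outside muscle is sketched below in Python for Linux. It builds a `muscle -super5` command with `-threads` and runs it under an address-space cap set through the standard `resource` module. The helper names `build_muscle_command` and `run_with_memory_cap` are hypothetical, the file names are placeholders, and the cap is enforced by the operating system rather than by muscle. When the limit is reached the process will most likely fail with an allocation error instead of slowing down gracefully.

```python
import resource
import subprocess


def build_muscle_command(input_fasta, output_fasta, threads=4, use_super5=True):
    """Build a muscle v5 command line as a list of arguments.

    -super5 is chosen by default because it generally uses less RAM than -align.
    """
    algorithm = "-super5" if use_super5 else "-align"
    return ["muscle", algorithm, input_fasta,
            "-output", output_fasta,
            "-threads", str(threads)]


def run_with_memory_cap(command, max_bytes):
    """Run a command with its virtual address space capped (Unix only).

    Returns the exit code of the child process.
    """
    def limit_memory():
        # Runs in the child just before exec. Only the soft limit is lowered,
        # and it never exceeds the hard limit already in force.
        _, hard = resource.getrlimit(resource.RLIMIT_AS)
        soft = max_bytes
        if hard != resource.RLIM_INFINITY:
            soft = min(max_bytes, hard)
        resource.setrlimit(resource.RLIMIT_AS, (soft, hard))

    # preexec_fn is not safe in multi-threaded parent programs;
    # this sketch assumes a simple single-threaded driver script.
    completed = subprocess.run(command, preexec_fn=limit_memory)
    return completed.returncode


if __name__ == "__main__":
    # Placeholder file names; cap the run at 16 GiB and 8 threads.
    cmd = build_muscle_command("seqs.fa", "aln.afa", threads=8)
    exit_code = run_with_memory_cap(cmd, 16 * 1024**3)
    print("muscle exited with code", exit_code)
```

On a cluster, the scheduler's own memory request setting is usually the more reliable place to enforce this, since it also accounts for memory the address-space limit does not measure exactly.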