naturalis / bio-cipres

Phylogenomic analysis on the CIPRES REST portal
MIT License

Sequence alignment with MAFFT or Muscle #3

Closed by rvosa 4 years ago

rvosa commented 4 years ago

The experience described here (https://github.com/nextstrain/ncov/pull/268) shows that running the multiple sequence alignment (MSA) as one big job eventually becomes prohibitively expensive. This was not a problem for the set of 400 GenBank genomes, but as submissions keep increasing (or once we add GISAID data) it will become one.

MAFFT has the virtue of being the standard that is currently in use (e.g. by Rambaut et al.), but it might be slower than Muscle (this is only @rvosa's subjective experience). Both can be run on the CIPRES cluster. To do: test both and decide.
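For the "test and decide" step, a rough wall-clock comparison on a small local subset could be sketched as below. This is only an illustration, not part of bio-cipres: the helper names `time_command` and `compare` are hypothetical, the file names are placeholders, and the commented-out aligner invocations assume a local MAFFT install and the MUSCLE v3 command-line syntax (`-in`/`-out`), which may differ from what is installed. It times local runs only and says nothing about queueing or run time on the CIPRES cluster.

```python
import subprocess
import sys
import time


def time_command(cmd, stdout_path=None):
    """Run cmd (a list of arguments) and return (elapsed_seconds, exit_code).

    If stdout_path is given, the command's standard output is written there;
    otherwise it is discarded. Standard error is always discarded.
    """
    start = time.perf_counter()
    if stdout_path is None:
        result = subprocess.run(
            cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
    else:
        with open(stdout_path, "wb") as out:
            result = subprocess.run(cmd, stdout=out, stderr=subprocess.DEVNULL)
    return time.perf_counter() - start, result.returncode


def compare(commands):
    """Time each labelled (cmd, stdout_path) pair.

    Returns a dict mapping label -> (elapsed_seconds, exit_code).
    """
    return {
        label: time_command(cmd, out) for label, (cmd, out) in commands.items()
    }


if __name__ == "__main__":
    # Stand-in command so the sketch runs without either aligner installed.
    print(compare({"noop": ([sys.executable, "-c", "pass"], None)}))
    # Assumed local invocations (placeholders, adjust to the installed tools):
    #   "mafft":  (["mafft", "--auto", "subset.fasta"], "subset.mafft.fasta")
    #   "muscle": (["muscle", "-in", "subset.fasta",
    #               "-out", "subset.muscle.fasta"], None)
```

MAFFT writes its alignment to standard output, which is why the sketch allows redirecting stdout to a file. Timing the same subset with both tools at a few different sizes would give a first impression of how each scales before committing cluster time.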

rvosa commented 4 years ago
cipresrun \
     -y data/cipres_appinfo.yml \
     -t MAFFT_XSEDE \
     -p vparam.anysymbol_=1 \
     -i data/genomes/sars-cov-2.fasta \
     -o data/genomes/output.mafft