jwohlwend / boltz

Official repository for the Boltz-1 biomolecular interaction model
MIT License
1.19k stars, 121 forks

Guideline for locally generating MSAs #48

Open · sangyeon-hits opened this issue 4 days ago

sangyeon-hits commented 4 days ago

Hello, I'd like to reproduce the model's reported performance using MSAs generated locally.

I've set up colabfold_search and MMseqs2 as the search tools, and uniref30_2302 and colabfold_envdb_202108 as the databases, following the white paper.

However, I don't know the exact commands and workflow the authors used to generate the .a3m files.
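For anyone comparing locally generated files against the expected input, here is a minimal sketch of reading an .a3m file. It relies only on the general A3M convention (FASTA-like records, with lowercase letters marking insertions relative to the query), not on anything Boltz-specific. The function name `read_a3m` is hypothetical, and skipping lines that start with "#" is an assumption meant to tolerate the extra header line some tools write.

```python
from typing import List, Optional, Tuple


def read_a3m(text: str) -> List[Tuple[str, str]]:
    """Parse A3M text into (header, aligned_sequence) pairs.

    Lowercase letters are insertions relative to the query and are dropped,
    so every returned row is aligned column-for-column with the query row.
    """
    records: List[Tuple[str, str]] = []
    header: Optional[str] = None
    chunks: List[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and any "#" metadata line (assumption)
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)  # sequences may be wrapped over several lines
    if header is not None:
        records.append((header, "".join(chunks)))
    # Drop insertion states: lowercase residues and "." insertion gaps.
    return [
        (h, "".join(c for c in s if not c.islower() and c != "."))
        for h, s in records
    ]
```

After stripping insertions, every row should have the same length as the query, which is a quick sanity check on a freshly generated file.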

In particular, as I understand it, the MSAs of different chains need to be paired using taxonomy. But I'm not sure the current code does this: I would expect a file containing the taxonomy annotations, and I don't see one. The white paper says:

We then assign taxonomy labels to all UniRef sequences using the taxonomy annotation provided by UniProt.
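To make the question concrete, here is a minimal sketch of what taxonomy-based pairing could look like. It is a simplified species-based pairing scheme written for illustration and is not Boltz's actual pipeline. Assumptions: taxonomy IDs come either from a `TaxID=` field (as found in UniRef FASTA headers) or from a hypothetical lookup table `taxonomy_map` built from UniProt's taxonomy annotation, since locally generated .a3m headers may carry only the sequence ID. Within each chain, rows are assumed to be ranked best-first, and only the top hit per taxon is kept. The function names are hypothetical.

```python
import re
from typing import Dict, List, Optional, Tuple

# Matches the "TaxID=<number>" field in UniRef-style FASTA headers.
TAXID_RE = re.compile(r"TaxID=(\d+)")


def taxid_from_header(header: str, taxonomy_map: Dict[str, str]) -> Optional[str]:
    """Return the taxonomy ID for an MSA row, or None if unknown."""
    match = TAXID_RE.search(header)
    if match:
        return match.group(1)
    tokens = header.split()
    # Fall back to the hypothetical lookup table keyed by sequence ID.
    return taxonomy_map.get(tokens[0]) if tokens else None


def pair_by_taxonomy(
    msa_a: List[Tuple[str, str]],
    msa_b: List[Tuple[str, str]],
    taxonomy_map: Dict[str, str],
) -> List[Tuple[str, str]]:
    """Pair rows of two chain MSAs that share a taxonomy ID.

    Row 0 of each MSA is assumed to be the query and is always paired first.
    Returns (chain_a_sequence, chain_b_sequence) tuples.
    """

    def first_hit_per_taxon(msa: List[Tuple[str, str]]) -> Dict[str, str]:
        best: Dict[str, str] = {}
        for header, seq in msa[1:]:
            taxid = taxid_from_header(header, taxonomy_map)
            if taxid is not None and taxid not in best:
                best[taxid] = seq  # keep only the best-ranked hit per taxon
        return best

    best_a = first_hit_per_taxon(msa_a)
    best_b = first_hit_per_taxon(msa_b)
    paired = [(msa_a[0][1], msa_b[0][1])]
    for taxid, seq_a in best_a.items():  # follows chain A's rank order
        if taxid in best_b:
            paired.append((seq_a, best_b[taxid]))
    return paired
```

Rows without a resolvable taxon are simply left out of the paired block here; a real pipeline would typically keep them in a separate unpaired MSA.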

#32 asked a similar question, along with one about the confidentiality of the MMseqs2 server, but the MSA part seems to have been missed.

sangyeon-hits commented 2 days ago

Related:

  1. https://github.com/jwohlwend/boltz/issues/4#issuecomment-2494067009
  2. https://github.com/jwohlwend/boltz/issues/4#issuecomment-2494979165