ihmwg / ModelCIF

mmCIF-based extension dictionary for computed structure models
Creative Commons Zero v1.0 Universal

Extension of ModelCIF for AF3 quality estimates #21

Open gtauriello opened 3 weeks ago

gtauriello commented 3 weeks ago

Related to #20 and the issues mentioned there, I suggest extending ModelCIF to capture all the new types of quality estimates introduced with AlphaFold 3 (AF3). I also had a look at RoseTTAFold-AllAtom, and the suggestions below would capture anything needed there as well. I believe this should also cover anything needed for https://github.com/chaidiscovery/chai-lab/issues/52. Here are my suggested additions:

  1. Extend _ma_qa_metric.type to include:
    • "pLDDT to polymer" with detailed description "confidence score predicting accuracy according to lDDT with distances from each atom to CA or C1' of nearby polymer residues in [0,100]"
    • "boolean" with detailed description "0 or 1 depending on whether a check passed (1) or not (0)."
  2. Extend _ma_qa_metric.mode to include "per-chain", "per-chain-pairwise", "per-atom" and "per-atom-pairwise" (and yes, I know it's a bit unfortunate that we used "local" to mean "per-residue", but ok...)
  3. New _ma_qa_metric_per_chain same as _ma_qa_metric_local but without label_comp_id and label_seq_id
  4. New _ma_qa_metric_per_chain_pairwise same as _ma_qa_metric_local_pairwise but without label_comp_id* and label_seq_id*
  5. New _ma_qa_metric_per_atom same as _ma_qa_metric_local but using atom_id (linked to _atom_site.id) instead of model_id and label_*
  6. New _ma_qa_metric_per_atom_pairwise same as _ma_qa_metric_local_pairwise but using atom_id_1 and atom_id_2 (linked to _atom_site.id) instead of model_id and label_*
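
To make items 3 to 6 concrete, here is a rough sketch of how the data items of the four proposed categories could be derived from the two existing ones. The item lists for _ma_qa_metric_local and _ma_qa_metric_local_pairwise are written down as I understand the current dictionary and should be checked against it; the helper functions are hypothetical and only illustrate the proposal, they are not part of any ModelCIF tooling.

```python
# Items of the existing categories (assumed from the current ModelCIF
# dictionary; verify against the dictionary before relying on them).
LOCAL = [
    "ordinal_id", "model_id",
    "label_asym_id", "label_seq_id", "label_comp_id",
    "metric_id", "metric_value",
]
LOCAL_PAIRWISE = [
    "ordinal_id", "model_id",
    "label_asym_id_1", "label_seq_id_1", "label_comp_id_1",
    "label_asym_id_2", "label_seq_id_2", "label_comp_id_2",
    "metric_id", "metric_value",
]


def drop_residue_items(items):
    """Items 3 and 4: remove label_comp_id* and label_seq_id*."""
    return [i for i in items
            if not i.startswith(("label_comp_id", "label_seq_id"))]


def to_atom_items(items):
    """Items 5 and 6: replace model_id and label_* with atom_id[_1|_2]."""
    out = []
    for item in items:
        if item == "model_id":
            continue
        if item.startswith("label_"):
            # Keep a trailing _1 / _2 so pairwise categories get two atom ids.
            suffix = item[-2:] if item[-2:] in ("_1", "_2") else ""
            atom = "atom_id" + suffix
            if atom not in out:
                out.append(atom)
            continue
        out.append(item)
    return out


def loop_header(category, items):
    """Render the mmCIF loop_ header for a category (no data rows)."""
    return "\n".join(["loop_"] + [f"_{category}.{i}" for i in items])


# Proposed categories (names taken from the list above).
PER_CHAIN = drop_residue_items(LOCAL)
PER_CHAIN_PAIRWISE = drop_residue_items(LOCAL_PAIRWISE)
PER_ATOM = to_atom_items(LOCAL)
PER_ATOM_PAIRWISE = to_atom_items(LOCAL_PAIRWISE)

if __name__ == "__main__":
    print(loop_header("ma_qa_metric_per_chain", PER_CHAIN))
    print(loop_header("ma_qa_metric_per_atom_pairwise", PER_ATOM_PAIRWISE))
```

Running the script prints the loop_ headers for two of the proposed categories. Since atom_id would point to _atom_site.id, the per-atom variants would no longer need model_id or any label_* items.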

Concretely for AF3 output (e.g. looking at the JSON files in one of their examples) here is how each of the scores would map to a _ma_qa_metric.mode and .type:

Some caveats to consider:

Alternative to the above (which simplifies some things and handles the per token scores):

@brindakv what are your thoughts on this?

gtauriello commented 11 hours ago

Notes from discussions with @benmwebb , @brindakv and @aozalevsky (on Oct. 16):

Example AF3 output (cut to only include one model instead of 5): fold_test_fold_job_number_one_cut.zip. Info on content:

Suggested ModelCIF extension: