ihmwg / python-modelcif

Python package for handling ModelCIF mmCIF and BinaryCIF files
MIT License

Add mechanism to read in associated files #10

Closed (benmwebb closed 1 month ago)

benmwebb commented 2 years ago

We can now (#3) write out associated files, including adding selected categories to an external mmCIF file. However, while we can read back the list of associated files, we cannot yet read any external mmCIF. Add a mechanism to do this (perhaps a download method for each file, plus an extract method for any archives, plus a read method; or perhaps read would transparently download and extract). This would also require some refactoring of modelcif.reader.read(), as it currently assumes that all reading is done there: there is no way to add extra mmCIF data to an already-read System.

This would be needed to, for example, show pairwise QA scores for AlphaFold models.

gtauriello commented 1 month ago

Related to this: in https://mail.cgl.ucsf.edu/mailman/archives/list/chimerax-users@cgl.ucsf.edu/thread/M2NM6E6W4RHUV5SLJANXWE4OEAE5QFLW/ I listed a number of examples of valid ways to store pairwise QA scores (in the file itself, in an associated file, or within an associated zip file).

benmwebb commented 1 month ago

Alternatively, one could first read in both the original file and the associated file. This will result in two System objects (assuming a single data block in both files). The two can then be combined after the fact using logic similar to that in python-ihm's make_mmcif - i.e. rewrite any Entity in the associated file to point to the original file.

benmwebb commented 1 month ago

There is now very preliminary support for this in db1962f (it also requires the latest python-ihm, containing ihmwg/python-ihm@5c70daf). Here's a simple example of reading ModelArchive entry ma-bak-cepc-0944 and its associated file:

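import modelcif.reader

# Read the main file; the trailing comma unpacks the single System it contains
with open('ma-bak-cepc-0944.cif') as fh:
    s, = modelcif.reader.read(fh)
# Read the associated file, adding its contents to the existing System
with open('ma-bak-cepc-0944_predicted_aligned_error_v1.cif') as fh:
    modelcif.reader.read(fh, add_to_system=s)

# First model of the first model group
m = s.model_groups[0][0]
print(len(m.qa_metrics))
print(m.qa_metrics[-1])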

Of course, one still needs to download and unzip the associated file.
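For the download and unzip step, a minimal sketch using only the Python standard library might look like the following. The helper names fetch and extract_cif_members are hypothetical (they are not part of python-modelcif), and the URL and archive layout of any given entry are assumptions that need to be checked against the actual associated-file records.

```python
import os
import shutil
import urllib.request
import zipfile


def fetch(url, dest_path):
    """Download url to dest_path and return dest_path.

    Hypothetical helper, not part of python-modelcif.
    """
    with urllib.request.urlopen(url) as resp, open(dest_path, "wb") as out:
        shutil.copyfileobj(resp, out)
    return dest_path


def extract_cif_members(zip_path, dest_dir):
    """Extract every member ending in .cif from the archive at zip_path
    into dest_dir, and return the paths of the extracted files.

    Hypothetical helper; assumes associated mmCIF files use a .cif suffix.
    """
    paths = []
    with zipfile.ZipFile(zip_path) as zf:
        for name in zf.namelist():
            if name.endswith(".cif"):
                # extract() creates dest_dir if needed and returns the path
                paths.append(zf.extract(name, dest_dir))
    return paths
```

Each returned path could then be opened and passed to modelcif.reader.read(fh, add_to_system=s) as in the example above.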

gtauriello commented 1 month ago

Looks great. Thanks.