ihmwg / python-modelcif

Python package for handling ModelCIF mmCIF and BinaryCIF files
MIT License

LocalPairwiseQAScoresFile within ZipFile not generating QA file #26

Closed gtauriello closed 2 years ago

gtauriello commented 2 years ago

@benmwebb it is awesome that one can add a modelcif.associated.LocalPairwiseQAScoresFile into a modelcif.associated.Repository and the dumper just magically splits the cif file and generates the desired QA file. One issue I observed, though, is that this writing doesn't happen if there is a ZipFile in between (which unfortunately is the use case we have in ModelArchive, where all associated files are in a package).

So essentially, if you have a LocalPairwiseQAScoresFile qa_file and you add it directly into a Repository as in the code below, it works as intended and splits up the cif file into two files (output.cif and output_qa.cif):

system.repositories.append(modelcif.associated.Repository("", [qa_file]))

The code below, on the other hand, doesn't generate the two files due to the ZipFile in between:

system.repositories.append(modelcif.associated.Repository(
    "",
    [
        modelcif.associated.ZipFile("output.gz",
                                    files=[qa_file])
    ],
))

I modified the mkmodbase example to showcase this; it is attached here: mkmodbase-zip.py.zip. At line 216 there you can pick whether to add the LocalPairwiseQAScoresFile into a Repository directly or have a ZipFile in between.

Am I doing something wrong or is there a bug somewhere?

benmwebb commented 2 years ago

Right, the magic only happens for associated CIFFiles at the top level. The assumption is that if you specify a zip file, you've made that zip file yourself and you don't want python-modelcif to modify or overwrite it. But this could certainly be changed for CIFFiles inside a ZipFile to:

  1. warn that selected categories won't be written out to such files; or
  2. dump them out to disk anyway, then you can add a postprocessing step if you like to put them in a zip file (perhaps combining them with other outputs not from python-modelcif); or
  3. dump them out to a temporary file and then automatically create the zip file containing the QA scores

(1) probably makes little sense since you've already specifically requested via categories or copy_categories to write out a file. (2) would be easy to do. (3) would be a bit more involved and I'm a bit reluctant to have too much happen "magically", but we could perhaps make that configurable.

gtauriello commented 2 years ago

I like option (2). That's what would have been my expected behaviour.

(3) sounds nice as well but it wouldn't really remove much complexity for the user. It's trivial enough to take a few files and package them into a zip file, rather than having the library make assumptions and guesses about the desired behaviour.
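
For illustration, the post-processing step implied by option (2) could be sketched roughly as follows, using only Python's standard zipfile module. This is a minimal sketch, not part of python-modelcif: the helper name package_files and the archive name output.zip are hypothetical, and output_qa.cif here is a placeholder standing in for the QA file that the dumper would write to disk.

```python
import tempfile
import zipfile
from pathlib import Path


def package_files(zip_path, file_paths):
    """Write each file in file_paths into a new zip archive at zip_path.

    Hypothetical helper for illustration; not part of python-modelcif.
    """
    with zipfile.ZipFile(zip_path, "w",
                         compression=zipfile.ZIP_DEFLATED) as zf:
        for path in file_paths:
            path = Path(path)
            # Store only the file name so the archive has a flat layout.
            zf.write(path, arcname=path.name)
    return zip_path


def demo():
    """Package a placeholder QA file and return the archive's contents."""
    with tempfile.TemporaryDirectory() as tmpdir:
        tmp = Path(tmpdir)
        # Placeholder content; in practice this file would be the
        # QA file already written out by the dumper.
        qa_path = tmp / "output_qa.cif"
        qa_path.write_text("data_placeholder\n")
        zip_path = package_files(tmp / "output.zip", [qa_path])
        with zipfile.ZipFile(zip_path) as zf:
            return zf.namelist()
```

Other outputs that do not come from python-modelcif could be added to the same list of paths before packaging.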