Closed: bienchen closed this issue 2 years ago
Of course mmCIF lets us dump any old string data so we could certainly handle this in python-modelcif. Essentially it will just use Python's string representation of a list here, so your code looks OK to me (except for the f-string; that won't work in Python 2). I'd be reluctant to establish a precedent for doing that though since the format of these lists isn't defined in the dictionary. I think it would be better to either add support in the dictionary itself for such lists, or allow for something a bit more standardized, for example a JSON string. The latter would allow for more complex data structures too of course. Although my concern is, if you're trying to put that many parameters in the file, maybe an associated file might be a better choice. Hopefully @brindakv can opine once she's back in action.
BTW, I'm always wary of using str(x) to output more complex Python data structures to files, because the output can be something like <ihm.System object at 0x1045e4e50>, which you can't read back in, of course, and because the tempting way to read such output back is x = eval(file_contents), which could result in executing arbitrary Python code.

Maybe JSON as a type in _ma_software_parameter would be the better idea here. Then an associated file makes sense. That means figuring out how to get this into the parameter list.
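The str()/eval() concern above can be illustrated with a minimal sketch. The System class below is a hypothetical stand-in for an object such as ihm.System, and the params dictionary only borrows two names from the ColabFold example; none of this is python-modelcif code.

```python
import json


class System:
    """Hypothetical stand-in for an object such as ihm.System."""


params = {"model_order": [3, 4, 5, 1, 2], "use_amber": False}

# str() silently writes a default repr that cannot be parsed back in.
opaque = str(System())

# JSON round-trips plain lists, numbers, booleans and strings without eval().
encoded = json.dumps(params)
decoded = json.loads(encoded)

# json.dumps refuses objects it cannot represent instead of writing a repr.
try:
    json.dumps({"system": System()})
    rejected = False
except TypeError:
    rejected = True
```

The point of the sketch is that JSON fails loudly on data it cannot represent, whereas str() writes something that only looks like data.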
@bienchen could you provide an example of ColabFold's model_order list or other software parameter lists? Although _ma_software_parameter.value can accept a list, I don't think it was meant to be used that way. An associated file is an option. However, it is not straightforward to link it to the software parameter table. What would _ma_software_parameter.value (mandatory) correspond to in this case?
Here is a complete config.json from ColabFold:

{
  "num_queries": 1,
  "use_templates": false,
  "use_amber": false,
  "msa_mode": "MMseqs2 (UniRef+Environmental)",
  "model_type": "AlphaFold2-multimer-v2",
  "num_models": 5,
  "num_recycles": 3,
  "model_order": [ 3, 4, 5, 1, 2 ],
  "keep_existing_results": true,
  "rank_by": "multimer",
  "pair_mode": "unpaired+paired",
  "host_url": "https://api.colabfold.com",
  "stop_at_score": 100,
  "recompile_padding": 1.1,
  "recompile_all_models": true,
  "commit": "b532e910b15434f707f0b7460abc25c70fcb9b26",
  "version": "1.2.0"
}
Model order is a list of integers.

I think that in this case, instead of an associated file, having a data type json would be the simplest solution.
@brindakv and I discussed this today and agreed to extend _ma_software_parameter.data_type to include simple comma-separated lists of integers or floats, as others within PDB are reluctant to allow JSON in values.
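As a rough sketch of what a simple comma-separated list of integers or floats could look like on the Python side: the helper names to_csv and from_csv are hypothetical, and the exact separator and whitespace rules are assumptions rather than anything taken from the ModelCIF dictionary.

```python
def to_csv(values):
    """Join a flat list of ints or floats into one comma-separated string."""
    return ",".join(str(v) for v in values)


def from_csv(text, cast=int):
    """Split a comma-separated string and convert each item with `cast`."""
    if not text.strip():
        return []
    return [cast(item) for item in text.split(",")]


# ColabFold's model_order from the config.json quoted in this thread.
model_order = [3, 4, 5, 1, 2]
written = to_csv(model_order)
restored = from_csv(written, int)

# The same helpers cover floats when the caller passes cast=float.
floats = from_csv(to_csv([1.1, 2.5]), float)
```

Unlike str(list), this form needs no eval() to read back, but it only covers flat lists of a single numeric type.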
@bienchen @benmwebb This has been addressed in the latest ModelCIF update.
Some modelling pipelines have lists as parameter values, like ColabFold's model_order. In _ma_software_parameter, I think that should be recorded with _ma_software_parameter.data_type set to other, and _ma_software_parameter.data_type_other_details set to something like list of <integer|string|boolean|float>. This could be automatised. I almost got there myself, but the translation of boolean values from True to YES fails, which seems to be at the core of python-ihm ... here is a code example (modelcif/dumper.py):

Could that functionality be added, or is it a better idea to let developers fill _ma_software_parameter.data_type_other_details themselves?
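A minimal sketch of how the proposal above might be automated, assuming the type names integer, float, boolean, string and other mentioned in this thread. The helpers describe_parameter and format_value are hypothetical and are not part of python-modelcif or python-ihm. Writing False as NO and joining list items with commas are assumptions too.

```python
def describe_parameter(value):
    """Return (data_type, data_type_other_details) for a parameter value."""
    # bool must be tested before int because bool is a subclass of int.
    scalar_names = [(bool, "boolean"), (int, "integer"),
                    (float, "float"), (str, "string")]

    def scalar_type(item):
        for py_type, name in scalar_names:
            if isinstance(item, py_type):
                return name
        return None

    if isinstance(value, (list, tuple)):
        kinds = {scalar_type(item) for item in value}
        if len(kinds) == 1 and None not in kinds:
            return "other", "list of " + kinds.pop()
        # Mixed, empty or unrecognised contents: no element type to report.
        return "other", "list"
    name = scalar_type(value)
    return (name, None) if name else ("other", None)


def format_value(value):
    """Render a value as text, writing booleans as YES/NO (assumed form)."""
    if isinstance(value, bool):
        return "YES" if value else "NO"
    if isinstance(value, (list, tuple)):
        return ",".join(format_value(item) for item in value)
    return str(value)


order_type = describe_parameter([3, 4, 5, 1, 2])
flag_text = format_value([True, False])
```

Testing bool before int matters here: without that ordering, True would be classified as an integer and written as 1 rather than YES.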