Closed danpf closed 6 years ago
For many use cases I agree with the possibility of a generic string field. Sounds light-weight and generic enough.
For quantities attached to residues and atoms on the other hand (e.g. model quality numbers), it might be nicer to have a standardized way to attach a list of numbers into the mmtf file so that any viewer could color the structure according to one of those quantities...
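As a rough illustration of what a viewer could do with such a list of numbers, the sketch below maps per-atom values onto hex color strings. The function name and the blue-to-red ramp are assumptions made for this example, not part of any MMTF implementation.

```python
def values_to_hex_colors(values):
    """Map numbers linearly onto a blue (low) to red (high) ramp."""
    if not values:
        return []
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # avoid division by zero when all values are equal
    colors = []
    for v in values:
        t = (v - lo) / span
        red = round(255 * t)
        blue = 255 - red
        colors.append(f"{red:02X}00{blue:02X}")
    return colors


# Hypothetical per-atom quality numbers.
atom_colors = values_to_hex_colors([0.0, 0.25, 1.0])
```

The point is only that a standardized numeric list would let any viewer apply its own color ramp without knowing which program wrote the file.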
That would be nice too...
I guess 3 quick ideas:
1. raw-string-json: let the application handle JSON parsing
2. dictionary of strings: let the application handle going from string to int/double
3. msgpack: let the user handle msgpack obj decoding

Option 3 gets a little complicated with statically typed languages, but is probably the better option.
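To make the difference between the options concrete, here is a small sketch of how the same data would round-trip under options 1 and 2, using only Python's standard json module. The extra dictionary and its foo_ keys are made up for illustration. Option 3 is only described in a comment, since it would need a msgpack library.

```python
import json

# Hypothetical data an application wants to attach.
extra = {"foo_scoreList": [1.2, 3.4], "foo_id": "ABC"}

# Option 1: one raw JSON string; the reader must parse it itself.
option1 = json.dumps(extra)
decoded1 = json.loads(option1)

# Option 2: dictionary of strings; the reader converts each value back
# from string to int/double (here via json.loads per value).
option2 = {key: json.dumps(value) for key, value in extra.items()}
decoded2 = {key: json.loads(text) for key, text in option2.items()}

# Option 3 would hand `extra` to a msgpack encoder unchanged, so numbers
# stay numbers in the file and no string round trip is needed.
```

With options 1 and 2 every reader has to agree on how the strings are formatted, which is the extra convention option 3 avoids.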
Some keys could be standardized keys like color or atom_color or residue_color for molecular viewers? Should probably ask a few mol-viewer people their thoughts on that.
+1 for option 3
+1 for standardized keys like atomColorList - also chargeList (or partialChargeList) and radiusList to replace formats like PQR
A convention for non-standard keys would also be useful; this could prevent name clashes with future standard keys. E.g. if standard keys never use underscores, then an <appname>_ or <organization>_ prefix for custom keys could never lead to a naming conflict.
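A minimal sketch of how a reader could apply that convention, assuming standard keys never contain an underscore. Both helper names are hypothetical.

```python
def is_custom_key(key):
    """Return True if the key has the form <prefix>_<name>."""
    prefix, sep, name = key.partition("_")
    return bool(prefix) and sep == "_" and bool(name)


def split_custom_key(key):
    """Split a custom key into (prefix, name); raise ValueError otherwise."""
    if not is_custom_key(key):
        raise ValueError(f"not a custom key: {key!r}")
    prefix, _, name = key.partition("_")
    return prefix, name


owner, name = split_custom_key("pymol_colorList")
```

Because only the first underscore is used as the separator, a custom name may itself contain underscores.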
Speaking of custom keys: PyMOL 2.1 exports MMTF files with two custom keys: pymolRepsList (encoded with strategy type 7) and pymolColorList (plain msgpack array).
Perfect, now I know someone else would use this :p
A convention for non-standard keys would also be useful, this could prevent name clashes with future standard keys. E.g. if standard keys never use underscores, then an <appname>_ or <organization>_ prefix for custom keys could never lead to a naming conflict.
I guess the only thing to watch out for is that we might have pymolColorList and chimeraColorList and nglColorList... But I think pymol::ColorList or pymol::color_list would be best if we were to standardize it; pymol people love their underscores. I'd feel bad taking them away from them hah
@arose @pwrose
This is sort of a more formal proposal for a comments field:
It seems that myself and other developers are eager to append application specific information into our mmtf files, so having this become part of the standard would be very helpful, and save a lot of re-writing once/if it does eventually become a part of the standard.
Does anyone have any objections to this sort of implementation?
The alternative, as @speleo3 mentioned above, is to pack any extraData directly into the base dictionary of the packed mmtf file.
An example implementation for c++ is available at https://github.com/rcsb/mmtf-cpp/pull/15
This is a field to store any extra mmtf associated data. It is packed as a msgpack object and could therefore contain anything; it is up to you (the developer) how you would like to store / pack / read data. It is sort of the equivalent of the pdb REMARK lines.
However, we would recommend that you use the format MAP< string, msgpack object >. This allows standardized reading between applications, and is easily understandable and extensible across languages.
We do request that, when using the MAP format described above, you adhere to the following standardized key, value pairs:
key | value description | encoding |
---|---|---|
groupColorList | list[hex code strings (len of numGroups)] | None |
atomColorList | list[hex code strings (len of numAtoms)] | None |
etc | etc | etc |
more to be decided?
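A sketch of how a reader might check one of these color lists. The table only says "hex code strings", so the exact pattern here (six hex digits with an optional leading #) is an assumption, as is the function name.

```python
import re

# Assumption: a hex code is six hex digits, optionally prefixed with '#'.
HEX_COLOR = re.compile(r"#?[0-9A-Fa-f]{6}")


def check_color_list(colors, expected_len):
    """Return a list of problems; an empty list means the input looks valid."""
    problems = []
    if len(colors) != expected_len:
        problems.append(f"expected {expected_len} entries, got {len(colors)}")
    for i, color in enumerate(colors):
        if not isinstance(color, str) or not HEX_COLOR.fullmatch(color):
            problems.append(f"entry {i} is not a hex code string: {color!r}")
    return problems


# Hypothetical structure with numGroups = 2.
problems = check_color_list(["FF0000", "#00ff00"], expected_len=2)
```

The same check would apply to atomColorList with expected_len set to numAtoms.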
Regarding the key, did you imply a convention regarding the prefix, e.g.,
structureKey (len of 1)
modelKey (len of numModels)
chainKey (len of numChains)
groupKey (len of numGroups)
atomKey (len of numAtoms)
bondKey (len of numBonds)
I wasn't really meaning to, but we could if other people like that! definitely makes sense to me!
How about an explicit convention by specifying data (or properties?) for structure, model, chain, group, atom, and bond-level information that must have a matching number of records.
Data (properties) that don't fit into the categories above, would go into extraProperties.
-Peter
A "best practice" naming convention sounds reasonable.
@pwrose do you mean that each of those "...Properties" fields would itself contain a msgpack-map with key, value pairs? Doesn't sound too bad actually. Would make it very easy to have generic parsers of it for visualizations or so (could even work in strongly-typed languages like C++). In that case though I would propose to get rid of "extraData" and have those "...Properties" as optional fields at the top-level of the MMTF hierarchy. Otherwise we introduce an extra level of complexity (also there is currently no case of optional fields outside of the top-level of the MMTF hierarchy).
@pwrose and @gtauriello - if I followed you correctly, example data could look like this:
data = {
    "mmtfVersion": "1.1",
    "numAtoms": 999,
    "numModels": 2,
    "numChains": 4,
    ...
    "xCoordList": [1.2, 3.4, ...],
    "yCoordList": [5.6, 7.8, ...],
    "zCoordList": [9.0, 1.2, ...],
    ...
    "structureProperties": {
        "foo_id": "ABC",
    },
    "modelProperties": {
        # lists have len numModels=2
        "foo_rmsdList": [0.5, 0.8],
        "foo_scoreList": [1.2, 3.4],
    },
    "chainProperties": {
        # lists have len numChains=4
        "foo_uniprotIdList": ["HBB_HUMAN", "HBA_HUMAN", "HBB_HUMAN", "HBA_HUMAN"],
        "foo_chainColorList": [0xFF0000, 0x00FF00, 0xFF0000, 0x00FF00],
    },
    "groupProperties": {
        # lists have len numGroups
        "stride_secStructList": [7, 7, 7, ...],
        "sst_secStructList": [7, 7, 7, ...],
    },
    "atomProperties": {
        # lists have len numAtoms=999
        "pymol_colorList": [1, 2, 3, ...],
        "pymol_repsList": [1, 1, 1, ...],
        "apbs_chargeList": [0.1, -0.4, 0.7, ...],
        "apbs_radiusList": [1.2, 1.8, 1.5, ...],
    },
    "bondProperties": {
        # lists have len numBonds
        "pymol_bondTypeList": [1, 1, 1, 4, 4, 4, 4, 4, 4, 1, ...],
    },
    "extraProperties": {
        "pymol_bondTypes": {0: "metal", 1: "single", 2: "double", 3: "triple", 4: "aromatic"}
    },
}
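With this layout, a generic reader could verify the "matching number of records" rule without knowing any individual key. The sketch below assumes the section names and count fields from the example above; the function name and the small test dictionary are hypothetical.

```python
# Section name -> header field that gives the required list length.
# structureProperties and extraProperties carry no length constraint.
COUNT_FIELD = {
    "modelProperties": "numModels",
    "chainProperties": "numChains",
    "groupProperties": "numGroups",
    "atomProperties": "numAtoms",
    "bondProperties": "numBonds",
}


def find_length_mismatches(data):
    """Return (section, key, expected, actual) for every list of the wrong length."""
    mismatches = []
    for section, count_field in COUNT_FIELD.items():
        expected = data.get(count_field)
        for key, values in data.get(section, {}).items():
            if expected is None or len(values) != expected:
                mismatches.append((section, key, expected, len(values)))
    return mismatches


# Deliberately broken example: the chain color list has 3 entries, not 4.
data = {
    "numModels": 2,
    "numChains": 4,
    "modelProperties": {"foo_rmsdList": [0.5, 0.8]},
    "chainProperties": {"foo_chainColorList": [0xFF0000, 0x00FF00, 0xFF0000]},
}
mismatches = find_length_mismatches(data)
```

A section whose count field is missing from the header is reported as a mismatch too, since its lists cannot be checked.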
Yes, that's a good example of what I had in mind.
I like it!
Re-> extraProperties: this is more for statically typed languages (like c++).
I wrote extraData so that it didn't have to be a map<string, msgpack::object>, rather that it could be anything (a simple list, a number, a custom serialized object, etc)... Do you think that's useless? And that extraProperties should just always be a map<string, msgpack::object>?
@danpf The entries contained in the map can still be generic msgpack objects. So it doesn't really simplify parsing in statically typed languages apart from being able to get the keys (which is good I guess). Either way a bit of structure might be good and it's not a big restriction to prescribe that we expect key (string) / value (any object) pairs for extra properties.
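A short sketch of what that restriction buys a reader: the top level can be checked and its keys listed, while the values stay opaque. The function name is hypothetical and the input stands for an already decoded extraProperties object.

```python
def list_extra_property_keys(extra):
    """Check the key (string) / value (any object) shape and return the keys."""
    if not isinstance(extra, dict):
        raise TypeError("extraProperties must be a map")
    bad_keys = [key for key in extra if not isinstance(key, str)]
    if bad_keys:
        raise TypeError(f"non-string keys in extraProperties: {bad_keys!r}")
    return sorted(extra)


# Values may be anything, including nested maps with non-string keys.
keys = list_extra_property_keys(
    {"pymol_bondTypes": {0: "metal", 1: "single"}, "foo_note": "anything"}
)
```

Only the outer map is constrained, so an application can still store a plain list or a custom serialized object under one of its own keys.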
resolved by #36
When working on modeling/prediction/design problems I know a lot of people add comments/remarks of various things to their PDB files. In the case of structures from the PDB, I think it would be best if this field is empty always.
Possible use cases:
It would be very useful to add a field dedicated to this, probably extras or comments, and it would just be a string field.
The alternative is just to use title or structureId for this kind of stuff, since in most modeling they don't exist. I'm not against that either, but the spec documentation should just note which one applications should use so it's standardized. ~Dan