rcsb / mmtf-cpp

The pure C++ implementation of the MMTF API, decoder and encoder.
MIT License
21 stars 24 forks source link

access to custom fields #13

Closed speleo3 closed 5 years ago

speleo3 commented 6 years ago

This is a feature request to complement pull request #12. The API should make custom fields accessible.

Example how this could look like (better API proposals welcome):

mmtf::StructureData data;
std::vector<int32_t> colorList;  // custom field

mmtf::MapDecoder md = mmtf::mapDecoderFromFile("test.mmtf");  // proposed API

md.decode(data);  // proposed API
md.decode("pymolColorList", false, colorList);
gtauriello commented 6 years ago

I was thinking of ways to combine this with the existing decodeFromFile, decodeFromStream and decodeFromBuffer functionality to accommodate extra fields. Basically I would add a boolean flag to those functions to avoid warnings printed to cerr for non-parsed keys and return the MapDecoder used. Basically the traditional file reading

mmtf::StructureData data;
mmtf::decodeFromFile(data, filename);

would still work, but could be changed into the following to accommodate your use case:

mmtf::StructureData data;
std::vector<int32_t> colorList;  // custom field
mmtf::MapDecoder md = mmtf::decodeFromFile(data, filename, true);
md.decode("pymolColorList", false, colorList);
md.checkExtraKeys();  // optional check at the end

Like this we can naturally have this functionality for decodeFromStream and decodeFromBuffer too. What do you think?

Somewhat related: I was also thinking of adding a virtual destructor to StructureData to enable extensions by inheritance if you want colorList as part of an MyStructureData class. Would that be useful?

speleo3 commented 6 years ago

@gtauriello - boolean flag to suppress the extra keys warnings looks good.

Customization by inheritance would be awesome!

In my simplemmtf Python library I abstracted access to the different fields with a levels dictionary. I guess such a simple approach is not possible with C++, but allowing inheritance of StructureData, and GroupType could go into that direction.

gtauriello commented 6 years ago

I hadn't thought of extensions for GroupType or anything beyond the top-level map, but I guess I could figure out some way to pass this information around (need to look into msgpack's convert function or add some static flag in MapDecoder to suppress the extra key warnings there too).

Also the current layout of StructureData makes it kind of hard to use an inherited variant of GroupType. Not sure if there is a C++ trick to do that...

speleo3 commented 6 years ago

StructureData<GroupType> template?

struct MyGroupType : GroupType {
    std::vector<int> strideSecStructList;
    MSGPACK_DEFINE_MAP(...);   // <-- not sure if that's possible with inheritance
};

struct MyStructureData : StructureData<MyGroupType> {
    std::vector<int> pymolColorList;
    std::vector<int> pymolRepsList;
    MSGPACK_DEFINE_MAP(...);   // <-- not sure if that's possible with inheritance
};
gtauriello commented 6 years ago

Hmm yes that could work. We don't use MSGPACK_DEFINE_MAP for StructureData and GroupType anyways, so that shouldn't cause issues (decoding is in msgpack_decoder.hpp and encoding in encoder.hpp and object_encoders.hpp). May just need some reasonable abstraction layer to be able to reuse the existing functions (instead of having to copy/paste them for extended objects).

gtauriello commented 5 years ago

@speleo3 Now that extra data fields are getting into the MMTF specs (i.e. https://github.com/rcsb/mmtf/blob/master/spec.md#extra-data), shall we close this in favor of the standardized way to add extra data?

I think that #15 will be able to cover your use case nicely. Your example could be treated as follows:

// read data with existing functionality
mmtf::StructureData data;
mmtf::decodeFromFile(data, "test.mmtf");

// get custom field (here: per atom color) from atomProperties
std::vector<int32_t> colorList;  // target variable
mmtf::MapDecoder md(data.atomProperties);
md.decode("pymolColorList", true, colorList);
speleo3 commented 5 years ago

I might still want this in the future ;-) Would for example also allow to quickly query a single field from an MMTF file without decoding everything:

mmtf::MapDecoder md = mmtf::mapDecoderFromFile("test.mmtf");  // proposed API
float resolution = 0.0;
md.decode("resolution", false, resolution);

But with the new spec, this is certainly low priority.

gtauriello commented 5 years ago

Resolved in #31. Code to read a file and just read one field looks as follows:

mmtf::MapDecoder md;
mmtf::mapDecoderFromFile(md, "test.mmtf");
float resolution = 0.0;
md.decode("resolution", false, resolution);

Timings reading the largest MMTF file in our test-suite (1HTQ, 9.5M): 0.004s to read just resolution with code above vs 0.054s to read full file. Roughly 70% of the 0.004s is pure I/O, the rest is msgpack-unpacking to get access to the data.