rcsb / mmtf-cpp

The pure C++ implementation of the MMTF API, decoder and encoder.
MIT License
21 stars 24 forks source link

Initial C++ implementation #1

Closed gtauriello closed 7 years ago

gtauriello commented 7 years ago

This is an initial implementation for a C++ header only decoder for MMTF. The signature for the encoder was added as a preview. Please let us know how you like the usage and layout of the code and we will continue with the encoder (probably not before April though).

The README includes all information on how to use it and there are example and test codes. To test the decoder, I wrote a simple json-dumper for the previous C decoder and kept the results. The C++ decoder can generate json files which are formatted identically and I use a simple diff to ensure that we do the same job. The test can run automatically as described in the README. For the future we should aim for a cleaner unit testing approach though (using Boost or CppUnit?). Limitations of the current testing:

Another point we might consider is to include msgpack-c headers in the repository too. I have seen msgpack-c including boost headers somehow automatically via github. Seems cool to avoid duplication of files but I would have to dig into how they do that...

speleo3 commented 7 years ago

I vote against checking in the generated doxygen documentation. I've discussed with Yana that I prefer to also remove it from the mmtf-c repo.

gtauriello commented 7 years ago

Ok. Yeah I agree with you on that. I had just pushed it in there to be consistent with the C version. I removed it from the repository and added some text to the README on how to generate it from scratch.

speleo3 commented 7 years ago

the mmtf-c repo is not particularly clean, no need to be consistent on that level :)

josemduarte commented 7 years ago

Regarding ncsOperatorList, we should definitely have that in mmtf-c and here. I'll open a new issue for mmtf-c. A good example of file containing data in that field would be 1auy (http://mmtf.rcsb.org/v1.0/full/1AUY).

As a note, we started some time ago a test suite at https://github.com/rcsb/mmtf/tree/master/test-suite where we want to compile a list of structures to test. We don't have the structure data in json format there yet. But at least we do have the list of test structures in https://github.com/rcsb/mmtf/blob/master/test-suite/file-list.json. I've just submitted a PR to add 1auy there.

gtauriello commented 7 years ago

Thanks for the example. I managed to read it and compared it with the mmCIF file. Not sure I understand the field completely, but it seems to have parsed it correctly. If any of you wants to double-check, I get the following 14 matrices (partial output from "./traverse.exe 1AUY.mmtf txt"):

ncsOperatorList[0]: [0.5, -0.809017, -0.309017, 128.875, 0.809017, 0.309017, 0.5, -208.524, -0.309017, -0.5, 0.809017, 79.6491, 0, 0, 0, 1]
ncsOperatorList[1]: [-0.309017, -0.5, -0.809017, 337.399, 0.5, -0.809017, 0.309017, -128.875, -0.809017, -0.309017, 0.5, 208.524, 0, 0, 0, 1]
ncsOperatorList[2]: [-0.309017, 0.5, -0.809017, 337.399, -0.5, -0.809017, -0.309017, 128.875, -0.809017, 0.309017, 0.5, 208.524, 0, 0, 0, 1]
ncsOperatorList[3]: [0.5, 0.809017, -0.309017, 128.875, -0.809017, 0.309017, -0.5, 208.524, -0.309017, 0.5, 0.809017, 79.6491, 0, 0, 0, 1]
ncsOperatorList[4]: [-0.809017, -0.309017, -0.5, 466.274, -0.309017, -0.5, 0.809017, 79.6491, -0.5, 0.809017, 0.309017, 128.875, 0, 0, 0, 1]
ncsOperatorList[5]: [-0.5, 0.809017, -0.309017, 386.625, -0.809017, -0.309017, 0.5, 208.524, 0.309017, 0.5, 0.809017, -79.6491, 0, 0, 0, 1]
ncsOperatorList[6]: [0.5, 0.809017, 0.309017, 128.875, -0.809017, 0.309017, 0.5, 208.524, 0.309017, -0.5, 0.809017, -79.6491, 0, 0, 0, 1]
ncsOperatorList[7]: [0.809017, -0.309017, 0.5, 49.2259, -0.309017, 0.5, 0.809017, 79.6491, -0.5, -0.809017, 0.309017, 128.875, 0, 0, 0, 1]
ncsOperatorList[8]: [0, -1, 0, 257.75, 0, 0, 1, 0, -1, 0, 0, 257.75, 0, 0, 0, 1]
ncsOperatorList[9]: [0.809017, -0.309017, -0.5, 49.2259, 0.309017, -0.5, 0.809017, -79.6491, -0.5, -0.809017, -0.309017, 128.875, 0, 0, 0, 1]
ncsOperatorList[10]: [0.309017, -0.5, -0.809017, 178.101, -0.5, -0.809017, 0.309017, 128.875, -0.809017, 0.309017, -0.5, 208.524, 0, 0, 0, 1]
ncsOperatorList[11]: [0, 0, -1, 257.75, -1, 0, 0, 257.75, 0, 1, 0, 0, 0, 0, 0, 1]
ncsOperatorList[12]: [0.309017, 0.5, -0.809017, 178.101, -0.5, 0.809017, 0.309017, 128.875, 0.809017, 0.309017, 0.5, -208.524, 0, 0, 0, 1]
ncsOperatorList[13]: [0.809017, 0.309017, -0.5, 49.2259, 0.309017, 0.5, 0.809017, -79.6491, 0.5, -0.809017, 0.309017, -128.875, 0, 0, 0, 1]
josemduarte commented 7 years ago

@gtauriello awesome thanks! For context, the field is used in some PDB entries that need extra operators to generate the full asymmetric unit. This is usually the case for highly symmetrical viral capsid proteins. In those cases you need the extra information in this field to be able to reconstruct the full unit cell. See more info under "Example of a viral capsid" section in here

speleo3 commented 7 years ago

This looks really great, excellent job!

One thought regarding the test files: Currently there is 68KB of code, and 45MB of example MMTF files. Are those really necessary here? I really like that there is a separate repo for test files (rcsb/mmtf/test-suite), if those files are really needed here for unit tests, maybe it could be added as a git submodule?

josemduarte commented 7 years ago

I'd support the idea of @speleo3 to move the data files to the test suite in the mmtf repo.

gtauriello commented 7 years ago

I agree on both. For unit tests I already mentioned above that we should aim for a proper unit testing approach (would it be ok to use Boost there?). The mmtf files are exactly the same as in the mmtf/test-suite I believe. Only difference is my custom json-style output (out_json_ref.tar.gz) that I compare with. But the latter is also just an artifact of not having decent unit tests yet. But I guess we can do both those changes at a later point in time. Also the testing will be easier with an encoder in place...

pwrose commented 7 years ago

Outstanding job!

Can you please remove the copyright line from the files: // Copyright [2017] [RCSB]

gtauriello commented 7 years ago

Ok. I removed those lines. That was one more thing I blindly copied from the C library. Also the Apache License is had copied from there. If that one is not suitable here, please let me know.

speleo3 commented 7 years ago

Regarding the Apache license: Did the RCSB have specific reasons to pick the Apache license over e.g. the BSD-3-clause or MIT license? I understand that for practical reasoning, they are basically the same. But the Apache license seems to be a bit higher maintenance, given the requirement to explicitly list all changes in derived work, and the much longer and "heavily lawyered" license text.

pwrose commented 7 years ago

I'm open change it to an MIT license. We already use the MIT license for mmtf-javascript.

If we use the MIT license, then we do need use the following copyright statement as part of the license text:

Copyright 2017, University of California San Diego

Is there any reasons why we shouldn't use the MIT license instead?

-Peter

On Sun, Mar 5, 2017 at 8:29 AM, Thomas Holder notifications@github.com wrote:

Regarding the Apache license: Did the RCSB have specific reasons to pick the Apache license over e.g. the BSD-3-clause or MIT license? I understand that for practical reasoning, they are basically the same. But the Apache license seems to be a bit higher maintenance, given the requirement to explicitly list all changes in derived work, and the much longer and "heavily lawyered https://www.whitesourcesoftware.com/whitesource-blog/top-10-apache-license-questions-answered/" license text.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/rcsb/mmtf-cpp/pull/1#issuecomment-284241212, or mute the thread https://github.com/notifications/unsubscribe-auth/ADuwEJtd3kyOi61DVp9C1eVQwZJ12F3tks5riuLUgaJpZM4MSgtX .

gtauriello commented 7 years ago

Ok I switched to the MIT license. Let me know if you need anything else from me at the moment. Otherwise, I expect to be working on encoder and more testing in April.

Also for a separate PR: if anyone of you has experience with git submodules, it would be cool to add the MMTF test suite as @speleo3 suggested instead of having a separate copy in here. One could probably even do the same with the msgpack-c headers to avoid having external dependencies...

josemduarte commented 7 years ago

Regarding test data files, I think the best solution is to have them all in the test suite. I'd propose to add the json files that @gtauriello produced to the rcsb/mmtf/test-suite and also keep there the test mmtf files. I'll submit a PR for that in the mmtf project.

gtauriello commented 7 years ago

Ok then I should probably add new json files there for the the new test files? I would have to check though whether we handle the edge cases correctly. Also I opened an issue there since I am unsure on whether sequenceIndexList is a required field or not...

josemduarte commented 7 years ago

The rcsb/mmtf/test-suite is updated with json files and latest mmtf files. Please feel free to add any missing file.

josemduarte commented 7 years ago

Anything against merging? I think re-pointing tests to the files in test-suite can be done later. I'll merge this tomorrow if there's no comments.

pwrose commented 7 years ago

Sounds good to me. -Peter

On Wed, Mar 15, 2017 at 4:26 PM, Jose Manuel Duarte < notifications@github.com> wrote:

Anything against merging? I think re-pointing tests to the files in test-suite can be done later. I'll merge this tomorrow if there's no comments.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/rcsb/mmtf-cpp/pull/1#issuecomment-286911390, or mute the thread https://github.com/notifications/unsubscribe-auth/ADuwECuAY8nBzImzKiC9NhaKKZVYvN0Bks5rmHPCgaJpZM4MSgtX .

gtauriello commented 7 years ago

Yes I agree that we can merge this and updating the test files can be done later. I just quickly pushed a fix that ensures that the traverse example works fine for the new test files with optional fields (the parser was fine anyways). I didn't get to update the test cases yet (I will also need to update the json files to include the new test files with optional fields), but I hope to find time to do it in a separate pull request soon...