saydx / specification

Specifications for scientific array data exchange

Coordination with Existing Standards #3

Open awvwgk opened 4 years ago

awvwgk commented 4 years ago

How should this project fit into existing efforts to standardize data exchange in the computational sciences?

For example, there seems to be an overlap with QCSchema.

dgasmith commented 4 years ago

There does seem to be a lot of overlap! We also have a Python-based implementation of the Schema here that is fairly popular if you wish to try it out. Another example here.

For your message layer I would encourage something like MessagePack which seems to cover your needs and is quite widely supported.

aradi commented 4 years ago

Very valid points! Also, thanks for drawing my attention to QCSchema and MessagePack. Indeed, if we wanted to create a new protocol instead of using an existing one, we would need to have a good justification.

I have now created a PR (#4), which adds a chapter comparing SAYDX with some existing protocols. I do not have much experience with some of those protocols, so please feel free to comment on it if any of the points raised is not valid in your opinion or if something is missing.

bhourahine commented 4 years ago

Regarding QCSchema, it's not apparent to me whether i-PI and similar modes of operation are also possible, i.e. initialization followed by bidirectional command/data exchange. The documentation seems to suggest set-up of codes and post-exchange of results (output energies, structures, orbitals, densities, etc.). So restarts are possible based on output, but perhaps not active intervention in calculations. Or is this available and documented somewhere?

dgasmith commented 4 years ago

One more to look at: https://developers.google.com/protocol-buffers

Your comparison looks good overall. The one thing I would do is benchmark how much latency you can tolerate in real-world settings and think about the sacrifices to generality that you want to make. This is always a tradeoff: looking through QM/MM frameworks, you can see every solution in the problem space, from general but slow(er) ones that interface to many codes to ones so highly optimized that they built a layer used by only one group.

@bhourahine What groups like I-PI and MolSSI MDI are doing with the schema is sending pieces of the schema at a time. So if your symbols/geometry field means the same thing as QCSchema it can lower cognitive overhead. See here for a code example.

This goes back to our philosophy of separating key/value structure from the message format. Whether you send each field separately, send everything as JSON, or store it as HDF5, following the Schema key/value structure still helps lower the cognitive and interfacing overhead.
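A minimal sketch of that separation, assuming JSON as the wire format. The field names `symbols` and `geometry` come from the comment above; the values and the helper names `send_whole`, `send_fieldwise`, and `receive` are hypothetical and belong to neither QCSchema nor MDI:

```python
import json

# Key/value structure: field names follow the thread (symbols, geometry);
# the values are made-up illustration data.
molecule = {
    "symbols": ["O", "H", "H"],
    "geometry": [0.0, 0.0, 0.0, 0.0, 1.4, 1.1, 0.0, -1.4, 1.1],
}


def send_whole(record):
    """Serialize the complete record as one JSON message."""
    return [json.dumps(record)]


def send_fieldwise(record):
    """Serialize each field as its own small JSON message."""
    return [json.dumps({key: value}) for key, value in record.items()]


def receive(messages):
    """Merge any number of messages back into one key/value record."""
    record = {}
    for message in messages:
        record.update(json.loads(message))
    return record


whole = receive(send_whole(molecule))
pieces = receive(send_fieldwise(molecule))
```

Both transport choices yield the same record on the receiving side, so code that consumes `symbols` and `geometry` does not need to know how the fields were sent.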

awvwgk commented 4 years ago

I found the Rust library serde a very useful tool, also conceptually. Similar to a schema it allows separating the key/value structure from the actual serialization/deserialization format. Having a similar serde framework for our field would save us from writing parsers or similar ever again.

aradi commented 4 years ago

@dgasmith Thanks, protocol-buffers is something I had already looked at once, but I find it too complex for our purposes, and in particular it is not clear how it could be adapted to Fortran. :wink:

As for the key-value pair concept: SAYDX as a messaging format could easily support this. Actually, what I am really looking for is something between HDF5 and JSON. If JSON could be extended to support arrays as values, it would be roughly OK for the text representation. (Only roughly OK, since it is still unnecessarily complex to parse.) And one could use MessagePack via type extension for the binary storage. (We would just need some Fortran bindings...) But whichever solution we come up with in the end, communicating QCSchema would be as easy as it is now with JSON, as it would be possible to map it to the same key-value representation.
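One way to picture "arrays as values" in a JSON text representation is to wrap the flat data together with its shape and element type. The sketch below is only an illustration under stated assumptions: the `__ndarray__` marker key, the `typecode`/`shape`/`data` layout, and the helper names are hypothetical and are not taken from the SAYDX specification or from QCSchema.

```python
import json
from array import array

ARRAY_TAG = "__ndarray__"  # hypothetical marker key, not part of any standard


def encode_array(values, shape, typecode="d"):
    """Wrap a flat list of numbers with the metadata needed to rebuild it."""
    expected = 1
    for extent in shape:
        expected *= extent
    if expected != len(values):
        raise ValueError("shape does not match number of values")
    return {
        ARRAY_TAG: True,
        "typecode": typecode,
        "shape": list(shape),
        "data": list(values),
    }


def decode_hook(obj):
    """json object_hook: turn tagged objects back into (shape, array) pairs."""
    if obj.get(ARRAY_TAG):
        return tuple(obj["shape"]), array(obj["typecode"], obj["data"])
    return obj


message = json.dumps(
    {"natoms": 2, "coords": encode_array([0.0, 0.0, 0.0, 0.0, 0.0, 1.2], (2, 3))}
)
decoded = json.loads(message, object_hook=decode_hook)
shape, coords = decoded["coords"]
```

The same tagged structure could in principle be carried by a binary format instead of JSON, which is the point of keeping the key-value layout independent of the encoding.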

@awvwgk serde looks very interesting. However, at the moment I fail to see how we could implement something similar in C and in Fortran (even in modern Fortran). Whatever we come up with in the end, I wish to use it in DFTB+ as well, otherwise we would have to invent yet another protocol :laughing:

aradi commented 4 years ago

As for the higher-level protocol: @dgasmith I had a look at your example. We wish to incorporate something like the MDI commands, but within the same framework: the client sends a request to the server (as a key-value dictionary in whichever format) and the server answers with another key-value dictionary. So the controlling commands use exactly the same message format as the data.
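A rough sketch of such a request/response exchange, with JSON standing in for "whichever format". The command names `set_coords` and `get_energy`, the `status` key, and the placeholder energy expression are hypothetical; they are not MDI commands and not part of SAYDX.

```python
import json

# Hypothetical server state and command names, for illustration only.
STATE = {"coords": [0.0, 0.0, 0.0, 0.0, 0.0, 1.2], "energy": None}


def handle(request):
    """Answer one key/value request with a key/value response."""
    command = request.get("command")
    if command == "set_coords":
        STATE["coords"] = request["coords"]
        return {"status": "ok"}
    if command == "get_energy":
        # Placeholder "calculation": sum of squared coordinates.
        STATE["energy"] = sum(x * x for x in STATE["coords"])
        return {"status": "ok", "energy": STATE["energy"]}
    return {"status": "error", "reason": f"unknown command: {command}"}


def roundtrip(request):
    """Simulate the wire: encode the request, handle it, decode the reply."""
    wire_request = json.dumps(request)
    wire_reply = json.dumps(handle(json.loads(wire_request)))
    return json.loads(wire_reply)


reply_set = roundtrip({"command": "set_coords", "coords": [1.0, 2.0, 2.0]})
reply_energy = roundtrip({"command": "get_energy"})
reply_bad = roundtrip({"command": "dance"})
```

Commands and data travel as the same kind of dictionary here, so a single encoder/decoder pair covers both control and payload messages.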

awvwgk commented 4 years ago

@aradi Obviously by using serde directly and binding it via C to Fortran.

Just kidding. A good preprocessor might be able to emulate this behaviour, and a great deal of what serde does is actually done by Rust's very powerful macro system, which plays the role of a preprocessor here. :wink:

dgasmith commented 4 years ago

@aradi For the MDI-like-commands I don't quite follow your comment. We send the same top-level keys effectively to and from the server, just different blocks of it at different times with something like MDI.

As a note we use msgpack extensions for our custom data structures and it seems to work quite well. Though we purposefully keep this pretty minimal.

dgasmith commented 4 years ago

@aradi BTW, we would love to get DFTB+ into QCEngine.

awvwgk commented 4 years ago

> BTW, we would love to get DFTB+ into QCEngine.

That's the way they get you :).

aradi commented 4 years ago

:laughing: @dgasmith Sure, we are definitely interested in the possibility of providing an interface to QCEngine. Do you have any other integrations (maybe even some Fortran (!) codes) which one could take as an example?

dgasmith commented 4 years ago

@aradi We do not integrate too deeply with most codes at the moment and mostly write input files and parse output files. Codes which have a Python interface are super easy to integrate (example); otherwise, here is a fairly simple MOPAC harness.

If you take a look at the folder there are about a dozen examples. Happy to help if you wish to take a crack at it.

awvwgk commented 4 years ago

@dgasmith @aradi We are getting off-topic ;). I opened an issue regarding DFTB+ integration over at QCEngine: https://github.com/MolSSI/QCEngine/issues/223

aradi commented 4 years ago

@awvwgk You are absolutely right, thanks for opening an issue at the QCEngine page for that!