greglandrum / molecular_interchange

Other "Chemical JSON" formats #1

Open ghutchis opened 6 years ago

ghutchis commented 6 years ago

I can see that you had a long thread in rdkit about this - I was about to point out other chemical JSON formats, but that's already in there.

I know that there's an effort on the quantum chemistry side to work on this.

But I'd also be happy to hammer out an informal effort between RDKit and Open Babel, if you're interested. I think OpenEye had a proposal recently, but I can't find it now.

greglandrum commented 6 years ago

Yeah, there was a lot of input during that thread last year, and I’ve had several other conversations about it since then. The effort from OpenEye was interesting (I was in some of those conversations), but, as you say, it kind of vanished.

At this point I’d really like to get something small and focused done and try it out. Doing something informal with OB would be very cool. My current plan is to put some time into this over the holidays and see what comes out.

ghutchis commented 6 years ago

Great. Bob Hanson of Jmol is interested in this as well. My feeling is to:

I've reserved time over the holidays for coding, so let me know. Perhaps we can arrange some times to chat on Slack or something.

Do you have any of the OpenEye examples? I can't find them at all - I guess I didn't save them to my laptop.

greglandrum commented 6 years ago

I don’t have any of those examples. I don’t remember what license they were under (or it wasn’t clear at the time) and I ended up deleting whatever I had.

Based on what I remember though, I’m thinking of a fairly different format. I’ll try to knock out a straw man example in the next few days.

greglandrum commented 6 years ago

@ghutchis: I just pushed some changes to the doc (to include a couple of examples), as well as some RDKit-based Python code to read and write the format. There's a test file there that I've used for some simple round-trip regression testing.

The next step for me is to add support for residue information (for things read from PDB files), but I think what's there is already worth looking at if you have the time.
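A round-trip check like the one described above can be sketched without any toolkit. The toy writer and reader below (`write_coords`, `read_coords`, `round_trips`, and the `el`/`xyz` keys are all hypothetical, not the format in the repo) illustrate one detail such a test has to handle: if the writer rounds coordinates, the comparison needs a tolerance rather than exact equality.

```python
import json
import math


def write_coords(atoms, ndigits=4):
    """Toy writer (hypothetical format): atoms is a list of (symbol, x, y, z)."""
    records = [
        {"el": symbol, "xyz": [round(v, ndigits) for v in (x, y, z)]}
        for symbol, x, y, z in atoms
    ]
    return json.dumps({"atoms": records})


def read_coords(text):
    """Toy reader matching write_coords; returns (symbol, x, y, z) tuples."""
    return [(a["el"], *a["xyz"]) for a in json.loads(text)["atoms"]]


def round_trips(atoms, tol=1e-4):
    """True if writing then reading gives back the same atoms within `tol`."""
    back = read_coords(write_coords(atoms))
    if len(back) != len(atoms):
        return False
    for (s1, *p1), (s2, *p2) in zip(atoms, back):
        if s1 != s2:
            return False
        # Coordinates were rounded on write, so compare with an absolute tolerance.
        if not all(math.isclose(a, b, abs_tol=tol) for a, b in zip(p1, p2)):
            return False
    return True
```

In a real regression test the comparison would more likely be made on toolkit molecule objects (for example, RDKit molecules) read back from the file, but the tolerance question for coordinates is the same.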

ghutchis commented 6 years ago

I like what I see so far - I'll try running the scripts tonight:

- [x] storing both 2D and 3D coordinates
- [x] referencing the partial charge method
- [x] versioning
- [x] reflecting toolkit-specific data
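The checked items could map onto a single molecule record roughly as follows. This is a sketch assuming hypothetical key names (`conformers`, `dim`, `partialCharges`, `toolkitData`); it only illustrates keeping 2D and 3D coordinates side by side, recording which method produced the partial charges, and putting toolkit-specific data under a per-toolkit namespace. Versioning would sit in a file-level header rather than in each record.

```python
import json


def make_molecule(elements, coords2d, coords3d, charges, charge_method, toolkit_data):
    """Build one molecule record; every key name here is hypothetical."""
    if not (len(elements) == len(coords2d) == len(coords3d) == len(charges)):
        raise ValueError("per-atom arrays must have the same length")
    return {
        "atoms": [{"z": z} for z in elements],  # atomic numbers
        "conformers": [
            # 2D depiction and 3D geometry stored together, tagged by dimension
            {"dim": 2, "coords": [list(c) for c in coords2d]},
            {"dim": 3, "coords": [list(c) for c in coords3d]},
        ],
        # The charge values always travel with the name of the method that produced them
        "partialCharges": {"method": charge_method, "values": list(charges)},
        # Namespaced by toolkit, e.g. {"rdkit": {...}}; other readers can skip it
        "toolkitData": toolkit_data,
    }
```

Namespacing the toolkit-specific block means a reader can ignore any namespace it doesn't recognize without having to understand its contents.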

I was at a workshop to spec out a quantum chemistry JSON format and I suggested the same concept of toolkit-specific tags.

One suggestion that came up from Robert Hanson (of Jmol) at that workshop was to include a "magic header" for filetype detection, like "chemjson-header" or "header":{"chemjson-version":10
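A reader could use such a header to decide whether a file is one of these JSON documents before interpreting anything else. A minimal sketch, assuming the header is a top-level object under a dedicated key (`moljson-header`, the name adopted later in the thread) that contains a `version` field (a hypothetical field name):

```python
import json

# Key name adopted later in this thread; treat it as an assumption.
HEADER_KEY = "moljson-header"


def detect_header(text):
    """Return the header object if `text` carries the magic header, else None."""
    try:
        doc = json.loads(text)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return None
    if not isinstance(doc, dict):
        return None
    header = doc.get(HEADER_KEY)
    return header if isinstance(header, dict) else None


def header_version(text):
    """Return the header's (hypothetical) "version" field, or None."""
    header = detect_header(text)
    return None if header is None else header.get("version")
```

This parses the whole document. A cheaper byte-level sniff of the first few characters would only work if writers always emit the header key first, since JSON itself does not guarantee the order of object members.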

I haven't looked closely, but if there are multiple molecules, I guess they're part of an array?

greglandrum commented 6 years ago

I like the idea of the specific header, for checking filetype. I'll change header to moljson-header.

What's currently there doesn't work particularly well for multiple molecules; that wasn't a use case I had considered, but there's no reason not to support it. You could have a list of the full data structures, but it doesn't make sense to duplicate the header and the defaults. I'll switch it so that at the top level you have moljson-header, the *Defaults, and a molecules list containing the other structures.
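The top-level layout described above (header and defaults written once, then a `molecules` list) can be sketched as follows. `moljson-header`, the `*Defaults` idea, and the `molecules` list come from the comment; the specific `atomDefaults`/`bondDefaults` names, the `version` field, and the merge rule (explicit per-atom or per-bond values override the defaults) are assumptions for illustration.

```python
import json


def write_moljson(molecules, atom_defaults, bond_defaults, version=10):
    """Bundle several molecules under a single header and shared defaults."""
    doc = {
        "moljson-header": {"version": version},  # written once, not per molecule
        "atomDefaults": atom_defaults,           # hypothetical names for the *Defaults blocks
        "bondDefaults": bond_defaults,
        "molecules": list(molecules),
    }
    return json.dumps(doc)


def _with_defaults(items, defaults):
    # Explicit per-item values win over the shared defaults.
    return [{**defaults, **item} for item in items]


def read_moljson(text):
    """Return the molecules with atom and bond defaults filled in."""
    doc = json.loads(text)
    atom_defaults = doc.get("atomDefaults", {})
    bond_defaults = doc.get("bondDefaults", {})
    return [
        {
            **mol,
            "atoms": _with_defaults(mol.get("atoms", []), atom_defaults),
            "bonds": _with_defaults(mol.get("bonds", []), bond_defaults),
        }
        for mol in doc["molecules"]
    ]


# Usage: two molecules (heavy atoms only) sharing one header and one set of defaults.
text = write_moljson(
    [
        {"name": "methanol", "atoms": [{}, {"z": 8}], "bonds": [{"atoms": [0, 1]}]},
        {"name": "ammonia", "atoms": [{"z": 7}]},
    ],
    atom_defaults={"z": 6, "charge": 0},
    bond_defaults={"order": 1},
)
mols = read_moljson(text)
```

Here the first methanol atom picks up `z` 6 from the defaults while the second overrides it with 8, and both inherit `charge` 0.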

greglandrum commented 6 years ago

Ok, multi-molecule support is now there. Thanks for the suggestion.