ideaconsult / cdk

The Chemistry Development Kit
https://cdk.github.io/
GNU Lesser General Public License v2.1
0 stars 0 forks source link

RInChIGenerator::generateRinchiFromReaction issue with agents in CDK's MDLRXNWriter #17

Closed uli-f closed 1 year ago

uli-f commented 1 year ago

The in-line comment in the code in RInChIGenerator::generateRinchiFromReaction states that

//If agents are present, they are not written in the RXN file, but 
//their number is added into the count line.

However, looking at MDLRXNWriterTest::writeAgentsFromV2000 and MDLRXNWriterTest::writeAgentsFromV3000 it very much looks to me that agent counts and the agents themselves are actually written by MDLRXNWriter.

Is there another issue with this that I am missing? If so, I am happy to make an effort to fix this in MDLRXNWriter instead of working around the agents.

ntk73 commented 1 year ago

The problem here is following: The standard MDL RXN syntax includes only reagents and products, no agents (at least this is what native code currently expects). CDK adds number of agents in the count line which is a kind of extension to the RXN official format (I am not sure whether term official is OK, anyway). Unfortunately, this causes an error in the RInChI native library. Actually I saw that native code stops at the count line. I did not get deep into the CDK code to see whether the agents are actually written. If you have checked this, then this is so.

Bottom line: If the RXN file text, written by CDK, contains agents, it breaks the generation of RInChI within native library. That is way RDFile format is preferred when possible.

uli-f commented 1 year ago

Thanks for clarifying.

Yes, to the best of my knowledge adding the number of agents (and the agents themselves) in the counts line of an RxnFile is an unofficial extension originally introduced by ChemAxon. However, it has been picked up by many/most programs because of its usefulness.

Just to make sure that I understand correctly what happens when calling RInChIGenerator::generateRinchiFromReaction:

Do the two bullet points above describe the flow of objects correctly?

uli-f commented 1 year ago

Not being able to handle agents is a limitation I'd like to make user's of the functionality aware of.

Would you mind adding a sentence that clearly states that agents are not being included in the generated RInChI when using CDK's MDL RXN Writer by setting the flag useCDK_MDL_IO to true to the following javadoc locations:

uli-f commented 1 year ago

I also started an overview documentation org.openscience.cdk.rinchi.package.html in PR #20 .

I'd appreciate it if you could add this limitation about agents there as well. I left a TODO there for you :)

ntk73 commented 1 year ago

Just to make sure that I understand correctly what happens when calling RInChIGenerator::generateRinchiFromReaction:

* **Setting the flag `useCDK_MDL_IO` to `true`**: This uses CDK's capabilities to convert the CDK `IReaction` object to a RXN V2000. Then, this RXN V2000 is used as an input for the native RInChI library and the native RInChI library cannot read this RXN V2000 and throws an error on the count line if there is an agent present.

* **Setting the flag `useCDK_MDL_IO` to `false` (default)**: CDK's `IReaction` object is converted to a `RinchiInput` object, this is then written to an RDFile in jna-rinchi and used as an input to the native RInChI library. The native RInChI library can deal with agents when reading the RDFile.

Do the two bullet points above describe the flow of objects correctly?

Yes, you bullet points are correct.

uli-f commented 1 year ago

I added some documentation for this to the package.html, PR #23.

Please have a read through and feel free to modify.

I'd appreciate a brief acknowledgement if you are fine with it and then we can close this issue.

ntk73 commented 1 year ago

@uli-f , I have merged PR #23. You have written very good and extended documentation texts. I kind of blended/combined mine and your texts.