ideaconsult / jna-inchi

Wrapper to access InChI from Java
GNU Lesser General Public License v2.1
0 stars 0 forks source link

integrity of data in RinchiDecompositionOutput #14

Closed uli-f closed 1 year ago

uli-f commented 1 year ago

It is my understanding that the integrity of the data in RinchiOutputDecomposition strongly depends on the completeness of all arrays involved (inchis, auxInfos, roles).

If the parsing in JnaRinchi::parseNativeOutInchisText encounters an issue, these arrays may get out of sync, i.e. the number of elements might be different and it is not clear anymore which auxinfo and ReactionComponentRole is associated with which inchi.

@ntk73 Do you think it is a good idea to insert empty strings and a ReactionComponentRole.UNKNOWN if there is an issue with the parsing of the auxinfo and reaction component role for a particular inchi? This way we may ensure that the three data points inchi, auxinfo, role are synchronized in their three respective arrays. The main issue I see is the introduction of a ReactionComponentRole.UNKNOWN which probably needs special attention when switching over the enum somewhere in the code.

Alternatively, we could introduce a check at the end of JnaRinchi::parseNativeOutInchisText whether all three arrays have the same length and (1) throw an exception or (2) log an error message and return a RinchiDecompositionOutput with no content and an error status to guarantee that the data in RinchiDecompositionOutput is always consistent.

It might be worthwhile to change the access modifier of JnaRinchi::parseNativeOutInchisText from private to protected and add some test cases with incorrect rinchi data to make sure that this actually leads to the desired result.

I hope I described that in a way that it is understandable; please let me know if this needs better/more explanations from me.

ntk73 commented 1 year ago

I think that in case of missing auxInfo for a particular reaction components, empty strings are returned from the native code, anyway. If incorrect info is placed within RAuxInfo, then native code returns an error message as well as error is returned when more elements/components in RAuxInfo are present than the one in RInChI. So, for me it seems that no additional extra checks are needed. But this should be tested more thoroughly.

uli-f commented 1 year ago

I am not quite sure what the best way to test this is. Additionally, it might change how the rinchi native library handles this and putting a test in place on our side was quick and easy.

So I just put together PR #19 that adds a check when instantiating RinchiDecompositionOutput objects. The exception thrown by the constructor is caught in JnaRinchi::parseNativeOutInchisText and converted to an error message; I also put some test cases in place.

@ntk73 Let me know what you think about the PR. If you can easily come up with test cases to feed unbalanced elements/components in RAuxInfo and RInChI into the rinchi native library, I'd appreciate if you would put them in. Otherwise, by doing the check on our side we should be safe anyway.

ntk73 commented 1 year ago

I have accepted the PR #19. It is OK to have some extra checks which, most probably, in the normal situations will be not needed but it is not sure for the "strange cases", so it is OK.

uli-f commented 1 year ago

Thanks for merging PR #19 .

I just opened PR #20 that adds a toString method to RinchiDecompositionOutput. Would you mind merging that one as well? Thank you.