openbabel / openbabel

Open Babel is a chemical toolbox designed to speak the many languages of chemical data.
http://openbabel.org/
GNU General Public License v2.0
1.05k stars 409 forks source link

System crash on converting '.cdx' files using Python interface #1690

Open theman0605 opened 6 years ago

theman0605 commented 6 years ago

Environment Information

Open Babel version: 2.4.1 Operating system and version: Windows 7

Expected Behavior

Expected to be able to use PyBel from Jupyter notebook to convert ChemDraw .cdx file to .sdf or .smi file.

Actual Behavior

System crashes and kernel stops on execution of command. See below for script.

Steps to Reproduce

import pybel as pb allmols = [mol for mol in pb.readfile("cdx", "genenicStructures.cdx")

Cannot attach file due to unsupported type; see https://github.com/theman0605/cdxFileConversion for a couple of sample files for conversion.

theman0605 commented 6 years ago

Error Messages: commandlineerrorimage errorimage

Conversion of file using the OpenBabel GUI: genericstructuresscreenshot

baoilleach commented 6 years ago

Confirmed. Reading CDX files doesn't work from the Python interface.

ghutchis commented 6 years ago

Yes, there seems to be a weird difference in reading binary CDX files inside another process (e.g., Python, but also Avogadro) than from the command-line. Locales maybe? Not sure.

CamAnNguyen commented 6 years ago

It happens to me on Ruby interface as well.

serval2412 commented 6 years ago

On pc Debian x86-64 with master sources updated today, I could reproduce the pb. Here's the bt : bt.txt

serval2412 commented 6 years ago

Here's the content of the buffer from in src/formats/chemdrawcdx.cpp: #6 0x00007fffe2a63a41 in OpenBabel::CDXReader::CDXReader (this=0x7fffffffdf50, is=...) at /home/julien/projects/openbabel/src/formats/chemdrawcdx.cpp:727 727 throw; (gdb) p buffer $1 = "\000h!\311\342\377\177\000"

serval2412 commented 6 years ago

Interestingly the function is called twice and first time everything is ok. See attached gdb session with 2 bts and 2 different values of buffer bt.txt

serval2412 commented 6 years ago

I noticed that by adding a tellg() on the input stream at the beginning of CDXReader ctr, first time I get 0, the second time I get -1 These values aren't changed even by adding a seekg(0) before calling tellg() I wonder if the istream is still valid the second time.

serval2412 commented 6 years ago

Keeping on the investigation, I displayed good() on the stream and as expected the second one returns 0 (so not good). Then I displayed: eof => 0 in both cases so ok bad => 0 in both cases so ok fail => 1 in the second time This patch prevents the pb but it seems we enter an endless loop (I stopped after 6000 iterations): `diff --git a/src/formats/chemdrawcdx.cpp b/src/formats/chemdrawcdx.cpp index f6538cc1..e7eb591a 100644 --- a/src/formats/chemdrawcdx.cpp +++ b/src/formats/chemdrawcdx.cpp @@ -715,6 +715,8 @@ stringstream& CDXReader::data() CDXReader::CDXReader(std::istream& is) : ifs(is), depth(0) { //ReadHeader

schluta commented 3 years ago

can confirm this is still an issue on MacOS. I can process cdx files from the command line but not from pybel.

james-jack commented 2 years ago

Also seems to happen with OBDotNet. cdx => mol gives an empty molfile.