openbabel / openbabel

Open Babel is a chemical toolbox designed to speak the many languages of chemical data.
http://openbabel.org/
GNU General Public License v2.0
1.08k stars 414 forks source link

OBConversion.ReadFile returns True at end of CDX #707

Closed openbabel-bot closed 7 years ago

openbabel-bot commented 16 years ago

Openbabel 2.1.1 compiled on Mac OS X 10.5.1/intel

== CONTEXT pybel implements a ReadFile generator in order to read every molecule from a single file. It relies on OBConversion.Readfile which is called repeatedly until it returns 'false' when no more molecule could be read from the file.

==PROBLEM OBConversion.ReadFile in 2.1.1 does not appear to increment an internal counter when it reads a molecule from pybel or from any other Python script. It always reads the first molecule from the file and returns 'true'. As a consequence, pybel.ReadFile loops forever.

Excerpt from pybel.py

obmol = ob.OBMol() notatend = obconversion.ReadFile(obmol,filename) while notatend: #<<<<<<<< ALWAYS TRUE yield Molecule(obmol) obmol = ob.OBMol() notatend = obconversion.Read(obmol)

==EXPECTED RESULT OBConversion.ReadFile should sequentially read every molecule from a file and return 'false' when no more molecule is available.

Reported by: steffx

openbabel-bot commented 16 years ago

Logged In: YES user_id=850620 Originator: NO

Excellent bug report. Just to correct something, only the first call is to ReadFile; subsequent calls are to Read.

As you point out, the problem is in the C++ code, which has been very difficult to get correct cross platform (i.e. it works on Windows (although this does not exactly correspond to 2.1.1), and worked (last time I checked) on Linux). If you need a workaround in the short time, let me know; there's one or two things you could try...

Noel

Original comment by: @baoilleach

openbabel-bot commented 16 years ago

Logged In: YES user_id=1965159 Originator: YES

Thanks for this info. I could manage to read all the molecules from a single file using ReadFile for the first call then Read, as you wrote.

There is still a little problem: Read doesn't return 'false' after it has reached the end of the file. As a consequence, I have to check against the number of atoms in the returned molecule and break the loop whenever it falls to zero (see below). However I'm not sure this is perfectly correct.

Quick'n'dirty

conv = openbabel.OBConversion() conv.SetInAndOutFormats("cdx", "smiles") mol = openbabel.OBMol() if conv.ReadFile(mol, "test2.cdx"): mol2 = openbabel.OBMol() while conv.Read(mol2): if mol2.NumAtoms() == 0: break mol2 = openbabel.OBMol()

Cheers Stéphane

Original comment by: steffx

openbabel-bot commented 16 years ago

Logged In: YES user_id=850620 Originator: NO

The only remaining difference with the Pybel code is your use of SetInAndOutFormats rather than just SetInFormat. However, this is unlikely to be the problem.

Could you confirm whether the problem is specific to the cdx format; that is, does it work for a multimolecule SDF file (see head.sdf in the distribution in the scripts/python/example folder)?

It might be useful if you could attach a specific multimolecule CDX file to this bug report or point me to one on the web, so that I can use it as a test case.

Noel

Original comment by: @baoilleach

openbabel-bot commented 16 years ago

Logged In: YES user_id=1965159 Originator: YES

Yes it is specific to CDX. pybel.readfile works fine with a multimolecule SDF file. File Added: pybeltest_cdx.py

Original comment by: steffx

openbabel-bot commented 16 years ago

pybeltest_cdx.py

Original comment by: steffx

openbabel-bot commented 16 years ago

Archive.zip

Original comment by: steffx

openbabel-bot commented 16 years ago

Logged In: YES user_id=1965159 Originator: YES

oops... sorry, I thought I could attach several files.

Archive.zip contains the two python scripts (one for cdx, the other for sdf formats) and the two files I used (test.cdx and test.sdf).

pybeltest_sdf.py works as expected whereas pybeltest_cdx.py blocks. File Added: Archive.zip

Original comment by: steffx

openbabel-bot commented 16 years ago

Logged In: YES user_id=21420 Originator: NO

This should now be fixed in the SVN code -- the chemdrawcdx code now returns false if it's called and we're at the end of the file.

-Geoff

Original comment by: @ghutchis

openbabel-bot commented 16 years ago

Logged In: YES user_id=1312539 Originator: NO

This Tracker item was closed automatically by the system. It was previously set to a Pending status, and the original submitter did not respond within 14 days (the time period specified by the administrator of this Tracker).

Original comment by: sf-robot