rvianello / chemicalite

An SQLite extension for chemoinformatics applications.
https://chemicalite.readthedocs.org/en/latest/
BSD 3-Clause "New" or "Revised" License
55 stars 8 forks source link

Crash #3

Closed rmcgibbo closed 3 years ago

rmcgibbo commented 3 years ago

The following test crashes for me. My environment is Linux x64 (CentOS7), Chemicalite from master, python37, and apsw 3.30.0.

import apsw

SCHEMA = """CREATE TABLE mytable (                                                                                                                                                                                                            
  identifier TEXT PRIMARY KEY,                                                                                                                                                                                                                
  molecule MOL                                                                                                                                                                                                                                
);"""

con = apsw.Connection(":memory:")
con.enableloadextension(True)
con.loadextension(path_to_extension_dot_so)
con.enableloadextension(False)

cur = con.cursor()
cur.execute("PRAGMA page_size=2048")
cur.execute(SCHEMA)
cur.execute("SELECT create_molecule_rdtree('mytable', 'molecule')")

cur.execute("INSERT INTO mytable (identifier, molecule) VALUES(?, mol(''))", ("ident",))
cur.execute("""UPDATE mytable SET molecule = mol(?) WHERE identifier = ?;""", ("CCC1CCCC[NH2+]1.[Cl-]", "ident"))

The stdout stream is

python: chemicalite/src/rdtree.c:638: nodeGetBfp: Assertion `iItem < readInt16(&(pNode)->zData[2])' failed.
Aborted
rmcgibbo commented 3 years ago

I also -- and I realize this is probably expected -- got various "SQL Logic errors" when adding molecules that contained strange chemistry, in particular organometallics. It would be helpful if there were some way to have the error message (not sure if this is possible, but even something on stderr would help) indicate what was wrong, because it wasn't obvious to me at first what was going on.

rvianello commented 3 years ago

Hi @rmcgibbo thanks a lot for reporting this issue.

I will explore a bit of the options for enabling the code to automatically log these errors, and I'm glad the sql errors provided some hints of what was going wrong, but a failing assertion still looks like a bug to me (i.e. a pre or post condition is not met). I will try and reproduce this behavior.

rvianello commented 3 years ago

Hi @rmcgibbo. The crash was due to the incorrect ordering of some operations performed on the fingerprints tree when a record is deleted or updated. Hopefully, that one should be fixed now.

Also, I took the occasion to modify the behavior of most functions, in order to return a SQL NULL when conversion to molecule fails, and to return NULL on NULL input. This should reduce the occurrence of blocking errors and improve the overall usability of the implemented API.

sqlite> .load chemicalite
sqlite> SELECT chemicalite_version(), rdkit_version();
2020.12.2|2020.09.3
sqlite> SELECT mol_mw('X');

sqlite> 

At the same time, anomalies occurring at runtime should be now logged using SQLite's Error and Warning Log:

sqlite> .log stdout
sqlite> SELECT mol_mw('X');
(28) Could not convert 'X' to mol

sqlite>

Alternatively (mainly because the API for configuring a log handler is not available from the sqlite3 python library package, and maybe in case one wanted to only get the messages logged by this extension), logging to stdout or stderr may be now enabled directly:

sqlite> .log off
sqlite> SELECT * from chemicalite_settings;
logging|disabled
sqlite> UPDATE chemicalite_settings SET value='stderr' WHERE key='logging';
sqlite> SELECT mol_mw('X');
Could not convert 'X' to mol

sqlite> 

I hope the above helps, and in case they are useful, some experimental conda packages for linux and osx are currently available here (They are based on conda-forge's rdkit, and might not be binary compatible with other distribution channels - osx is only barely tested at build time. Both the sqlite3 and apsw from conda-forge apparently are enabled to load extension, so you might not need to build everything from source code).

rmcgibbo commented 3 years ago

Brilliant.