rvianello / chemicalite

An SQLite extension for chemoinformatics applications.
https://chemicalite.readthedocs.org/en/latest/
BSD 3-Clause "New" or "Revised" License
55 stars 8 forks source link

sqlite3.ProgrammingError: Error binding parameter 2: type 'Mol' is not supported #11

Closed lukasturcani closed 1 year ago

lukasturcani commented 1 year ago

Hi! Thank you for this library!

I'm trying to write an rdkit molecule directly into the database -- here is the minimum reproducible example (Python 3.11)

import sqlite3

import rdkit.Chem as rdkit

connection = sqlite3.connect("mydb.sql")
connection.enable_load_extension(True)
connection.load_extension("chemicalite")
connection.enable_load_extension(False)

mol = rdkit.MolFromSmiles("CCC")
connection.execute(
    "CREATE TABLE IF NOT EXISTS molecules(id INTEGER PRIMARY KEY, molecule MOL)"
)
connection.executemany(
    "INSERT INTO molecules(id, molecule) VALUES (?1, ?2)",
    ((1, mol),),
)

The error is get is

sqlite3.ProgrammingError: Error binding parameter 2: type 'Mol' is not supported

Since my table has the form molecules( ... molecule Mol) -- I would have assumed that you can just add an rdkit Molecule. The documentation (https://chemicalite.readthedocs.io/en/latest/api.html) describles mol as an rdkit molecule type. Is there a distinction between rdkit molecules in the Python vs SQL context? If so, I would say that it is not obvious from the docs (could be my bad for failing to read them thoroughly enough -- apologies if so!)

rvianello commented 1 year ago

Hi @lukasturcani, thank you for your interest in chemicalite :-)

I think it's correct to say that a chemicalite mol corresponds to an RDKit molecule, but the value stored/persisted in the database of course must be a serialized form of the represented runtime instance. The error you experienced therefore occurs because the database driver and api you are using do not implement a mechanism to convert a runtime python Mol object into data that can be processed by the sqlite database (a similar mechanism is part of the database client code and it can't be implemented as part of the sqlite extension).

The simple solution to this problem consists in wrapping the I/O operations related to a mol column in functions that can convert the internal molecule representation into anyone of the supported chemical formats. You could for example insert the molecule for its smiles representation directly:

In [7]: connection.executemany(
   ...:     "INSERT INTO molecules(id, molecule) VALUES (?1, mol_from_smiles(?2))",
   ...:     ((1, "CCC"),),
   ...:     )
Out[7]: <sqlite3.Cursor at 0x7f52fb31d0c0>

[...]

In [9]: for r in connection.execute("SELECT id, mol_to_smiles(molecule) FROM molecules").fetchall(): print(r[0], r[1])
1 CCC

I hope the above helps answering your question.

rvianello commented 1 year ago

I'm closing this ticket as I think the question is answered. Please re-open it (or create a new one) if needed.

lukasturcani commented 1 year ago

That's super helpful! Thank you very much!