Use SQLAlchemy to work with the database

I developed a prototype using SQLAlchemy to generate the new fragment database instead of using hand-built SQL queries

My test set was 100 SMILES. With low-level SQL queries and 4 processors it takes ~28.6 seconds. Going through SQLAlchemy's ORM took ~37.5 seconds -- 30% slower.

Profiling shows that most of the additional time is in session.commit(). However, even if I comment out the session.add() code (which tells SQLAlchemy to add the new objects to the database when appropriate), the overall time was still slower than the hand-written code.

Further profiling shows about 15% of the additional time was in _initialize_instance. This is part of SQLAlchemy's ORM. I replaced mmpdb's existing fragment types with slightly modified versions that inherit from registry.generate_base() base class. This has its own __init__, with a higher overhead than the basic Python-class-with-slot-definition I used.

There were some improvements to the API. I got to strip out some boilerplate code to return a FragmentRecord from the cache file. On the other hand, I had to lean how to "detach" objects to move them from one database to the other, and I had to learn the special query syntax.

When we get to things like merging multiple datasets into one, I think it would be easier to work directly to the SQLite connection object, attach databases, in the instance, and do cross-data INSERT with SELECT. I don't want to have to figure it out in SQLAlchemy.

Long-term, I think there's still a place for SQLAlchemy, as with kzfm's example.

The most likely is to define the database schema(s) in a better way than my shaky template system, and use it to manage schema creation, and to replace the vendored peewee code we use now, and use the more low-level SQLAlchemy calls to write the tables, rather than SQLAlchemy's ORM.

rdkit / mmpdb

Use SQLAlchemy to work with the database #38