rdkit / mmpdb

A package to identify matched molecular pairs and use them to predict property changes.

Use generator for stats_info instead of list #29

Closed i-tub closed 3 years ago

i-tub commented 3 years ago

The stats_info list used too much memory: loading a table with 1000 structures and 7 properties took about 8 GB, and the memory use scales quadratically.

We now use a stats_info generator, so the stats are inserted into the database as soon as they are generated, using a limited amount of memory.
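As a rough illustration of the pattern (not the actual mmpdb code), here is a minimal sketch of streaming generated rows straight into SQLite. The pair_stats table, the iter_stats_rows and insert_stats helpers, and the pairwise toy data are all hypothetical. It relies only on the standard-library sqlite3 behavior that executemany accepts any iterable of parameter tuples, so rows are consumed as they are yielded and the full list never has to exist in memory.

```python
import sqlite3
from itertools import combinations


def iter_stats_rows(values_by_id):
    """Yield one (id1, id2, delta) row per pair of records.

    Hypothetical stand-in for a stats generator: rows are produced
    lazily, so only one row is held in memory at a time.
    """
    for (id1, v1), (id2, v2) in combinations(sorted(values_by_id.items()), 2):
        yield (id1, id2, v2 - v1)


def insert_stats(conn, rows):
    """Insert rows from any iterable and return how many were inserted.

    The count is only known after the iterable is exhausted, which is
    why a log message cannot report it up front.
    """
    count = 0

    def counted():
        nonlocal count
        for row in rows:
            count += 1
            yield row

    # executemany pulls rows from the generator as it inserts them.
    conn.executemany(
        "INSERT INTO pair_stats (id1, id2, delta) VALUES (?, ?, ?)",
        counted(),
    )
    conn.commit()
    return count


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pair_stats (id1 INTEGER, id2 INTEGER, delta REAL)")

values = {1: 0.5, 2: 1.5, 3: 3.0}  # toy data, not from the PR
num_inserted = insert_stats(conn, iter_stats_rows(values))
print(f"Inserted {num_inserted} rows")
```

Passing list(iter_stats_rows(values)) instead would produce the same table contents, but would hold every row in memory before the first insert.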

An incompatible but arguably cosmetic side effect is that the log messages are slightly less informative, because we no longer know how many rows of stats there will be before we start inserting them.

d-b-w commented 3 years ago

This change makes sense to me, but is there a test that runs this code?

i-tub commented 3 years ago

@d-b-w, I just added a follow-up commit with a test that exercises the function being modified.