ticalc-travis closed this 10 years ago.
I'd be willing to help with this!
Ideally I'd want something that works just like the shelve classes, which in turn are designed to mimic a normal Python dict (probably just subclass Shelf and make it use whatever DB backend we pick), so I don't have to rewrite any high-level code. Then, assuming updates are quick enough, I could change my update scripts to use the same classes for updates too (and remove or change the code that directly manipulates DBM files on the filesystem, since I assume things would work differently here).
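A minimal sketch of the Shelf-subclass idea, assuming a single SQL key/value table as the store. It uses the standard library's sqlite3 as a stand-in so it runs anywhere. A PostgreSQL version would follow the same pattern through a driver such as psycopg2, with `%s` placeholders instead of `?` and `INSERT ... ON CONFLICT` instead of SQLite's `INSERT OR REPLACE`. The names `SQLiteDict`, `SQLShelf` and the `kv` table are hypothetical. `shelve.Shelf` already encodes keys to bytes and pickles values, so the backend only has to be a bytes-to-bytes mapping.

```python
import shelve
import sqlite3
from collections.abc import MutableMapping


class SQLiteDict(MutableMapping):
    """Hypothetical bytes-to-bytes mapping backed by one SQLite table.

    shelve.Shelf hands this object keys already encoded to bytes and
    values already pickled to bytes, so it never sees Python objects.
    """

    def __init__(self, path):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS kv (k BLOB PRIMARY KEY, v BLOB NOT NULL)"
        )

    def __getitem__(self, key):
        row = self.conn.execute("SELECT v FROM kv WHERE k = ?", (key,)).fetchone()
        if row is None:
            raise KeyError(key)
        return row[0]

    def __setitem__(self, key, value):
        self.conn.execute(
            "INSERT OR REPLACE INTO kv (k, v) VALUES (?, ?)", (key, value)
        )

    def __delitem__(self, key):
        cur = self.conn.execute("DELETE FROM kv WHERE k = ?", (key,))
        if cur.rowcount == 0:
            raise KeyError(key)

    def __iter__(self):
        # Materialize keys so callers can modify the table while iterating.
        keys = [k for (k,) in self.conn.execute("SELECT k FROM kv")]
        return iter(keys)

    def __len__(self):
        return self.conn.execute("SELECT COUNT(*) FROM kv").fetchone()[0]

    def sync(self):
        # Shelf.sync() (and therefore Shelf.close()) calls this if it exists.
        self.conn.commit()

    def close(self):
        self.conn.commit()
        self.conn.close()


class SQLShelf(shelve.Shelf):
    """A Shelf that stores pickled values in SQLite instead of a DBM file."""

    def __init__(self, path, protocol=None, writeback=False):
        super().__init__(SQLiteDict(path), protocol, writeback)


if __name__ == "__main__":
    with SQLShelf(":memory:") as db:
        db["hello"] = {"world": 3}
        print(db["hello"])
```

Since `Shelf` already supports `with`, `in`, `len()` and iteration, high-level code that currently calls `shelve.open(path)` could in principle switch by constructing `SQLShelf(path)` instead.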
Some initial performance tests generating bot chats from a Postgres DB look rather promising. The downside is that populating the DB is still excruciatingly slow, despite my tweaking. I guess it's time I actually wrote something to update the data incrementally from new IRC logs, rather than doing it the dumb way and regenerating it all from scratch every time. :-P
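One possible shape for the incremental update, sketched under assumptions: the Markov DB layout here (an order-2 chain stored as `"word1 word2"` mapped to a `Counter` of following words) and the helper names `read_new_lines` and `update_chain` are hypothetical, not the project's actual schema. The idea is to remember how many bytes of each log have already been processed and feed only the newly appended, complete lines into the chain.

```python
import os
from collections import Counter


def read_new_lines(log_path, offsets):
    """Return complete lines appended to log_path since the last run.

    `offsets` maps a log path to the byte offset already processed and is
    updated in place (a shelf works here as well as a plain dict).
    """
    start = offsets.get(log_path, 0)
    if os.path.getsize(log_path) < start:
        start = 0  # log shrank (rotated or truncated): start over from the top
    lines = []
    with open(log_path, "rb") as f:
        f.seek(start)
        for raw in f:
            if not raw.endswith(b"\n"):
                break  # last line is still being written; leave it for next run
            lines.append(raw.decode("utf-8", errors="replace").rstrip("\r\n"))
            start += len(raw)
    offsets[log_path] = start
    return lines


def update_chain(chain, lines, order=2):
    """Add word transitions from `lines` into `chain` in place.

    Keys are the previous `order` words joined by a space (shelf keys must
    be str); values are Counters of the word that followed them.
    """
    for line in lines:
        words = line.split()
        for i in range(len(words) - order):
            key = " ".join(words[i:i + order])
            counts = chain.get(key, Counter())
            counts[words[i + order]] += 1
            # Reassign rather than mutate: a shelf opened without writeback
            # only persists a value when it is assigned.
            chain[key] = counts
```

Both `offsets` and `chain` can be shelves (or plain dicts for testing), so a run after new logs arrive only touches the lines added since the previous run.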
Done, barring any bugs that may crop up sooner or later.
The DBM backend the shelve module uses by default is rather slow. That's normally not a problem, but generating bot chats in particular can take ages, and updating the Markov DB is too slow (for writing, I'm currently loading it all into RAM as a Python dict instead, which swaps everything out on my box even with 12 GB). Maybe a full-fledged DB backend such as PostgreSQL would perform better.
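Which DBM shelve actually uses depends on the Python build: `dbm.open` picks the first backend that is available (e.g. `dbm.gnu` or `dbm.ndbm`, and `dbm.sqlite3` on Python 3.13+) and only falls back to the pure-Python `dbm.dumb`, which the docs say is not written for speed. A quick, standard-library-only way to check which one a given setup gets (`shelf_backend` is a hypothetical helper name):

```python
import dbm
import os
import shelve
import tempfile


def shelf_backend(directory):
    """Create a throwaway shelf in `directory` and report which dbm module backs it."""
    path = os.path.join(directory, "probe")
    with shelve.open(path) as db:
        db["probe"] = 1
    # whichdb() inspects the file(s) on disk and returns e.g. "dbm.gnu".
    return dbm.whichdb(path)


if __name__ == "__main__":
    print(shelf_backend(tempfile.mkdtemp()))
```

If it reports `dbm.dumb`, that alone could account for a good part of the slowness.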