kellnerd / harmony

Music Metadata Aggregator and MusicBrainz Importer
MIT License
53 stars 7 forks source link

Optimize SQLite DB to improve cache performance #52

Closed kellnerd closed 4 months ago

kellnerd commented 4 months ago

The amount of cached data is growing and the underlying SQLite database of my SnapStorage library is likely becoming a bottleneck. First observed 3 weeks ago:

What is concerning me more are the processing times, each provider takes about 1000 ms while these durations should go down to about 10-20 ms for cached results… This generally seems to happen for all permalinks right now, but most are “only” taking 300-400 ms to process, I will investigate. Edit: Restarting the app did not help unfortunately. Using my local server brings the processing times down to about 20 ms once the API results are cached. The main difference is that my local server’s snapshot directory only contains 25M of data (400k sqlite DB) while the pulsewidth server uses 150M already (3M sqlite DB), but the performance can’t scale that badly!?

Latest numbers: 476M of data (compressed), 9.7M / 18.3M SQLite DB (compressed / uncompressed), 79k rows in uri, 82k rows in snap

See https://github.com/kellnerd/snap_storage/issues/2 for details.

kellnerd commented 4 months ago

I have deployed the SnapStorage update (https://github.com/kellnerd/harmony/commit/2db91de5b61fc64ca3205da2d4d62b1869f2775a) in production. It was a full success, cache response times are about 100(!) times faster now in some cases :rocket:

Before: Before

After: After