Closed graue closed 9 years ago
So far it seems that DB commits are very expensive. (By removing the db.session.commit
in aggregate_metadata
, I got initial creates down to 2.07 sec average, though this causes incorrect behavior — see #30. Updates were not significantly affected.) In general I suspect looking up the album for every track is holding us back, and maintaining a Python dict of known albums may help.
I pushed a branch called createdb-baseline
which is the old pre-overhaul commit I referred to above, with a status line and timer added.
I realize my overhaul is not ideal. Don't worry about calling it bad when you have the numbers to prove it is bad :)
My thought is we need to use sqlalchemy better than we do now. There shouldn't be a commit within aggregate_metadata
, but without it we have to figure out some way of creating albums within the session. (When I was trying this initially it wouldn't find the albums unless I committed each one)
Perhaps if we pass the session? Like, db.session.add(album)
and return db.session from aggregate_metadata
.
I'm not sure.. Maybe aggregate_metadata
was ill conceived to begin with.
Eh. I just run updates overnight now. Not the end of the world. Will create a new issue if I have concrete ideas I feel like trying out.
tl;dr: they're too slow, make 'em fast. (Original rant follows:)
I just added a commit which prints time taken for create/update using this method:
I tested the time to process 664 tracks on the local SSD on a late-2012 MacBook Air 13". I compared the post-DB overhaul commit 30dff9d, "Print time taken to process a DB update", run with
DEBUG=True ./manage.py update
, and the pre-DB overhaul commit e5dad7a "Fix broken DB_URI option", run withDEBUG=True ./manage.py createdb
. (To the old commit, I added a timer that works the same as above.) All figures are the average of 5 trials.So modest speedup (27%) in the case where nothing changed, but 500% worse for initial create! This is IMHO obviously not a good tradeoff. When nothing's changed we should get a much better speedup than 27%.
I'd like updates with no changes to be at least 50% faster than pre-overhaul creates, and initial creates to be at most 25% slower. Future changes should be guided by speed testing and profiling.