frozenstein opened this issue 7 years ago
At first sight I have an idea where to start optimizing it. In SQL you can insert multiple rows at once with INSERT INTO events VALUES (val1), (val2), (val3);
Currently, sql_query_insert = 'INSERT INTO events (name) VALUES (%(event_name)s);'
inserts one row and immediately commits to the database. I would recommend (optionally) splitting this and committing only after the loop that inserts the events.
This can help significantly, especially deferring the commit to the end.
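Roughly, the difference could look like this (a hedged sketch assuming a DB-API driver such as psycopg2; the connection string and event_names list are illustrative, not the tool's actual code):

```python
import psycopg2

# Illustrative data; in the real tool these names come from the parsed file.
event_names = ["event_a", "event_b", "event_c"]

conn = psycopg2.connect("dbname=results")  # assumed connection string
cur = conn.cursor()

# Current pattern: one INSERT and one commit per row.
for name in event_names:
    cur.execute("INSERT INTO events (name) VALUES (%(event_name)s);",
                {"event_name": name})
    conn.commit()   # a commit (and its fsync) for every single row

# Proposed pattern: same per-row INSERT, but commit once after the loop.
for name in event_names:
    cur.execute("INSERT INTO events (name) VALUES (%(event_name)s);",
                {"event_name": name})
conn.commit()       # single commit for the whole batch
```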
Yes, either by committing after multiple inserts (that would require redesigning dbinterface.py a bit, since the commit is encapsulated there), or by grouping multiple values (rows) per insert, which is what I would probably choose. Adding an argument to skip the commit call and calling it afterwards would be an easy change as well. Currently, there are methods query and select (which does not commit). However, I think grouping rows into a single query will speed it up even more.
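As a rough illustration of both ideas, assuming dbinterface.py wraps a DB-API connection (the class name, the commit flag on query, and the insert_events helper below are hypothetical, not the existing interface):

```python
import psycopg2
from psycopg2.extras import execute_values

class DBInterface:
    """Hypothetical slice of dbinterface.py, for illustration only."""

    def __init__(self, dsn):
        self.conn = psycopg2.connect(dsn)

    def query(self, sql, params=None, commit=True):
        # Optional commit flag: callers batching many inserts can pass
        # commit=False and commit once themselves afterwards.
        cur = self.conn.cursor()
        cur.execute(sql, params)
        if commit:
            self.conn.commit()

    def insert_events(self, names):
        # Grouping rows into a single INSERT ... VALUES (...), (...), ...
        # execute_values expands the %s placeholder into all value tuples.
        cur = self.conn.cursor()
        execute_values(cur,
                       "INSERT INTO events (name) VALUES %s",
                       [(name,) for name in names])
        self.conn.commit()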
@ocasek, try to solve this issue by encapsulating all "insert" queries into one database transaction and committing at the end. If you have any questions, feel free to ask.
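If the goal is to keep the commit encapsulated in dbinterface.py while still issuing it only once, one possible shape is a transaction context manager; the following is only a sketch under that assumption, not the project's actual API:

```python
import psycopg2
from contextlib import contextmanager

class DBInterface:
    """Hypothetical slice of dbinterface.py, for illustration only."""

    def __init__(self, dsn):
        self.conn = psycopg2.connect(dsn)

    def query(self, sql, params=None, commit=True):
        cur = self.conn.cursor()
        cur.execute(sql, params)
        if commit:
            self.conn.commit()

    @contextmanager
    def transaction(self):
        # All queries issued inside the block share one transaction;
        # the commit happens exactly once, at the end.
        try:
            yield self
            self.conn.commit()
        except Exception:
            self.conn.rollback()   # nothing is half-written on failure
            raise

# Usage: every insert runs in the same transaction, committed at the end.
# db = DBInterface("dbname=results")
# with db.transaction():
#     for name in event_names:
#         db.query("INSERT INTO events (name) VALUES (%(event_name)s);",
#                  {"event_name": name}, commit=False)
```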
9fae725 Insert with multiple values is done, but there is one problem... it doesn't check whether what you put into the DB is already there. Should I add that check?
I've put several comments on the commit with proposals for further changes (and that's why I've removed the "task done" flag). I don't think this duplication check is necessary. @frozenstein, what's your opinion?
What kind of dups are you talking about? The rows in the results table cannot be dups. Even two completely identical rows in the results table are two valid results. If I measure one particular thing three times and always get 0, there will be three identical rows, all valid. Or do you mean the auxiliary stuff, like kernels, etc.?
The rcl-import-results tool parses a csv-like file with results and reports them to the DB line by line (one line = one event result: event name + its value). This makes it terribly slow when the result file has hundreds of lines (which happens in real life; we just didn't expect it to be this bad). It would be great if the tool parsed the file first, saved the matching lines to a list or something, and then constructed a single query (or e.g. one query per hundred records).
The time saving is more noticeable when the DB server is not localhost, so verifying the fix on localhost would be less conclusive.