[important] optimize rcl-import-results

rfmvh / perfevents-result-collector

GNU General Public License v3.0


[important] optimize rcl-import-results #6

Open frozenstein opened 7 years ago

frozenstein commented 7 years ago

The rcl-import-results tool parses a csv-like file with results and reports them to the DB line by line (one line = one event result: event name + its value). This makes it terribly slow when the result file has hundreds of lines (this happens in real life; we just didn't expect it to be that bad).

It would be great if the tool parsed the file first, saved the matching lines to a list or something, then constructed a single query (or, e.g., one query per hundred records).
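The parse-first, batch-later idea could look roughly like the sketch below. The file layout (`name,value` per line, `#` for comments) and the helper names `parse_results` and `chunked` are assumptions made for illustration, not the tool's actual format or code.

```python
import csv
from itertools import islice


def parse_results(lines):
    """Collect (event_name, value) pairs from csv-like lines.

    Assumed format: exactly two comma-separated fields per result line;
    blank lines, comment lines and malformed lines are skipped.
    """
    rows = []
    for fields in csv.reader(lines):
        if len(fields) != 2 or fields[0].lstrip().startswith("#"):
            continue
        rows.append((fields[0].strip(), fields[1].strip()))
    return rows


def chunked(rows, size=100):
    """Yield consecutive batches of at most `size` rows."""
    it = iter(rows)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


rows = parse_results(["cycles,123", "", "# comment", "instructions,456"])
batches = list(chunked(rows, size=100))
```

Each batch can then be sent to the DB as one query, so a file with a few hundred lines needs only a handful of round trips.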

The time saving is more noticeable when the DB server is not on localhost, so verifying the fix on localhost would not be very conclusive.

mpavlase commented 7 years ago

At first sight, I have an idea where to start optimizing. In SQL you can insert multiple rows at once with INSERT INTO events VALUES (val1), (val2), (val3); Currently, sql_query_insert = 'INSERT INTO events (name) VALUES (%(event_name)s);' adds one row and immediately commits. I would recommend splitting these (optionally) and committing only after the loop that inserts the events. This can help significantly, especially the single final commit.
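A minimal sketch of the multi-row INSERT with a single commit, using the standard-library sqlite3 module only so that it runs standalone. The project's own driver appears to use %(name)s placeholders, whereas sqlite3 uses `?`; the `insert_events` helper and the table layout are hypothetical.

```python
import sqlite3


def insert_events(conn, names):
    """Insert all event names with one multi-row INSERT and one commit."""
    names = list(names)
    if not names:
        return 0
    # One "(?)" group per row: VALUES (?), (?), (?)
    placeholders = ", ".join(["(?)"] * len(names))
    conn.execute("INSERT INTO events (name) VALUES " + placeholders, names)
    conn.commit()  # single commit for the whole batch
    return len(names)


# In-memory database and table layout are assumptions for the example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, name TEXT)")
inserted = insert_events(conn, ["cycles", "instructions", "cache-misses"])
```

Values are still passed as bound parameters, so only the placeholder list is built by string concatenation. Drivers usually cap the number of parameters per statement, which is one more reason to send batches of about a hundred rows rather than one huge query.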

frozenstein commented 7 years ago

Yes, either by committing after multiple inserts (that would require redesigning dbinterface.py a bit, since the commit is encapsulated there), or by grouping multiple values (rows) per insert, which is what I would probably choose.

mpavlase commented 7 years ago

Adding an argument to skip the commit call, and calling commit afterwards, would be an easy change as well.
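One way such an argument might look, as a sketch only: the `DBInterface` class below is a hypothetical stand-in for whatever dbinterface.py really contains, and the `commit` keyword is an assumed name.

```python
import sqlite3


class DBInterface:
    """Hypothetical stand-in for the wrapper in dbinterface.py."""

    def __init__(self, conn):
        self.conn = conn

    def query(self, sql, params=(), commit=True):
        """Run a statement; commit right away unless the caller opts out."""
        cur = self.conn.execute(sql, params)
        if commit:
            self.conn.commit()
        return cur

    def commit(self):
        """Commit whatever has been deferred so far."""
        self.conn.commit()


db = DBInterface(sqlite3.connect(":memory:"))
db.query("CREATE TABLE events (name TEXT)")
for name in ["cycles", "instructions"]:
    # Defer the commit so the loop runs inside one transaction.
    db.query("INSERT INTO events (name) VALUES (?)", (name,), commit=False)
db.commit()  # one commit after the loop
```

Keeping commit=True as the default leaves existing callers unchanged; only the import loop needs to opt out.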

frozenstein commented 7 years ago

Currently, there are two methods, query and select (the latter does not commit). However, I think grouping rows into a single query will make it even faster.

mpavlase commented 6 years ago

@ocasek, try to solve this issue by wrapping all "insert" queries in one database transaction and committing at the end. If you have any questions, feel free to ask.

mnecas commented 6 years ago

9fae725 Insert with multiple values is done, but there is one problem: it doesn't check whether what you put into the DB is already there. Should I add that check?

mpavlase commented 6 years ago

I've added several comments to the commit with proposals for further changes (that's why I've removed the "task done" flag). I don't think this duplicate check is necessary. @frozenstein, what's your opinion?

frozenstein commented 6 years ago

What kind of dups are you talking about? The rows in the results table cannot be dups. Even two completely identical rows in the results table are two valid results. If I measure one particular thing three times and always get 0, there will be three identical rows, all valid.

frozenstein commented 6 years ago

Or do you mean the auxiliary stuff, like kernels, etc.?