venupec opened this issue 6 years ago (status: Open)
Thanks very much for reporting this. Did you manage to get any further with it? If you still think it's an IO speed problem that can be reasonably solved in hardware then I'm happy to close the ticket. But if you think there's a real problem here then we should investigate further.
I'm aware that merging the DBs can be quite slow and can use a fair bit of memory. Fixing that would probably require a fundamental change to the way the coverage DB is structured. Perhaps by using a real DB. But obviously that's a fair bit of work.
A queryable SQLite backend? Yes please! That would significantly lower the effort needed to mine the data, essentially opening it up.
One piece of advice (at the risk of premature optimization): start without indexes, and create them after the coverage is gathered but before you generate the report. Inserting without indexes is much faster than inserting with them, especially once the data set grows enough for the associated B-trees to grow deep.
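To make the pattern concrete, here is a minimal DBI/SQLite sketch of insert-first, index-later. The single statement table and the cover.db filename are made up for illustration; a real coverage backend would obviously need a richer schema:

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use DBI;

# Hypothetical schema -- a real coverage DB would need more tables.
my $dbh = DBI->connect("dbi:SQLite:dbname=cover.db", "", "",
                       { RaiseError => 1, AutoCommit => 0 });

$dbh->do(q{
    CREATE TABLE IF NOT EXISTS statement (
        file TEXT    NOT NULL,
        line INTEGER NOT NULL,
        hits INTEGER NOT NULL
    )
});

# Sample rows standing in for data gathered during the coverage run.
my @rows = (
    [ 'lib/Foo.pm', 10, 3  ],
    [ 'lib/Foo.pm', 11, 0  ],
    [ 'lib/Bar.pm',  7, 12 ],
);

# Bulk-insert with no indexes in place, inside one transaction.
my $sth = $dbh->prepare(
    'INSERT INTO statement (file, line, hits) VALUES (?, ?, ?)');
$sth->execute(@$_) for @rows;
$dbh->commit;

# Build the index once, after loading but before reporting.
$dbh->do('CREATE INDEX idx_statement_file ON statement (file, line)');
$dbh->commit;
$dbh->disconnect;
```

Deferring the index means each insert skips the B-tree maintenance, and the index gets built once over the full data set instead.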
But looking at the code, this would be a very significant rewrite indeed, especially if we are to realize the full potential of such a change beyond the initial rationale (faster reports and merges).
I had some luck speeding up a Devel::Cover run for a large codebase by specifying JSON as the output format instead of Sereal. A parser for the cover output can read JSON much faster than the Sereal database, which matters for repositories with a large number of Perl files to cover. It's also much easier to write your own parser -- I wrote one in Golang and one in Rust for a code analysis tool at work.
I could probably rewrite and improve the Golang script pretty easily. I'll look into doing that for faster parsing of JSON-format runs.
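For anyone who wants to poke at the JSON runs from Perl in the meantime, here is a minimal sketch that decodes a single run file and dumps its top-level layout. The idea that each run is stored as one JSON document you can decode in a single pass is an assumption to verify against your own cover_db:

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use JSON::PP qw(decode_json);

# Assumption: with the JSON DB format, each run under cover_db/
# is a single JSON document that can be decoded in one pass.
my $file = shift @ARGV or die "usage: $0 <run-file.json>\n";

open my $fh, '<:raw', $file or die "open $file: $!\n";
my $json = do { local $/; <$fh> };
close $fh;

my $run = decode_json($json);
die "expected a JSON object at the top level\n" unless ref $run eq 'HASH';

# Dump the top-level keys and their types to see what a run contains.
for my $key (sort keys %$run) {
    my $type = ref $run->{$key} || 'scalar';
    print "$key => $type\n";
}
```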
We have three options at the moment for storing the coverage DB: Sereal, JSON and Data::Dumper. I had assumed Sereal to be the fastest and most efficient format, and so used it by default when available. Are we saying that it's faster for Devel::Cover to use JSON (which module?) than Sereal?
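One way to answer that empirically is to round-trip a structure of roughly the right shape through both serializers and compare decode times. A rough Benchmark sketch, with a made-up data shape standing in for real coverage data:

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Benchmark qw(cmpthese);
use JSON::PP ();
use Sereal::Encoder ();
use Sereal::Decoder ();

# A made-up structure vaguely shaped like per-file coverage data.
my %data = map {
    ("lib/Module$_.pm" => { statement => [ map { int rand 100 } 1 .. 2000 ] })
} 1 .. 50;

my $json_blob   = JSON::PP->new->encode(\%data);
my $sereal_blob = Sereal::Encoder->new->encode(\%data);
my $decoder     = Sereal::Decoder->new;

# Run each decoder for at least 5 CPU seconds and compare rates.
cmpthese(-5, {
    json_pp => sub { JSON::PP->new->decode($json_blob) },
    sereal  => sub { $decoder->decode($sereal_blob) },
});
```

The "which module?" question matters here: JSON::PP is pure Perl, while JSON::XS or Cpanel::JSON::XS are compiled and much faster, so swapping the json_pp entry above for one of those would likely change the result considerably.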
UPDATE:
It is probably an I/O issue reading all the digest files. I plan to acquire more powerful machines and test.
The cover report spends most of its time in Devel::Cover::DB::cover(); the @runs loop accounts for almost 80-90% of it. The cover text report files are 28MB for each test suite, though some are smaller, around 6MB.
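To confirm where the time goes, I plan to time the DB construction and the cover() call separately with a small Time::HiRes harness along these lines (the cover_db path is just a placeholder for our actual DB):

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Time::HiRes qw(gettimeofday tv_interval);
use Devel::Cover::DB;

my $db_dir = shift @ARGV || 'cover_db';   # placeholder path

my $t0 = [gettimeofday];
my $db = Devel::Cover::DB->new(db => $db_dir);
printf "constructing the DB object: %.2fs\n", tv_interval($t0);

# cover() is where the report appears to spend 80-90% of its time,
# so timing it in isolation should confirm the @runs loop is the cost.
$t0 = [gettimeofday];
my $cover = $db->cover;
printf "building cover data:       %.2fs\n", tv_interval($t0);
```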
Hello,
We run Devel::Cover on some long-running harness test suites that take a couple of days to complete. We're seeing a performance issue with the cover <db> -report text command. Issues I'm seeing:
Our harness test suite is set up to enable coverage via the DEVEL_COVER_OPTIONS env var. What I tried to improve performance:
I did a quick benchmark on print_statement() and print_subroutine() in Devel::Cover::Report::Text::report(). The results were not that bad, at least from what I've seen so far: it took about 3 minutes in total to generate a report for each of the 13 test suites. I also tried generating a JSON report, but that report doesn't include 'covered' or 'uncovered' module information, so it's of no use to us. I customized the code to include the covered-modules list as well, but performance still hasn't improved.
Has anyone seen this kind of issue before? I'm at a loss for what other optimizations I could apply to the cover command.
I really appreciate your help/insight into this issue. I'm happy to supply additional data supporting the stats above.
Thank you!