Closed (stevelee12 closed this 4 days ago)
@isc-tleavitt are you or a colleague able to shed any light on this please? Thanks 👍
@stevelee12 sorry for the delay. What IRIS version are you running on? Can you paste in the query plan you're getting on your system for the slow query?
One possible thought here - in our CI processes we do this before each build to clear out previous data (note, this will delete EVERYTHING from previous TestCoverage runs):
do ##class(TestCoverage.Utils).Clear()
Running that could help with performance if past runs' data are a factor.
As a comparison point, I'm seeing this performance on one of our larger applications with a low-resourced build machine running IRIS for UNIX (Red Hat Enterprise Linux 8 for x86-64) 2022.1.2 (Build 574U) Fri Jan 13 2023 14:58:02 EST:
Collecting coverage data for all tests: 13.699757 seconds
Mapping to class/routine coverage: 4.725704 seconds
Aggregating coverage data: .119715 seconds
Code coverage: 23.78%
Codebase size is fairly comparable (ours is not so much smaller that it would explain a 250x slowdown, and we have much higher coverage too):
select count(*) from TestCoverage_Data.Coverage
union all
select count(*) from TestCoverage_Data.CodeUnitMap
union all
select count(*) from TestCoverage_Data.Coverage_RtnLine
Gives:
1736
246907
380051
The operative query:
SELECT count(*)
FROM TestCoverage_Data.Coverage source
JOIN TestCoverage_Data.CodeUnitMap map ON source.Hash = map.FromHash
JOIN TestCoverage_Data.Coverage_RtnLine metric ON metric.Coverage = source.ID AND metric.element_key = map.FromLine
JOIN TestCoverage_Data.Coverage target ON target.Run = source.Run AND target.Hash = map.ToHash AND target.TestPath = source.TestPath
LEFT JOIN TestCoverage_Data.Coverage_RtnLine oldMetric ON oldMetric.ID = target.ID AND oldMetric.element_key = map.ToLine
WHERE source.Run = ? AND source.Ignore = 0 AND source.Calculated = 0
GROUP BY target.ID,map.ToLine
Returns in under a second with query plan:
• Read index map TestCoverage_Data.Coverage.MeaningfulCoverageData, using the given Run, Calculated, and Ignore, and looping on Hash and %SQLUPPER(TestPath), and getting ID.
• For each row:
    - Read master map TestCoverage_Data.Coverage_RtnLine.IDKEY, using the given Coverage, and looping on element_key.
    - For each row:
        · Read index map TestCoverage_Data.CodeUnitMap.HashForward, using the given FromHash and FromLine, and looping on ToHash and ToLine.
        · For each row:
            - Read index map TestCoverage_Data.Coverage.UniqueCoverageData, using the given Run, Hash, and %SQLUPPER(TestPath), and getting ID.
            - Check distinct values for ToLine and ID using temp-file A, subscripted by values.
            - For each distinct row:
                · Add a row to temp-file A, subscripted by the hash, with node data of ID and ToLine.
            - Update the accumulated count(rows) in temp-file A, subscripted by the hash
IRIS for UNIX (Ubuntu Server LTS for x86-64 Containers) 2022.1.5 (Build 940U) Thu Apr 18 2024 14:30:11 EDT. The container spins up fresh, installs the test coverage package, and executes tests with coverage, so I wouldn't think there's anything to clear, but I'll try it.
Not sure if it's relevant, but the code I'm analysing for coverage is 100% .mac routines rather than .cls classes.
@stevelee12 can you snag the query plan and see if it's the same?
Before executing unit tests:
running the tests now...
I forgot to add: I tried running the SQL in the terminal. The query executes, but when I call RS.Next() to fetch the first row it hangs.
still going..
I quit the tests early with Ctrl+C. The query plan is still the same as above, but executing it does not return the way yours does. Happy to show you on a Teams call on Monday or any day next week if you're available.
@stevelee12 please drop me an email (tleavitt <at> intersystems.com) and we'll set something up.
Before executing unit tests:
running the tests now...
This query plan is meaningfully different and I think I see the bad choice: for each routine line we're looping over all of the hashes for the given run and test path! That's a lot of silly extra work.
TuneTable isn't much help here because we're starting out from nothing, but we might be able to nudge the query optimizer in the right direction with a %IGNOREINDEX hint. Unfortunately, we need to use TestCoverage_Data.Coverage.MeaningfulCoverageData on the outer loop, so we can't ignore that one. The best hope would be that ignoring TestCoverage_Data.CodeUnitMap.HashReverse gets it to use HashForward, and to do so first.
Ah - actually we can use %NOINDEX in ON too, just thought to look for that: https://docs.intersystems.com/iris20221/csp/docbook/DocBook.UI.Page.cls?KEY=RSQL_join#RSQL_join_performance_on_indexing
@stevelee12 - rather than meeting, I'm asking @isc-shuliu to put up a PR with the query optimizer keywords to fix the issue; if that doesn't resolve it we can meet.
Proper optimization strategy:
Rewrite the query so the tables are joined in this order: Coverage, Coverage_RtnLine, CodeUnitMap, Coverage.
Then use the %INORDER query optimizer hint so the optimizer keeps that order.
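A rough sketch of what that rewrite could look like, using the count(*) form of the query quoted earlier in this thread. This is an assumption about the shape of the fix, not the text of the merged change: the ON conditions are the same ones as before, only moved so that each join refers to tables already listed, and %INORDER is placed right after FROM.

```sql
-- Sketch only, not the merged query.
-- Join order: Coverage (source), Coverage_RtnLine (metric),
-- CodeUnitMap (map), Coverage (target).
-- %INORDER asks the optimizer to join tables in the order listed in FROM.
SELECT count(*)
FROM %INORDER TestCoverage_Data.Coverage source
JOIN TestCoverage_Data.Coverage_RtnLine metric
    ON metric.Coverage = source.ID
JOIN TestCoverage_Data.CodeUnitMap map
    ON source.Hash = map.FromHash
    AND metric.element_key = map.FromLine
JOIN TestCoverage_Data.Coverage target
    ON target.Run = source.Run
    AND target.Hash = map.ToHash
    AND target.TestPath = source.TestPath
LEFT JOIN TestCoverage_Data.Coverage_RtnLine oldMetric
    ON oldMetric.ID = target.ID
    AND oldMetric.element_key = map.ToLine
WHERE source.Run = ?
    AND source.Ignore = 0
    AND source.Calculated = 0
GROUP BY target.ID, map.ToLine
```

Checking the plan for this version against the fast plan quoted above (MeaningfulCoverageData on the outer loop, then Coverage_RtnLine, then CodeUnitMap.HashForward, then UniqueCoverageData) would be one way to confirm the hint has the intended effect on a given system.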
This looks very promising!
@stevelee12 thank you for confirming! I've merged and we'll release 4.0.5 today.
@stevelee12 we've released 4.0.5 here and via Open Exchange/IPM.
Hi @isc-tleavitt, can you check the 4.0.5.xml release please? I could be going mad, but I don't think the XML for the MapRunCoverage query in there matches what's in Git.
@stevelee12 you're completely right - filed #58 to fix this. There's a new artifact; that'll be the right one.
As per subject, the "Mapping to class/routine coverage" process is taking a long time to complete. Running locally takes between 15-20 minutes on average (over 50mins in Azure DevOps).
Run locally:
I've put a couple of debug lines at various points in TestCoverage.Data.Run.MapRunCoverage() to track timings, flagged with the original COS comments where possible. Here are my findings:
Size of tables:
Running the sqlstatement as a straight count(*) without the insert on SMP just sits waiting forever:
However when I remove the join back to "TestCoverage_Data.Coverage target" the query returns instantly
Can anyone help me with this please?
Thanks as always :)
EDIT: The straight count did eventually return a result after 46min: