Closed mohamed-barakat closed 9 months ago
All modified and coverable lines are covered by tests :white_check_mark:
Comparison is base (c13d54e) 78.97% compared to head (697642d) 75.47%.
I don't think this relaxation makes sense without collecting more information first: The errors in the CI might be due to timing fluctuations, but they might also be due to small performance regressions introduced over time. I suggest to either:
- Remove the test and call it a day.
- Remove the test from the CI but execute it manually every time the compiled code changes.
- Manually compare the timings at the time when this test was added to the current timings. If there is a regression, investigate. If not, then the relaxation actually makes sense.
I never encountered a regression on my laptop. And since the compiled code did not change for a while, I believe that the 7 / 10 bound is too tight when considering the fluctuations of a CI running on a virtual machine.
On my laptop this test line fails almost 60% of the time, and setting the quotient to 8/10 would allow the tests to pass. Since this PR has already been created, I would suggest merging it and investigating further if the issue remains.
I also see that the 4 / 10 test sporadically fails. I would wait a bit longer and maybe relax both.
Ah, that's a different situation than CI failures now and then. Could you run AdelmanCategoryOfAdditiveClosureOfAlgebroidVsAdelmanCategoryOfQuiverRows.g 5 times and report the numbers runtime and runtime_quiver? Then we know which range of results to expect and whether there has been a regression since the beginning of this year.
At the event just now :)
if runtime <= runtime_quiver * 4 / 10 then Display( true ); else Display( runtime ); Display( runtime_quiver ); fi;
# Expected output:
true
# But found:
2027
4896
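To make the failure above concrete, here is a minimal sketch of the same comparison in shell. The function name `passes_bound` is hypothetical and not part of the test suite; it only mirrors the condition `runtime <= runtime_quiver * 4 / 10`, using integer cross-multiplication so that no floating point is needed.

```shell
#!/bin/sh
# passes_bound RUNTIME RUNTIME_QUIVER NUM DEN   (hypothetical helper)
# Succeeds if RUNTIME <= RUNTIME_QUIVER * NUM / DEN.
# Both sides are multiplied by DEN so only integers are compared.
passes_bound() {
    [ $(( $1 * $4 )) -le $(( $2 * $3 )) ]
}

# Values from the failing run above: 2027 * 10 = 20270 > 4896 * 4 = 19584
if passes_bound 2027 4896 4 10; then
    echo "true"
else
    echo "bound violated: 2027 > 4896 * 4 / 10"
fi
```

For this run the quotient is roughly 0.414, so the 4 / 10 bound is missed by a small margin.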
On master f0654eb56d2a37b0bdd12ea32ddcfef1b02ff3e3
[ [ 17329, 34929 ], [ 20397, 28526 ], [ 18249, 28558 ], [ 19977, 31687 ], [ 20543, 29235 ] ]
producing the quotients
[ 0.496121, 0.715032, 0.639015, 0.630448, 0.702685 ]
On commit 31de779bda22f02b0890865ce76c646b883116af
[ [ 17102, 29659 ], [ 21527, 34266 ], [ 21474, 28003 ], [ 19019, 28918 ], [ 16074, 28570 ] ]
producing the quotients:
[ 0.576621, 0.628232, 0.766846, 0.657687, 0.562618 ]
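As a rough way to see how far the quotients spread, the following sketch computes the smallest and largest value of runtime / runtime_quiver over a set of runs. The helper name `quotient_range` and the input format (one "runtime runtime_quiver" pair per line) are assumptions made for illustration only.

```shell
#!/bin/sh
# quotient_range (hypothetical helper): reads "runtime runtime_quiver" pairs
# from stdin, one pair per line, and prints the minimum and maximum of
# runtime / runtime_quiver, each rounded to three decimals.
quotient_range() {
    awk '
        NF == 2 {
            q = $1 / $2
            if (n == 0 || q < min) min = q
            if (n == 0 || q > max) max = q
            n++
        }
        END { if (n > 0) printf "%.3f %.3f\n", min, max }
    '
}

# The five runs reported above for master f0654eb
quotient_range <<'EOF'
17329 34929
20397 28526
18249 28558
19977 31687
20543 29235
EOF
```

For these five runs the range is about 0.496 to 0.715, so a single run can land on either side of a 7 / 10 bound.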
Wow, those numbers are really unstable. I guess this might be due to the P and E cores of your CPU, with results changing depending on which cores run which part of the test.
Some numbers from my laptop for comparison:
[ [ 27027, 37421 ], [ 26355, 37113 ], [ 25695, 36010 ], [ 24504, 34867 ], [ 24439, 34402 ] ]
producing the quotients
[ 0.72, 0.71, 0.71, 0.70, 0.71 ]
Here, all numbers are very close to each other.
The intention of this test was to capture small regressions of < 10%. To accommodate the above numbers, we would have to relax the bounds to maybe 45% and 75% or even more, which means we would only notice regressions > 30%. Such regressions should also be noticeable elsewhere, so I don't think the test would be very useful then. Hence, I suggest simply disabling this test.
I have now also tested 31de779bda22f02b0890865ce76c646b883116af and get the following results:
[ [ 23363, 34735 ], [ 23123, 34800 ], [ 23362, 33842 ], [ 22077, 33028 ], [ 21984, 32944 ] ]
producing the quotients
[ 0.67, 0.66, 0.69, 0.66, 0.66 ]
So there indeed has been a regression since the beginning of this year and the CI failures have pointed out a valid problem, which would have gone unnoticed without the very strict bounds.
The absolute values of the timings show that there is definitely a regression > 10%:
On master https://github.com/homalg-project/CategoricalTowers/commit/c1711cb6d4c1274a7e2eb41930f1bac83e9f76b9
[ [ 11039, 15411 ], [ 11041, 15814 ], [ 10853, 15671 ], [ 10845, 15607 ], [ 11008, 15566 ] ]
producing the quotients
[ 0.716307, 0.698179, 0.692553, 0.694881, 0.707182 ]
On commit https://github.com/homalg-project/CategoricalTowers/commit/31de779bda22f02b0890865ce76c646b883116af (https://github.com/homalg-project/CAP_project/commit/3a6f758af8b51895f5206f7368c38dc80704cc26, https://github.com/homalg-project/FinSetsForCAP/commit/9f23f456b2608a802519f80575701a865457cc00)
[ [ 9764, 14761 ], [ 9855, 14687 ], [ 9886, 15026 ], [ 9681, 14680 ], [ 9771, 14646 ] ]
producing the quotients:
[ 0.661473, 0.671002, 0.657926, 0.659469, 0.667145 ]
It is very hard to bisect having three different repos, any ideas?
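The size of the regression in the absolute timings listed above can be checked with a little arithmetic. This is only a sketch: the helper name `percent_increase` is hypothetical, and it compares the sums of the five runs per commit, which gives the same ratio as comparing the means.

```shell
#!/bin/sh
# percent_increase OLD NEW   (hypothetical helper)
# Prints the relative increase of NEW over OLD in percent, one decimal place.
percent_increase() {
    awk -v old="$1" -v new="$2" 'BEGIN { printf "%.1f\n", (new - old) * 100 / old }'
}

# Sums of the five runtime values (first entry of each pair)
old_runtime=$((9764 + 9855 + 9886 + 9681 + 9771))        # commit 31de779
new_runtime=$((11039 + 11041 + 10853 + 10845 + 11008))   # master c1711cb
percent_increase "$old_runtime" "$new_runtime"

# Sums of the five runtime_quiver values (second entry of each pair)
old_quiver=$((14761 + 14687 + 15026 + 14680 + 14646))    # commit 31de779
new_quiver=$((15411 + 15814 + 15671 + 15607 + 15566))    # master c1711cb
percent_increase "$old_quiver" "$new_quiver"
```

With these numbers runtime grows by about 11.9% while runtime_quiver grows by about 5.8%, which matches the quotient moving from roughly 0.66 to roughly 0.70.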
I once wrote a script for this situation: it checks out all git repos in a directory at the last commit before a given date and time:
#!/bin/bash
# Check out every git repo in the current directory at the last commit on
# origin/master before the given date (YYYY-MM-DD) and optional time (HH:MM).
set -e

if [ "$#" -lt 1 ]; then
    echo "you must pass at least one argument"
    exit 1
fi

if [ "$#" -gt 2 ]; then
    echo "you must pass at most two arguments"
    exit 1
fi

DATE="$1"

if ! [[ "$DATE" =~ ^[0-9]{4}-[0-9]{2}-[0-9]{2}$ ]]; then
    echo "Year, month and day must be given as \"YYYY-MM-DD\""
    exit 1
fi

if [ "$#" -eq 2 ]; then
    # the optional time is the second argument
    TIME="$2"
else
    # include the whole day by default
    TIME="23:59"
fi

if ! [[ "$TIME" =~ ^[0-9]{2}:[0-9]{2}$ ]]; then
    echo "Time must be given as \"HH:MM\""
    exit 1
fi

# append 59 seconds to always include the whole minute
TIME="$TIME:59"

for repo in *; do
    # skip plain files (e.g. this script) so that "cd" cannot abort the run
    [ -d "$repo" ] || continue
    cd "$repo"
    # last commit on the first-parent history of origin/master before the cutoff
    commit=$(git rev-list --first-parent -n1 --before "$DATE $TIME" origin/master)
    current_commit=$(git rev-parse --verify HEAD)
    # mark repos whose checkout is about to change with "+"
    if [[ "$commit" == "$current_commit" ]]; then
        modified=""
    else
        modified="+"
    fi
    printf "%-30s %1s %s\n" "$repo" "$modified" "$(git show --date=iso-local -s --format="%cd %h %s" "$commit")"
    git checkout -q "$commit"
    cd ..
done
Wonderful, thank you very much.
see: https://github.com/homalg-project/CategoricalTowers/pull/444#issuecomment-1819187002