Closed NightlyKing closed 5 years ago
I'd be interested in multicore regression tests to get a better idea of SF's progress across versions at medium thread counts (say 8), compared with the baseline single-core performance changes, but such tests would consume a huge amount of Fishtest resources, especially if we want historical data by doing this from SF7 onwards.
Maybe with very low priority...
Rating lists already provide those data, e.g. CCRL in the 'Complete list' tab
Well. If we look at 40/4 right now we see this:
Stockfish 10 64-bit 4CPU | 3547 | +15 | −15
I'm sorry, but "Stockfish 10" isn't the dev version, and a ±15 error bar isn't something I'd consider reliable or suitable for catching a regression. CCRL plays a lot of games, but most of them are not SF dev vs SF master games. By the time one SF version has enough games, we probably already have 5 more patches affecting multi-CPU performance.
I don't know of any platform other than Fishtest that can reliably catch regressions on 4 or more cores.
CCRL never has enough games for the error bars to come down to an acceptable level.
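To put rough numbers on how error bars shrink with game count, here is a minimal sketch. It assumes independent games between near-equal engines and a hypothetical draw ratio of 0.6 (not a figure from this thread); the function names are made up for illustration, and neither Fishtest nor CCRL necessarily computes its bars this way.

```python
import math

# Slope of the logistic Elo curve at a 50% score: d(Elo)/d(score) = 1600 / ln(10)
ELO_PER_SCORE_AT_EVEN = 1600 / math.log(10)

def elo_margin_95(n_games, draw_ratio):
    """Approximate 95% Elo error margin for a match between near-equal engines."""
    # With wins and losses balanced, per-game score variance is (1 - draw_ratio) / 4.
    std_err_score = math.sqrt((1 - draw_ratio) / 4 / n_games)
    return 1.96 * ELO_PER_SCORE_AT_EVEN * std_err_score

def games_needed(target_margin, draw_ratio):
    """Smallest game count whose approximate 95% margin is at most target_margin."""
    per_game = 1.96 * ELO_PER_SCORE_AT_EVEN * math.sqrt((1 - draw_ratio) / 4)
    return math.ceil((per_game / target_margin) ** 2)

if __name__ == "__main__":
    # draw_ratio=0.6 is an assumed value, chosen only for illustration.
    for n in (1_000, 20_000):
        print(n, round(elo_margin_95(n, draw_ratio=0.6), 1))
    print(games_needed(3.0, draw_ratio=0.6))
```

Under these assumptions the margin scales with one over the square root of the game count, so halving an error bar takes about four times as many games.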
One such test has been done here: http://tests.stockfishchess.org/tests/view/5cba4ebd0ebc5925cf020fae
30+0.3 with 8 threads; same commit as the last regular regression test. It shows that the gain at 30+0.3 with 8 threads is about 8 Elo (give or take 3) larger than at 60+0.6 with 1 thread.
I think this goes to show that multicore progress or regression can differ from single-core. Maybe we should test the version we send to the TCEC Sufi with both a 30+0.3 8-thread test and a regular regression test, to have another comparison.
@Alayan-stk-2 You've achieved a result similar to what the complete CCRL list already showed, for example in the case of the SF9 to SF10 single/multicore progression, only with tighter error bars.
Why waste so much time and resources then?
According to CCRL 40/40, SF7 to SF10 gained 136 Elo and SF7-4th to SF10-4th gained 136 Elo. Do you think you can conclude that there has been zero improvement in how SF performs with multiple threads between SF7 and SF10?
4 cores is not that many in the first place (multithreading scaling issues are still limited), and with the CCRL error bars at 22 Elo for SF10-4CPU and 10 or more Elo for SF7, SF10 and SF7-4CPU, you can't know whether there was really 0 progress or whether SF in truth gained 20 Elo at 4 threads...
Besides, I ran the test in a period of low activity on the framework which would have idled more otherwise.
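As a rough check on the CCRL figures quoted above, the four error bars can be combined in quadrature. This is only a sketch: it assumes the four ratings have independent errors (rating-list errors are not strictly independent) and takes "10 or more" as exactly 10, so the true uncertainty is likely larger. The helper name is hypothetical.

```python
import math

def combined_error(*error_bars):
    """Combine independent error bars in quadrature (root of the sum of squares)."""
    return math.sqrt(sum(e * e for e in error_bars))

# Figures quoted in the thread (CCRL 40/40); "10 or more" is taken as exactly 10 here.
gain_1cpu = 136      # SF7 -> SF10
gain_4cpu = 136      # SF7-4CPU -> SF10-4CPU
err_sf10_4cpu = 22
err_others = 10      # SF7, SF10, SF7-4CPU (lower bound from the post)

# Extra gain attributable to multi-threading, and its combined uncertainty.
smp_gain = gain_4cpu - gain_1cpu
smp_gain_error = combined_error(err_sf10_4cpu, err_others, err_others, err_others)
print(f"extra multi-thread gain: {smp_gain} +/- {smp_gain_error:.0f} Elo")
```

With these inputs the difference between the two gains comes out as 0 +/- 28 Elo, so a real 20 Elo multi-thread gain would be indistinguishable from no gain at all.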
As of now Fishtest only tracks single core regression at LTC.
For multicore tests there is NCM dev, but the problem I see is that SF dev plays against SF7, which might not be the best opponent for catching regressions on multiple cores, let alone that it "only" plays 20k games for each version, and on just 2 cores.
I suggest doing a real regression test on 4 or 8 cores once in a while. Maybe every fourth regression test? Or whenever 5 patches that target multiple cores have passed? I suggest this because most tournaments today are held on multiple cores to increase strength, and I find NCM's approach lacking in value.
-Ente
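The two triggers suggested above could be sketched as a simple rule. Everything here is hypothetical: the function and parameter names are invented for illustration and nothing like this is known to exist in Fishtest.

```python
def should_run_multicore_test(regression_test_number,
                              smp_patches_since_last,
                              every_nth=4,
                              patch_threshold=5):
    """Return True if the next regression test should also run on multiple cores.

    Hypothetical policy: trigger on every `every_nth` regression test, or once
    `patch_threshold` multi-core patches have passed since the last such test.
    """
    on_schedule = regression_test_number % every_nth == 0
    enough_patches = smp_patches_since_last >= patch_threshold
    return on_schedule or enough_patches

if __name__ == "__main__":
    print(should_run_multicore_test(4, 0))   # scheduled: every fourth test
    print(should_run_multicore_test(3, 5))   # triggered by 5 multi-core patches
    print(should_run_multicore_test(3, 2))   # neither condition met
```

Combining the two conditions with "or" keeps the multicore tests rare during quiet periods while still reacting quickly when several multi-core patches land close together.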