official-stockfish / Stockfish

A free and strong UCI chess engine
https://stockfishchess.org/
GNU General Public License v3.0

Multicore regression tests? #2094

Closed by NightlyKing 5 years ago

NightlyKing commented 5 years ago

As of now Fishtest only tracks single core regression at LTC.

For multicore tests there is NCM dev, but the problem I see is that SF dev plays against SF7, which might not be the best opponent for catching regressions on multiple cores, let alone that it "only" plays 20k games per version, and on just 2 cores.

I suggest doing a real regression test on 4 or 8 cores once in a while. Maybe every fourth regression test? Or once 5 patches that target multiple cores have passed? I suggest this because most tournaments today are held on multiple cores to increase strength, and I find NCM's approach lacking in value.

-Ente

Alayan-stk-2 commented 5 years ago

I'd be interested in multicore regression tests to get a better idea of SF's progress across versions at medium thread counts (say 8) compared to baseline single-core performance changes, but such tests would consume a huge amount of Fishtest resources, especially if we want historical data by doing this from SF7 onwards.

Maybe with very low priority...

MagaTailor commented 5 years ago

Rating lists already provide those data, e.g. CCRL in the 'Complete list' tab

NightlyKing commented 5 years ago

> Rating lists already provide those data, e.g. CCRL in the 'Complete list' tab

Well. If we look at 40/4 right now we see this:

Stockfish 10 64-bit 4CPU | 3547 | +15 | −15

I'm sorry, but "Stockfish 10" isn't the dev version. Nor is a ±15 error bar something I'd consider reliable or suitable for catching a regression. CCRL plays a lot of games, but most of them are not SF dev vs SF master games. By the time one SF version has enough games, we will probably already have 5 more patches affecting multi-CPU performance.

I don't know of any platform other than Fishtest that can reliably catch regressions on 4 or more cores.

Alayan-stk-2 commented 5 years ago

CCRL never has enough games for the error bars to shrink to an acceptable level.
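To put rough numbers on the error-bar point, here is a minimal sketch of how the 95% Elo error bar shrinks with the number of games. It uses a simple win/draw/loss normal approximation, which is an assumption for illustration and not necessarily what Fishtest or CCRL actually compute. The function names and the 60% draw rate in the usage example are hypothetical.

```python
import math


def elo_from_score(score: float) -> float:
    """Logistic Elo difference implied by an expected score in (0, 1)."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def elo_error_95(wins: int, draws: int, losses: int) -> float:
    """Approximate 95% half-width of the Elo estimate from W/D/L counts.

    Assumes independent games and a normal approximation of the mean score.
    """
    n = wins + draws + losses
    score = (wins + 0.5 * draws) / n
    # Per-game variance of the result (1, 0.5 or 0) around the mean score.
    var = (
        wins * (1.0 - score) ** 2
        + draws * (0.5 - score) ** 2
        + losses * (0.0 - score) ** 2
    ) / n
    stderr = math.sqrt(var / n)
    low = elo_from_score(score - 1.96 * stderr)
    high = elo_from_score(score + 1.96 * stderr)
    return (high - low) / 2.0


# Hypothetical even match with 60% draws, at increasing game counts.
for games in (1_000, 4_000, 40_000):
    w = l = games // 5
    d = games - w - l
    print(games, round(elo_error_95(w, d, l), 1))
```

Under these assumptions the error bar scales roughly with one over the square root of the game count, so quadrupling the games only halves it.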

Alayan-stk-2 commented 5 years ago

One such test has been done here: http://tests.stockfishchess.org/tests/view/5cba4ebd0ebc5925cf020fae

30+0.3 with 8 threads; same commit as the last regular regression test. It shows the gain at 30+0.3 with 8 threads is about 8 Elo (give or take 3) better than at 60+0.6 with 1 thread.

NightlyKing commented 5 years ago

> One such test has been done here: http://tests.stockfishchess.org/tests/view/5cba4ebd0ebc5925cf020fae
>
> 30+0.3 with 8 threads; same commit as the last regular regression test. It shows the gain at 30+0.3 with 8 threads is about 8 Elo (give or take 3) better than at 60+0.6 with 1 thread.

I think it goes to show that multicore progress/regression can differ from single-core. Maybe we should test the version we send to the TCEC superfinal with both a 30+0.3 8-thread test and a regular regression test to have another comparison.

MagaTailor commented 5 years ago

@Alayan-stk-2 You've achieved a result similar to what the complete CCRL list already showed, for example in the case of SF9 to SF10 single/multicore progression, only with tighter error bars.

Why waste so much time and resources then?

Alayan-stk-2 commented 5 years ago

According to CCRL 40/40, SF7 to SF10 gained 136 Elo and SF7-4th to SF10-4th gained 136 Elo. Do you think you can conclude that there has been zero improvement in how SF performs with multiple threads between SF7 and SF10?

4 cores is not that much in the first place (multithreading scaling issues are still limited), and with the error bars at CCRL being 22 Elo for SF10-4CPU and 10 or more Elo for SF7, SF10 and SF7-4CPU, you can't know if there was really 0 progress or if in truth SF gained 20 Elo at 4 threads...

Besides, I ran the test during a period of low activity on the framework, which would otherwise have idled more.
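The error-bar argument above can be sketched numerically. This is a rough illustration that assumes the four CCRL error bars are independent and add in quadrature, which rating-list errors do not strictly satisfy. It takes the "10 or more" figures at their lower bound of 10, and the helper name `combined_error` is hypothetical.

```python
import math


def combined_error(*errors: float) -> float:
    """Error bar of a sum or difference of independent estimates (quadrature)."""
    return math.sqrt(sum(e * e for e in errors))


# Error bars quoted in the comment above (Elo); 10 is a lower bound.
single_gain_err = combined_error(10, 10)  # SF10 minus SF7, 1 thread
multi_gain_err = combined_error(22, 10)   # SF10-4CPU minus SF7-4CPU

# Uncertainty on "multicore gain minus single-core gain".
diff_err = combined_error(single_gain_err, multi_gain_err)
print(round(diff_err, 1))
```

Under these assumptions the uncertainty on the difference between the two 136 Elo gains is at least about 28 Elo, so a true extra gain of 20 Elo at 4 threads would be indistinguishable from zero.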