Stockfish submissions to TCEC

xoto10 commented 5 years ago

@mcostalba @snicolet No sf version has been submitted to this season's TCEC Division Premier which is about to start. Apparently they emailed us (I'm not sure who) on Wednesday and haven't received a reply.

I have suggested they use a recent version from abrok, specifically this one:

Author: Marco Costalba
Date: Sat Apr 6 02:03:15 2019 +0200
Timestamp: 1554508995

Fix a missing assignment in previous commit  ...

as the last one before their deadline.

Can you contact TCEC to confirm this is the one to use please?

Can we name someone to take charge of TCEC submissions (@Alayan-stk-2 :-) ?), preferably with someone else as backup. Or officially ask them to just use the latest abrok version each time, so that we're not relying on unreliable communications?

xoto10 commented 5 years ago

The main question right now is, which version should TCEC use in Div P starting tomorrow?

mcostalba commented 5 years ago

I'm absolutely ok to assign someone to take care of tournament submission.

I would rather not name someone myself. If Stephane agrees, people could state their availability for the role here in this thread, and a decision will be taken according to the community feedback (at the moment I prefer not to set strict election rules; if needed, this can be formalized later).

mcostalba commented 5 years ago

Regarding tomorrow's tcec, there's no reason now to avoid current Master.

xoto10 commented 5 years ago

For the current TCEC, they have received an email agreeing a version, so the urgent part is done now :-)

Alayan-stk-2 commented 5 years ago

Having someone assigned to handle version submission to TCEC would be of limited use if that person has no authority to answer anything but "use the latest abrok exec from before the deadline".

Decisions should be mindful of community input, but e.g. a person assigned to submissions should be able to ask for non-default contempt or other meaningful parameter changes we'd have good reasons to believe would help.

mcostalba commented 5 years ago

@Alayan-stk-2 I agree with you on this point. This role should have some flexibility in picking the binary, as long as the engine is still the official Stockfish and not something else. It should be an engine, not necessarily from the current master, but IMO picked from this official SF repository. The flexibility I see is regarding the commit from which to pick, the compiled binary and the choice of UCI parameter values.

xoto10 commented 5 years ago

@mcostalba @Alayan-stk-2 The next TCEC deadline is for submissions to the TCEC Cup. I believe it is the end of the current Premier Division, which is likely to finish in around 7 days, probably sometime on 2019-04-27.

Alayan-stk-2 commented 5 years ago

We need to submit a version today.

Aloril42: xoto10 alayant about 18h left until DivP is over.

mcostalba commented 5 years ago

I have merged https://github.com/official-stockfish/Stockfish/pull/2108 for TCEC.

Nobody stepped up, so I think I can close this one.

mcostalba commented 5 years ago

I have submitted this one to TCEC:

http://abrok.eu/stockfish/builds/9a11a291942a8a7b1ebb36282c666ca8d1be1892/win64bmi2/stockfish_19042711_x64_bmi2.exe

Alayan-stk-2 commented 5 years ago

@mcostalba I didn't step up as I viewed this as giving some duty to be available and handle these matters, and I can't always be.

However, it seems that each time SF is to play in a new bonus/div at TCEC, the TCEC people have some trouble getting info on updates, and this would be less of an issue if I could confirm updates.

Aloril42: @alayant Could you be/become person who sends tcecSF updates for TCEC? (Would be basically making sure latest is OK and sending email about it.. and parameters if they are changed) alternatively @xoto10 could be too though not seen here, seen in discord though.

Right now, I'm not appointed to do so, so even when I'm available and am sure of what we should send, my word isn't quite enough.

Would anybody mind ?

mcostalba commented 5 years ago

@Alayan-stk-2 please forgive me but I am not able to parse: "Right now, I'm not appointed to do so, so even when I'm available...".

This PR was opened to ask if someone would step up. Nobody did, so I closed it. If something has changed now, I am glad to re-open it.

So I ask again:

We are looking for someone to take care of tournament submissions. This role should have some flexibility in picking the binary, as long as the engine is still the official Stockfish and not something else. It should be an engine, not necessarily from the current master, but picked from the official SF repository. The flexibility is regarding the commit from which to pick, the compiled binary and the choice of UCI parameter values.

If someone is interested, please, simply do write: "Yes, I candidate myself for this role".

Alayan-stk-2 commented 5 years ago

Yes, I candidate myself for this role :slightly_smiling_face:

mcostalba commented 5 years ago

ok, I will leave this open a few more days to see if someone else shows up.

Vizvezdenec commented 5 years ago

I'm okay with alayant being a submitter

Alayan-stk-2 commented 5 years ago

It has been a week now, and nobody else has stepped up or voiced opposition. So we can proceed further I presume ?

mcostalba commented 5 years ago

@Alayan-stk-2 sorry for my late reply!

Yes, of course. I'm ok with @Alayan-stk-2 as submitter for TCEC.

Alayan-stk-2 commented 5 years ago

@mcostalba Thanks.

I assume I could also handle submissions to CCC ? This issue mentions mostly TCEC so coolchess would like a confirmation. I assume that if I'm trusted for one, I can be for the other. :slightly_smiling_face:

mcostalba commented 5 years ago

@Alayan-stk-2 yes, this is good for CCC too

snicolet commented 5 years ago

@Alayan-stk-2 I have received the following email from Anton Mihailov (TCEC):

Premier Division of TCEC Season 16, where your engine participates, is going to start soon. Last move of bonus gauntlet after League 1 Playoff is deadline for updates. Please, send us your latest stable version and any additional information that you would like the admin to know.

Keep Aloril and Kan on CC. They are top professionals and will keep the smooth running of TCEC as usual.

For information about the season check it out at this link http://www.chessdom.com/tcec-season-16-information-and-participants/ . All the best in S16!

Alayan-stk-2 commented 5 years ago

@snicolet Don't worry, I'm keeping an eye on it.

mstembera commented 4 years ago

FYI https://github.com/official-stockfish/Stockfish/issues/2642

mstembera commented 4 years ago

@Alayan-stk-2 I recently found out that TCEC now allows specifying up to 128GiB of hash and other engines are making use of it. SF is still set to 64GiB. I watched the Livelog for a bit while SF was playing and hashfull was over 70% quite often. More than 50% is usually suboptimal. Do you think you could ask them to bump us to 128GiB next time you submit? It would be nice if they could confirm we don't suffer a significant nps drop. A few percent is expected and normal but it probably should be less than 5%. @vondele Do you agree?

vondele commented 4 years ago

Generally I would say more hash is better if we have significant usage of it. In my experience nps is almost independent of the hash size, as soon as the hash is filled (which obviously requires high depth). I hope total memory is sufficient to have 2 engines use 128GB hash + 6-men TB.

mstembera commented 4 years ago

Yes I confirmed the TCEC machine has 1TB total.

mstembera commented 4 years ago

Here are some numbers I got pointed to by TCEC. See pinned comments here for Blue: https://discord.com/channels/479003439125495819/503252511134842885 The nps difference here is insignificant.

Also here: https://discord.com/channels/479003439125495819/656532253471670314/658407338616553492 These show a significant difference after 1 minute but not much after 5 minutes.

The SuFi is 120'+10" which is the same as the match being played right now and hashfull in Livelog seems to hit roughly 70% much of the time. However our time usage spikes a lot from move to move and some moves hit over 90% and some below 50%.

Alayan-stk-2 commented 4 years ago

> I watched the Livelog for a bit while SF was playing and hashfull was over 70% quite often. More than 50% is usually sub optimal.

Do we have test data to back this up ? An issue with these extreme configurations is that they see very little testing.

I remember we had some data on low-medium hash sizes (can't find it in UsefulData, but I remember some table that went up to 1GB), but at 64GB and 128GB we are mostly guessing at what the optimal elo-wise hash is.

noobpwnftw commented 4 years ago

Too big a hash isn't always better: first there are slowdowns, second the hash still has 3-fold pollution.

vondele commented 4 years ago

The 'slowdowns' are a dangerous argument. IMO, as soon as the hash is larger than caches, the slowdowns are mostly related to properties of search (e.g. explores more or less new nodes) especially at half-used hashes. When the search is different, the speed is kind of meaningless (e.g. likely that if any visited node is a hash hit, one is quite effective, even if a bit slower). I once did a measurement of speed at various large hash sizes, at full hash for all sizes (i.e. not standard bench), and it mostly is the same (might depend a bit on hardware, of course):

That's on 127 threads:

| Hash (MB) | nps |
|---|---|
| 1024 | 155083772 |
| 2048 | 154900915 |
| 4096 | 155914852 |
| 8192 | 159769535 |
| 16384 | 163382859 |
| 32768 | 163083055 |
| 65536 | 161142317 |
| 131072 | 162743338 |
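
As a rough way to read the table above, the following sketch computes the relative gap between the slowest and fastest configuration and the change when going from 64 GiB to 128 GiB. It is only arithmetic on the eight quoted figures; the helper names are made up for illustration.

```python
# nps measured at full hash on 127 threads, copied from the table above.
NPS_BY_HASH_MB = {
    1024: 155083772,
    2048: 154900915,
    4096: 155914852,
    8192: 159769535,
    16384: 163382859,
    32768: 163083055,
    65536: 161142317,
    131072: 162743338,
}


def percent_change(new: float, old: float) -> float:
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100.0


def spread_percent(nps_by_hash: dict) -> float:
    """Gap between the slowest and fastest entry, in percent of the slowest."""
    values = list(nps_by_hash.values())
    return percent_change(max(values), min(values))


if __name__ == "__main__":
    print(f"spread across all sizes: {spread_percent(NPS_BY_HASH_MB):.2f}%")
    change = percent_change(NPS_BY_HASH_MB[131072], NPS_BY_HASH_MB[65536])
    print(f"64 GiB -> 128 GiB: {change:+.2f}%")
```

This says nothing about other hardware; it only summarizes the one measurement quoted here.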

noobpwnftw commented 4 years ago

It is only not a problem when everything works just right, for example the TT memory does not get migrated by the OS very often, the machine doesn't have much NUMA congestion and so on. It gives a false impression that one can go with as large a hash as possible without any consequences.

One example is if running on the other box, it'll get severe slowdowns for anything beyond 16GB. RAM configuration is at full speed on all channels, not technically a hardware "problem", it is what it is, just with faster cores and more nodes.

mstembera commented 4 years ago

@Alayan-stk-2 Since 50% and 70% are percentages they are independent of total size. Here is some quick math though. When the hash is 50% full it means any single entry has a 50% chance of being current. There are 3 entries per cluster. Therefore when a new entry has to be stored there is a 0.5 × 0.5 × 0.5 = 12.5% chance that another current existing entry will have to be tossed out to make room. At 70% full this becomes 0.7 × 0.7 × 0.7 = 34.3%, so almost 3x more likely.
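
The quick math above can be sketched in a few lines. This assumes, as the estimate does, that each of the 3 entries in a cluster is current independently with probability equal to the fill fraction; the function name is hypothetical and this is not how Stockfish itself computes anything.

```python
ENTRIES_PER_CLUSTER = 3  # cluster size quoted in the comment above


def eviction_probability(hashfull: float, entries: int = ENTRIES_PER_CLUSTER) -> float:
    """Chance that every entry in a cluster is current, so a store must evict one.

    Simplifying assumption: each entry is current independently with
    probability `hashfull`, given as a fraction between 0 and 1.
    """
    if not 0.0 <= hashfull <= 1.0:
        raise ValueError("hashfull must be a fraction between 0 and 1")
    return hashfull ** entries


if __name__ == "__main__":
    for fill in (0.5, 0.7, 0.9):
        print(f"hashfull {fill:.0%}: {eviction_probability(fill):.1%}")
```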

@noobpwnftw Yes agree testing for such cases is very important. That is why I asked for the numbers from TCEC linked above. I don't think they point to any problems.

mstembera commented 4 years ago

Another small improvement for submission would be to request slightly fewer than the max 176 threads available on the TCEC machine. The machine is CPUs: 4 x Intel Xeon 4xE5-4669v4 2.2 GHz, Cores: 88 physical / 176 threads. Leaving 1, 2, or 4 (1 per NUMA node) threads for the OS, other non-search SF threads, etc. would allow the scheduler not to preempt the search threads as much and actually help performance. I did a test of 35 threads vs 36 threads on a 36 thread machine and it was a slight win. https://tests.stockfishchess.org/tests/view/5f361d0411a9b1a1dbf18e83 I would request 175 or 174 or 172 threads.
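
A minimal sketch of the thread arithmetic, assuming one reserved thread per NUMA node and 4 NUMA nodes on the 176-thread machine. `Threads` and `Hash` are Stockfish's UCI option names (with `Hash` given in MB); the helper functions are hypothetical and only build the command strings.

```python
def search_threads(logical_threads: int, numa_nodes: int, reserved_per_node: int = 1) -> int:
    """Threads left for the search after reserving some on each NUMA node."""
    reserved = numa_nodes * reserved_per_node
    if reserved >= logical_threads:
        raise ValueError("reservation leaves no threads for the search")
    return logical_threads - reserved


def uci_setup(threads: int, hash_mb: int) -> list:
    """UCI commands that set the Threads and Hash options."""
    return [
        f"setoption name Threads value {threads}",
        f"setoption name Hash value {hash_mb}",
    ]


if __name__ == "__main__":
    threads = search_threads(logical_threads=176, numa_nodes=4)
    for command in uci_setup(threads, hash_mb=128 * 1024):
        print(command)
```

With the figures from this thread, `search_threads(176, 4)` gives 172, one of the thread counts suggested above.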

vondele commented 4 years ago

yes, I agree that it might be wise to leave a few threads empty. Maybe 172 is right. The test could however be sensitive to TC (i.e. cutechess reacting too slowly, and a few ms matter at those TC).

mstembera commented 4 years ago

@Alayan-stk-2 @vondele Will one of you be able to notify TCEC of the config tweaks discussed above?

vondele commented 4 years ago

yes I have done so.

mstembera commented 4 years ago

Thanks! I just found out that TCEC will conduct a "hashsize" test after League 1. Here is the info: "Testing Stockfish 64GiB vs 128GiB, 2 games from starting position, 120min+10s. Will link to resulting log files in #enginedev-log after test." The livelog data should be interesting.

Alayan-stk-2 commented 4 years ago

Aloril has confirmed to me that the DivP submission for SF is ok and that SF will use the default net and 172 threads as requested.

mstembera commented 4 years ago

The two 120'+10" TCEC test games of SF NNUE 172 thread 64GiB vs 128GiB have concluded and are accessible here: https://tcec-chess.com/#div=hashsize&game=1&season=19 The compressed log file is also available for download here: https://tcec-chess.com/loglive/archive/TCEC_Season_19_-_Hash_Size_Test.log.xz I don't believe there is any speed issue with 128GiB and the log also shows that we do run into situations where the hash is over 90% full. I would like to thank TCEC for running these tests. We should notify them before the start of DivP of our decision. @vondele @Alayan-stk-2 Thank you.

vondele commented 4 years ago

As extracted from the info string just before the bestmove:

depth nps hashfull

nothing particular stands out, the observation that with 64GiB we're indeed quite frequently with hashfull > 90% holds, so I think we can go with 128.
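
For anyone who wants to repeat this kind of extraction, here is a minimal sketch. It assumes the log has already been reduced to raw engine output lines (the actual TCEC log format may add prefixes that need stripping first), and the sample line in the code is made up. In the UCI protocol `hashfull` is reported in permill, so it is divided by 10 to get a percentage.

```python
WANTED = ("depth", "nps", "hashfull")


def parse_info(line: str) -> dict:
    """Extract depth, nps and hashfull (as a percentage) from a UCI `info` line.

    Returns an empty dict for anything that is not a regular `info` line.
    """
    tokens = line.split()
    if len(tokens) < 2 or tokens[0] != "info" or tokens[1] == "string":
        return {}
    fields = {}
    for i, token in enumerate(tokens[:-1]):
        if token == "pv":
            break  # everything after `pv` is a move list
        if token in WANTED and token not in fields:
            fields[token] = int(tokens[i + 1])
    if "hashfull" in fields:
        fields["hashfull_percent"] = fields.pop("hashfull") / 10.0
    return fields


def last_info_before_bestmove(lines):
    """Yield the parsed fields of the final `info` line before each `bestmove`."""
    last = {}
    for line in lines:
        if line.startswith("bestmove"):
            if last:
                yield last
            last = {}
        else:
            parsed = parse_info(line)
            if parsed:
                last = parsed


if __name__ == "__main__":
    # Made-up sample output, not taken from the TCEC log.
    sample = [
        "info depth 41 seldepth 55 score cp 30 nodes 900000000 nps 104000000 hashfull 650 pv e2e4",
        "info depth 42 seldepth 58 score cp 31 nodes 1050000000 nps 105000000 hashfull 712 pv e2e4 e7e5",
        "bestmove e2e4 ponder e7e5",
    ]
    for row in last_info_before_bestmove(sample):
        print(row)
```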

skiminki commented 4 years ago

A quick note: There are basically three theoretical slowdowns related to hash size:

1. When the hash is bigger than the CPU L3 cache, which is around a couple of megs to a couple dozen megs depending on the system. After that, TT lookups start to become cache misses.
2. When the hash is bigger than the CPU TLB cache coverage, which is around a couple of gigs with large pages. After that, TT lookups also begin to require a page walk for virtual-to-physical address translation.
3. When the hash is so big that even the page tables won't fit in the CPU caches. Then the TT lookups will also trigger an extra DRAM read for accessing the last level of page tables. That should be somewhere in the tens of TBs range. 1 GB large pages should push this transition beyond the PB range.

64 GB and 128 GB are well between slowdowns 2 & 3, so there shouldn't be a big nps difference with these hash sizes. On paper, at least.
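
The orders of magnitude for slowdowns 2 and 3 can be checked with a back-of-the-envelope sketch. It assumes 8-byte page-table entries (as on x86-64) and takes "large pages" to mean 2 MiB pages. The TLB entry count of 1536 is a purely illustrative assumption, since the real figure varies by CPU.

```python
KIB, MIB, GIB = 1024, 1024**2, 1024**3
PTE_BYTES = 8  # size of one x86-64 page-table entry


def last_level_table_bytes(hash_bytes: int, page_bytes: int) -> int:
    """Bytes of last-level page-table entries needed to map `hash_bytes`."""
    pages = -(-hash_bytes // page_bytes)  # ceiling division
    return pages * PTE_BYTES


def tlb_coverage_bytes(tlb_entries: int, page_bytes: int) -> int:
    """Memory reachable without a page walk if each TLB entry maps one page."""
    return tlb_entries * page_bytes


if __name__ == "__main__":
    assumed_tlb_entries = 1536  # illustrative assumption, varies by CPU
    coverage = tlb_coverage_bytes(assumed_tlb_entries, 2 * MIB)
    print(f"TLB coverage with 2 MiB pages: {coverage / GIB:.1f} GiB")
    for hash_gib in (64, 128):
        table = last_level_table_bytes(hash_gib * GIB, 2 * MIB)
        print(f"{hash_gib} GiB hash, 2 MiB pages: {table / KIB:.0f} KiB of page-table entries")
```

Under these assumptions the page-table entries for a 64 GiB or 128 GiB hash stay far below typical L3 cache sizes, which is consistent with the two sizes sitting between slowdowns 2 and 3.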

The slowdown transitions are easiest to observe with 1-threaded bench using a high-clocked CPU.

noobpwnftw commented 4 years ago

The TCEC box has a quite rare configuration which seems to be free of most problems regarding RAM sizes. Such behavior usually does not transfer to any other machine, so people need to run benchmarks on their own box to decide what is best.

mstembera commented 3 years ago

I noticed CCCC has upgraded their hardware to 2x AMD EPYC 7H12. Can someone notify them that the AVX2 build should now be faster than the BMI2 (listed as Haswell on abrok) build?

vondele commented 3 years ago

I have exchanged some messages with their sysadmin. I think they will be using avx2 profile-build now. They still need to do some work on the hardware (fully populate memory channels, to get full bandwidth, this seems the reason why nps is lower than expected).

mstembera commented 3 years ago

@vondele It just occurred to me that if main memory access seems to be an issue the change in this PR https://github.com/official-stockfish/Stockfish/pull/1663 should probably be tested. We never verified anything above 63 threads. It's strange that Komodo has no slowdown like we do. The updated code for the current version of TT would be

if ((tte[i].genBound8 & 0xF8) != generation8 && tte[i].depth8)

vondele commented 3 years ago

@mstembera we would need to be able to reproduce the slowdown in the first place.

mstembera commented 3 years ago

@vondele Of course. I don't know if CCCC would be interested in trying a custom binary to see if it solves the issue. I am willing to build the binary and make it available. Will you be communicating with them further or want to put me in touch with them?

vondele commented 3 years ago

I was in contact via discord, his handle is Jesse#8452. Do you know if engines like Ethereal have the same issue? At least there one can have a look at the code.

mstembera commented 3 years ago

Here is a small sample of data from TCEC and CCCC games versus K (Dragon) and E. The nps values are just an eyeball estimate from the graph of the first few moves.

TCEC, 172/176 threads:
https://tcec-chess.com/#div=p&game=31&season=20 SF ~105Mnps, E ~125Mnps, 0.84 ratio
https://tcec-chess.com/#div=p&game=6&season=20 SF ~105Mnps, K ~95Mnps, 1.10 ratio

CCCC, 232 threads:
https://www.chess.com/computer-chess-championship#event=budapest-bullet&game=1 SF ~85Mnps, E ~110Mnps, 0.77 ratio
https://www.chess.com/computer-chess-championship#event=budapest-bullet&game=9 SF ~75Mnps, K ~135Mnps, 0.55 ratio
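
The ratios above are simply SF nps divided by the opponent's nps. A small sketch using the eyeball figures from this comment follows; the tuple layout and function name are made up for illustration. Because the output is rounded to two decimals, the last digit can differ slightly from the figures quoted above.

```python
# (venue, opponent, SF Mnps, opponent Mnps): eyeball estimates quoted above
SAMPLES = [
    ("TCEC", "E", 105, 125),
    ("TCEC", "K", 105, 95),
    ("CCCC", "E", 85, 110),
    ("CCCC", "K", 75, 135),
]


def nps_ratio(sf_mnps: float, other_mnps: float) -> float:
    """SF speed relative to the other engine (1.0 means equal nps)."""
    return sf_mnps / other_mnps


if __name__ == "__main__":
    for venue, opponent, sf, other in SAMPLES:
        print(f"{venue} SF vs {opponent}: {nps_ratio(sf, other):.2f}")
```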

If I get a chance to communicate w/ CCCC I will offer to provide a test binary or feel free to volunteer me in case you communicate with them first.

mstembera commented 3 years ago

I have attached a zip file with the modified TT binary as well as a master binary for comparison (compiled using the same version of gcc etc.) in case CCCC wants to grab it from here for testing. SF_AVX2_bin_test.zip

vondele commented 3 years ago

so, both E and SF run slower on CCCC, while K has a huge boost. On the startpos on very similar hardware I have 160M. I've mentioned this issue to the CCCC sysadmin.

Stockfish submissions to TCEC #2082