Closed xoto10 closed 5 years ago
The main question right now is, which version should TCEC use in Div P starting tomorrow?
I'm absolutely OK with assigning someone to take care of tournament submission.
I would rather not name anyone myself. If Stephane agrees, people could volunteer for the role here in this thread, and a decision will be taken according to community feedback (at the moment I prefer not to set strict election rules; if needed, they can be formalized later).
Regarding tomorrow's TCEC, there's no reason now to avoid current master.
For the current TCEC, they have received an email agreeing a version, so the urgent part is done now :-)
Having someone assigned to handle version submission to TCEC would be of limited use if that person has no authority to answer anything but "use the latest abrok exec from before the deadline".
Decisions should be mindful of community input, but e.g. a person assigned to submissions should be able to ask for non-default contempt or other meaningful parameter changes we'd have good reasons to believe would help.
@Alayan-stk-2 I agree with you on this point. This role should have some flexibility in picking the binary, as long as the engine is still the official Stockfish and not something else. It should be an engine, not necessarily from the current master, but IMO picked from this official SF repository. The flexibility I see is regarding the commit from which to pick, the compiled binary and the choice of UCI parameter values.
@mcostalba @Alayan-stk-2 The next TCEC deadline is the submission deadline for the TCEC Cup. I believe it is the end of the current Premier Division, which is likely to finish in around 7 days, probably sometime on 2019-04-27.
We need to submit a version today.
Aloril42: xoto10 alayant about 18h left until DivP is over.
I have merged https://github.com/official-stockfish/Stockfish/pull/2108 for TCEC.
Nobody stepped up, so I think I can close this one.
I have submitted this one to TCEC:
@mcostalba I didn't step up as I viewed this as giving some duty to be available and handle these matters, and I can't always be.
However, it seems that each time SF is to play in a new bonus/div at TCEC, the TCEC people have some trouble getting info on updates, and this would be less of an issue if I could confirm updates.
Aloril42: @alayant Could you be/become person who sends tcecSF updates for TCEC? (Would be basically making sure latest is OK and sending email about it.. and parameters if they are changed) alternatively @xoto10 could be too though not seen here, seen in discord though.
Right now, I'm not appointed to do so, so even when I'm available and am sure of what we should send, my word isn't quite enough.
Would anybody mind ?
@Alayan-stk-2 please forgive me but I am not able to parse: "Right now, I'm not appointed to do so, so even when I'm available...".
This PR was open to ask if someone would step up. Nobody showed. I close it. If now something has changed I am glad to re-open.
So I ask again:
We are looking for someone to take care of tournament submission. This role should have some flexibility in picking the binary, as long as the engine is still the official Stockfish and not something else. It should be an engine, not necessarily from the current master, but picked from the official SF repository. The flexibility is regarding the commit from which to pick, the compiled binary and the choice of UCI parameter values.
If someone is interested, please, simply do write: "Yes, I candidate myself for this role".
Yes, I candidate myself for this role :slightly_smiling_face:
OK, I will leave this open a few more days to see if someone else shows up.
I'm okay with alayant being a submitter
It has been a week now, and nobody else has stepped up or voiced opposition. So we can proceed further I presume ?
@Alayan-stk-2 sorry for my late reply!
Yes, of course. I'm ok with @Alayan-stk-2 as submitter for TCEC.
@mcostalba Thanks.
I assume I could also handle submissions to CCC ? This issue mentions mostly TCEC so coolchess would like a confirmation. I assume that if I'm trusted for one, I can be for the other. :slightly_smiling_face:
@Alayan-stk-2 yes, this is good for CCC too
@Alayan-stk-2 I have received the following email from Anton Mihailov (TCEC):
Premier Division of TCEC Season 16, where your engine participates, is going to start soon. Last move of bonus gauntlet after League 1 Playoff is deadline for updates. Please, send us your latest stable version and any additional information that you would like the admin to know.
Keep Aloril and Kan on CC. They are top professionals and will keep the smooth running of TCEC as usual.
For information about the season check it out at this link http://www.chessdom.com/tcec-season-16-information-and-participants/ . All the best in S16!
@snicolet Don't worry, I'm keeping an eye on it.
@Alayan-stk-2 I recently found out that TCEC now allows specifying up to 128GiB of hash and other engines are making use of it. SF is still set to 64GiB. I watched the Livelog for a bit while SF was playing and hashfull was over 70% quite often. More than 50% is usually suboptimal. Do you think you could ask them to bump us to 128GiB next time you submit? It would be nice if they could confirm we don't suffer a significant nps drop. A few percent is expected and normal but probably should be less than 5%. @vondele Do you agree?
Generally I would say more hash is better if we have significant usage of it. In my experience nps is almost independent of the hash size, as soon as the hash is filled (which obviously requires high depth). I hope total memory is sufficient to have 2 engines use 128GB hash + 6-men TB.
Yes I confirmed the TCEC machine has 1TB total.
Here are some numbers I got pointed to by TCEC. See pinned comments here for Blue: https://discord.com/channels/479003439125495819/503252511134842885 The nps difference here is insignificant.
Also here: https://discord.com/channels/479003439125495819/656532253471670314/658407338616553492 These show a significant difference after 1 minute but not much after 5 minutes.
The SuFi is 120'+10" which is the same as the match being played right now and hashfull in Livelog seems to hit roughly 70% much of the time. However our time usage spikes a lot from move to move and some moves hit over 90% and some below 50%.
> I watched the Livelog for a bit while SF was playing and hashfull was over 70% quite often. More than 50% is usually suboptimal.
Do we have test data to back this up ? An issue with these extreme configurations is that they see very little testing.
I remember we had some data on low-medium hash sizes (can't find it in UsefulData, but I remember some table that went up to 1GB), but at 64GB and 128GB we are mostly guessing at what the optimal elo-wise hash is.
Too big a hash isn't always better: first, there are slowdowns; second, the hash still has 3-fold pollution.
The 'slowdowns' are a dangerous argument. IMO, as soon as the hash is larger than the caches, the slowdowns are mostly related to properties of the search (e.g. whether it explores more or fewer new nodes), especially at half-used hashes. When the search is different, the speed is kind of meaningless (e.g. if every visited node is a hash hit, one is likely quite effective, even if a bit slower). I once did a measurement of speed at various large hash sizes, at full hash for all sizes (i.e. not the standard bench), and it is mostly the same (might depend a bit on hardware, of course):
That's on 127 threads:
| Hash (MB) | nps |
|---|---|
| 1024 | 155083772 |
| 2048 | 154900915 |
| 4096 | 155914852 |
| 8192 | 159769535 |
| 16384 | 163382859 |
| 32768 | 163083055 |
| 65536 | 161142317 |
| 131072 | 162743338 |
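To put the table above in relative terms, here is a minimal Python sketch that compares each measurement against the 1024 MB row. The values are copied from the table; the names `NPS_BY_HASH_MB` and `relative_change` are illustrative only and are not part of any Stockfish tooling.

```python
# nps at full hash on 127 threads, copied from the table above.
NPS_BY_HASH_MB = {
    1024: 155083772,
    2048: 154900915,
    4096: 155914852,
    8192: 159769535,
    16384: 163382859,
    32768: 163083055,
    65536: 161142317,
    131072: 162743338,
}


def relative_change(nps: int, baseline: int) -> float:
    """Fractional change of nps relative to a baseline measurement."""
    return (nps - baseline) / baseline


baseline = NPS_BY_HASH_MB[1024]
for hash_mb, nps in NPS_BY_HASH_MB.items():
    print(f"{hash_mb:>7} MB: {relative_change(nps, baseline):+.2%} vs 1024 MB")

# Spread between the fastest and slowest measurement in the table.
spread = relative_change(max(NPS_BY_HASH_MB.values()), min(NPS_BY_HASH_MB.values()))
print(f"max vs min: {spread:+.2%}")
```

On these numbers the fastest and slowest rows differ by roughly 5.5%, and the 131072 MB row is about 1% faster than the 65536 MB row. This is a single run on one machine, so small differences should not be over-interpreted.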
It is only not a problem when everything works just right, for example the TT memory does not get migrated by OS very often, the machine doesn't have much NUMA congestion and so on. It gives a false impression that one can go with as large hash as possible without any consequences.
For example, when running on the other box, it gets severe slowdowns for anything beyond 16GB. The RAM configuration runs at full speed on all channels, so it is not technically a hardware "problem"; it is what it is, just with faster cores and more nodes.
@Alayan-stk-2 Since 50% and 70% are percentages they are independent of total size. Here is some quick math though. When the hash is 50% full it means any single entry has a 50% chance of being current. There are 3 entries per cluster. Therefore when a new entry has to be stored there is a 0.5 × 0.5 × 0.5 = 12.5% chance that another current existing entry will have to be tossed out to make room. At 70% full this becomes 0.7 × 0.7 × 0.7 = 34.3%, so almost 3x more likely.
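The quick math above can be reproduced with a few lines of Python. This is only a sketch under the same simplifying assumption as the comment: each of the 3 entries in a cluster is independently "current" with probability equal to the hashfull fraction. The function name `eviction_probability` is hypothetical and does not exist in Stockfish.

```python
def eviction_probability(fill_fraction: float, cluster_size: int = 3) -> float:
    """Chance that every entry in a cluster is current, so that storing a
    new entry has to overwrite a current one.

    Assumes entries are filled independently, which is a simplification.
    """
    if not 0.0 <= fill_fraction <= 1.0:
        raise ValueError("fill_fraction must be between 0 and 1")
    return fill_fraction ** cluster_size


for hashfull in (0.5, 0.7, 0.9):
    print(f"hashfull {hashfull:.0%}: eviction chance {eviction_probability(hashfull):.1%}")

# How much more likely an eviction is at 70% full than at 50% full.
ratio = eviction_probability(0.7) / eviction_probability(0.5)
print(f"70% vs 50%: {ratio:.2f}x")
```

Under this model the ratio between 70% and 50% comes out at about 2.7, which matches the "almost 3x" estimate.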
@noobpwnftw Yes agree testing for such cases is very important. That is why I asked for the numbers from TCEC linked above. I don't think they point to any problems.
Another small improvement for submission would be to request slightly fewer than the max 176 threads available on the TCEC machine. The machine is CPUs: 4 x Intel Xeon E5-4669 v4 2.2 GHz, Cores: 88 physical / 176 threads. Leaving 1 or 2 or 4 (1 per NUMA node) threads for the OS, other non-search SF threads, etc. would allow the scheduler not to preempt the search threads as much and actually help performance. I did a test of 35 threads vs 36 threads on a 36 thread machine and it was a slight win. https://tests.stockfishchess.org/tests/view/5f361d0411a9b1a1dbf18e83 I would request 175 or 174 or 172 threads.
yes, I agree that it might be wise to leave a few threads empty. Maybe 172 is right. The test could however be sensitive to TC (i.e. cutechess reacting too slowly, and a few ms matter at those TC).
@Alayan-stk-2 @vondele Will one of you be able to notify TCEC of the config tweaks discussed above?
yes I have done so.
Thanks! I just found out that TCEC will conduct a "hashsize" test after League 1. Here is the info: "Testing Stockfish 64GiB vs 128GiB, 2 games from starting position, 120min+10s. Will link to resulting log files in #enginedev-log after test." The livelog data should be interesting.
Aloril has confirmed to me that the DivP submission for SF is ok and that SF will use the default net and 172 threads as requested.
The two 120'+10" TCEC test games of SF NNUE 172 thread 64GiB vs 128GiB have concluded and are accessible here: https://tcec-chess.com/#div=hashsize&game=1&season=19 The compressed log file is also available for download here: https://tcec-chess.com/loglive/archive/TCEC_Season_19_-_Hash_Size_Test.log.xz I don't believe there is any speed issue with 128GiB and the log also shows that we do run into situations where the hash is over 90% full. I would like to thank TCEC for running these tests. We should notify them before the start of DivP of our decision. @vondele @Alayan-stk-2 Thank you.
As extracted from the info string just before the bestmove:
Nothing in particular stands out. The observation that with 64GiB we quite frequently have hashfull > 90% holds, so I think we can go with 128.
A quick note: There are basically three theoretical slowdowns related to hash size:
64 GB and 128 GB are well between slowdowns 2 & 3, so there shouldn't be a big nps difference with these hash sizes. On paper, at least.
The slowdown transitions are easiest to observe with 1-threaded bench using a high-clocked CPU.
The TCEC box has a quite rare configuration which seems to be free of most problems regarding RAM sizes. Such behavior usually does not transfer to any other machine; people need to run benchmarks on their own box to decide what is best.
I noticed CCCC has upgraded their hardware to 2x AMD EPYC 7H12. Can someone notify them that the AVX2 build should now be faster than the BMI2 (listed as Haswell on abrok) build?
I have exchanged some messages with their sysadmin. I think they will be using the avx2 profile-build now. They still need to do some work on the hardware (fully populating the memory channels to get full bandwidth; this seems to be the reason why nps is lower than expected).
@vondele It just occurred to me that if main memory access seems to be an issue the change in this PR https://github.com/official-stockfish/Stockfish/pull/1663 should probably be tested. We never verified anything above 63 threads. It's strange that Komodo has no slowdown like we do. The updated code for the current version of TT would be
`if ((tte[i].genBound8 & 0xF8) != generation8 && tte[i].depth8)`
@mstembera we would need to be able to reproduce the slowdown in the first place.
@vondele Of course. I don't know if CCCC would be interested in trying a custom binary to see if it solves the issue. I am willing to build the binary and make it available. Will you be communicating with them further or want to put me in touch with them?
I was in contact via Discord; his handle is Jesse#8452. Do you know if engines like Ethereal have the same issue? At least there one can have a look at the code.
Here is a small sample of data from TCEC and CCCC games versus K (Dragon) and E. The nps values are just an eyeball estimate from the graph of the first few moves.

TCEC 172/176 Threads

| Game | SF | Opponent | Ratio (SF/opponent) |
|---|---|---|---|
| https://tcec-chess.com/#div=p&game=31&season=20 | ~105Mnps | E ~125Mnps | 0.84 |
| https://tcec-chess.com/#div=p&game=6&season=20 | ~105Mnps | K ~95Mnps | 1.10 |

CCCC 232 Threads

| Game | SF | Opponent | Ratio (SF/opponent) |
|---|---|---|---|
| https://www.chess.com/computer-chess-championship#event=budapest-bullet&game=1 | ~85Mnps | E ~110Mnps | 0.77 |
| https://www.chess.com/computer-chess-championship#event=budapest-bullet&game=9 | ~75Mnps | K ~135Mnps | 0.55 |
If I get a chance to communicate w/ CCCC I will offer to provide a test binary or feel free to volunteer me in case you communicate with them first.
I have attached a zip file with the modified TT binary as well as a master binary for comparison (compiled using the same version of gcc etc.) in case CCCC wants to grab it from here for testing. SF_AVX2_bin_test.zip
so, both E and SF run slower on CCCC, while K has a huge boost. On the startpos on very similar hardware I have 160M. I've mentioned this issue to the CCCC sysadmin.
@mcostalba @snicolet No SF version has been submitted to this season's TCEC Division Premier, which is about to start. Apparently they emailed us (I'm not sure who) on Wednesday and haven't received a reply.
I have suggested they use a recent version from abrok, I suggested this one:
as the last one before their deadline.
Can you contact TCEC to confirm this is the one to use please?
Can we name someone to take charge of TCEC submissions (@Alayan-stk-2 :-) ?), preferably with someone else as backup. Or officially ask them to just use the latest abrok version each time, so that we're not relying on unreliable communications?