lantonov / asmFish

A continuation of the nice project asmFish by Mohammed Li. Latest version: 07.08.2019
https://lantonov.github.io/asmFish/

Testers needed for LP-related crashes #160

Closed CounterPly closed 6 years ago

CounterPly commented 6 years ago

I am attempting to fix the Large Pages crash that some users have recently reported in asmFishW (primarily the bmi2 versions in certain ChessBase GUIs). However, since asmFish+LP works properly on my system, it is extremely difficult for me to determine where this bug originated.

For anyone capable of reproducing this crash and interested in helping us correct it --

Below is a link to a file containing all asmFish (bmi2) builds from August 2017 to April 2018. If someone could kindly tell me at what version the crashes start, this would be immensely helpful.

Bugfix-inProgress.zip

cirebonb commented 6 years ago

Only in the bmi2 version? How about the popcnt version? My laptop doesn't support bmi2.

CounterPly commented 6 years ago

Most reports of issues (on GitHub and other forums) have been related to the recent bmi2 builds. This could be coincidental, however. Feel free to try to induce crashes with popcnt versions too, as this could also be helpful.

CounterPly commented 6 years ago

Update:

I've narrowed it down to 5 candidate patches that could be causing the issues some users have reported recently. Could someone experiencing crashes please tell me which of these asmFish versions crash for them with their Large Pages enabled?

5patches.zip 5patches_popcnt.zip

greentip1 commented 6 years ago

Error in the GUI Scid vs. PC on Windows. My CPU does not support BMI2. None of the executables work with LP. POPCNT logs: popcnt_logs.zip

Same problem in base executable file. asmFishW_9_base log: asmFishW_9_base_log.zip

Sportsmen commented 6 years ago

Hello, I noticed I had a ChessBase crash with both the popcnt and bmi2 compiles. The last working one is 3/21/18 (granted, it might be off by a day). The first time I saw the crash was with a compile from 3/30, near the end of the month.

CounterPly commented 6 years ago

Thank you to all users from Github, MZ Chess Forum, and Immortal Chess Forum who have been kind enough to help with this. I can't emphasize enough how helpful this feedback is (especially since none of the asmFish devs can reproduce this strange crash). It is still unknown at this time why LP works for some users but fails for others.

Based on everyone who has responded so far, it appears the root of this crash goes back even further than I initially thought. I am currently examining every patch in the month of March since this appears to be the month that the LP-bug first crept into the source.

The following are the top 7 patch suspects for the month of March. Please let me know which of these (if any) crash when Large Pages is enabled. (Note: It is particularly important to test patch 7 since I suspect that it will work properly).

7patches.zip

Sportsmen commented 6 years ago

Both versions for patch 7,6,5,4, and 3 work in chessbase, with LP. Both versions for patch 1 and 2 do not work in chessbase.

CounterPly commented 6 years ago

@tthsqe12 @lantonov

The crash users have been experiencing recently is due to this commit: ffc344d3ed63000f4306ca309995410ef1444cf7

This was very difficult to track down since I can't reproduce this crash on my hardware, but there have been enough reports to corroborate that this is where the Large Pages crash was first introduced.

I have been unable to both fix the crash and maintain the current bench. (A revert prevents the crashes from occurring, but unfortunately the bench changes.) Any advice?

CounterPly commented 6 years ago

For anyone that can reproduce this crash, it would be very helpful for Mohammed, Lyudmil, and myself if you could include both the exception code and exception offset of the crash. Perhaps @Sportsmen could help with this.

CounterPly commented 6 years ago

Temporary LP-fix for the latest asmFish version:

https://github.com/Counterply/asmFish

The strength of this version is approximately equivalent to that of the current master. The benches are different, however, so this revert is not a long-term solution.

image PGN Link

greentip1 commented 6 years ago

Patches 3-7 POPCNT work fine. Patches 1 and 2 POPCNT do not work. GUI Scid vs. PC.

CounterPly commented 6 years ago

@greentip1

Thanks for the feedback. Your results are consistent with those of @Sportsmen and others. For the crashes, could you please provide the exception code and exception offset?

Example: https://stackoverflow.com/questions/7143895/how-do-i-trace-an-intermittent-crash-that-occurs-only-under-the-debugger-but-is

Sportsmen commented 6 years ago

Hey Justin, happy to help out any way I can. I read the link from the other site but am still not sure how to get you the info you'd like. Here are the screenshots from my crash. I do not get a normal error window where I can click "more info".

error 1 error 2

tthsqe12 commented 6 years ago

Ok I will look into this

tthsqe12 commented 6 years ago

If you can't get an exception code or offset, a simple log from the GUI should be enough to track this down. What is the quickest and simplest way to reproduce the crash?

Sportsmen commented 6 years ago

Open ChessBase, create a UCI engine, and tick Large Pages. Add a kibitzer and select the engine, and I get the first error message.

tthsqe12 commented 6 years ago

Haha, I can reproduce the crash. It seems to happen at least when large pages are requested but not available. Should be easy to fix.
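The fix described here amounts to falling back to a conventional allocation when the large-page request fails, instead of continuing with an invalid buffer. A hedged Python sketch of that control flow (the real fix lives in Hash.asm against the Windows allocation API; the allocator callbacks below are hypothetical stand-ins, not asmFish code):

```python
def allocate_hash(size_bytes, want_large_pages, alloc_large, alloc_normal):
    """Try a large-page allocation first; fall back to a normal one.

    alloc_large / alloc_normal are illustrative callbacks that return a
    buffer on success or None on failure. Returns (buffer, used_large_pages).
    """
    if want_large_pages:
        buf = alloc_large(size_bytes)
        if buf is not None:
            return buf, True
        # Large pages requested but unavailable: fall back instead of crashing.
    return alloc_normal(size_bytes), False
```

The crash pattern matches the case where the fallback branch is missing: the large-page request fails (e.g. the privilege isn't granted) and the engine proceeds as if it had succeeded.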

tthsqe12 commented 6 years ago

@Counterply you can commit a change to Hash.asm right away to fix this if you want. See code change and comment 1524693674076984937130

CounterPly commented 6 years ago

Thank you for the quick fix, Moha. I pushed your changes along with the latest executables.

I'll give users a little time to report any other LP-related issues, and then I will update the asmFish 9 release to reflect these changes.

tthsqe12 commented 6 years ago

Thank goodness for HLLs and their compilers.

Sportsmen commented 6 years ago

Both versions with LP seem to be working! Thanks guys!

lantonov commented 6 years ago

Thank you, guys!!! To @Counterply for his efforts and dedication, and to @tthsqe12, who was quick to find and fix the error as usual. I tried to evoke a crash with the executables given by @Counterply but was not successful. Sorry to be too late to look into this issue.

In the event of a crash, if the user is on Windows, the offset can be easily found in Windows Administrative Tools -> Event Viewer -> Windows Logs -> Application. I was able to cause a crash (screenshot attached) and see the offset in the Event Viewer (screenshot attached).

lantonov commented 6 years ago

image

It's OK now!

lantonov commented 6 years ago

Just for a heads up, here are the statements that James Peters debunks in his book The Art of Assembly Language Programming:

• Assembly is hard to learn.
• Assembly is hard to read and understand.
• Assembly is hard to debug.
• Assembly is hard to maintain.
• Assembly is hard to write.
• Assembly language programming is time consuming.
• Improved compiler technology has eliminated the need for assembly language.
• Today, machines are so fast that we no longer need to use assembly.
• If you need more speed, you should use a better algorithm rather than switch to assembly language.
• Machines have so much memory today, saving space using assembly is not important.
• Assembly language is not portable.

CounterPly commented 6 years ago

Thanks for the awesome resource, Lyudmil. I look forward to reading Peters' book.

As a side note, I have often thought it would be nice to make YouTube videos of the thought-process involved with porting a SF patch to assembly. That way more people would be able to contribute to the asmFish project (at least to some of the easier patches). Such a video series would at least require a basic understanding of assembly, though. Perhaps something like Matt Godbolt's x86 Crash Course at CppCon 2017 would work.

Of course, having more help could easily be a double-edged sword. As Moha previously mentioned, non-uniform register conventions could quickly become a nightmare, so we'd have to be somewhat strict with code maintenance.

lantonov commented 6 years ago

The book is for complete noobs like me with extensive explanations about everything connected with assembly.

tthsqe12 commented 6 years ago

@lantonov Could you tell me what you understand of the nnet used in lczero? As far as I know, what goes in is the last 8 board positions, and what comes out is a win probability estimate and a weight for each move. Is this correct? About how many parameters are used in the net, and about how many floating point operations are involved? Are the floating point operations mostly matrix dot vector? The source seems to be a plug-n-play with tensorflow, so not much is exposed.

Ipmanchess commented 6 years ago

@tthsqe12 you can easily ask anything on LCZero Chat; they are very helpful!

lantonov commented 6 years ago

@tthsqe12 I haven't looked into the details of lczero/AlphaZero so I can't immediately answer these questions. I promise to look in the Google papers and in the TensorFlow code soon and try to find the answers. After all, it was me who opened this NN question about a year ago and should deliver on this.

lantonov commented 6 years ago

The answer to the first question about input and output is correct from what I read. In the arxiv paper it is: "The input to the neural network is an N x N x (MT + L) image stack that represents state using a concatenation of T sets of M planes of size N x N. Each set of planes represents the board position at a time-step t - T + 1,..., t, and is set to zero for time-steps less than 1. The board is oriented to the perspective of the current player. The M feature planes are composed of binary feature planes indicating the presence of the player’s pieces, with one plane for each piece type, and a second set of planes indicating the presence of the opponent’s pieces. ... There are an additional L constant-valued input planes denoting the player’s colour, the total move count, and the state of special rules: the legality of castling in chess (kingside or queenside); the repetition count for that position (3 repetitions is an automatic draw in chess; 4 in shogi); and the number of moves without progress in chess (50 moves without progress is an automatic draw). ... The first set of features are repeated for each position in a T = 8-step history."

By "planes" I understand matrices of zeroes and ones. For example, for the piece planes we have 1 in the place of the piece and 0 elsewhere. For chess, I think that N=8 so planes are 8 x 8.

tthsqe12 commented 6 years ago

So planes that correspond to states before the start of the game are simply zeroed out? That seems very strange. So what is L in the case of chess? It seems like a waste to feed in constant planes, but maybe that goes well with the nnet? So there are at least 8*8*(12*8+4) = 6400 inputs to the nnet - wow.

CounterPly commented 6 years ago

@tthsqe12

Re: L in the case of chess

Here is a table from the paper Lyudmil referenced:

image

lantonov commented 6 years ago

That seems like too many zeroes; these must be some very sparse matrices. As for the number of NN parameters (weights and biases), I haven't counted them yet (I should make a regex for it), but the text file containing them is > 80 MB. They are floating-point numbers. If we assume an average of 10 characters per parameter, that makes about 8 million parameters.

lantonov commented 6 years ago

The above table is somewhat unclear to me. Especially I cannot understand where the total of 119 planes comes from.

tthsqe12 commented 6 years ago

@lantonov 119 = 8*(6 + 6 + 2) + 1 + 1 + 2 + 2 + 1.

This is funny. So they have a general framework that accepts a cuboid of numbers, and this is how they get the chess game to fit into that cuboid? The next question is how the nnet scrambles these numbers...
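The 119-plane breakdown above can be sanity-checked in a few lines (the grouping follows the paper excerpt quoted earlier: per history step, 6 + 6 piece planes plus 2 repetition planes, then constant planes for colour, total move count, each player's two castling rights, and the no-progress count):

```python
T = 8                          # history steps
per_step = 6 + 6 + 2           # own pieces, opponent pieces, repetition counters
constants = 1 + 1 + 2 + 2 + 1  # colour, move count, castling (2 per player), no-progress

total_planes = T * per_step + constants
assert total_planes == 119
print(total_planes)  # 119
```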

lantonov commented 6 years ago

Ah, ok, thanks

lantonov commented 6 years ago

We have 2 counts (the total-move count and the no-progress count) which are represented by a single real value each, so we have 117 planes and the total should be 64 x 117 + 2 = 7490 inputs. Still many.

tthsqe12 commented 6 years ago

The description you gave before indicates that they are keen on producing a regular cuboid of inputs, so I am tempted to conclude that there are 8*8*119=7616 inputs. Those constants are simply spread across a whole other input plane. I am simply shocked at the scale of the network. If the 8M parameters figure is accurate that would put the minimum size of the program at 32MB, which is bordering on grotesque (not to mention all of the dependencies on other libraries). SF is 230KB on linux and asmfish is 120KB. Much better chess is currently packed into 200-300 times less space. Nevertheless, I find the project interesting.
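The back-of-the-envelope figures in this exchange check out quickly (assuming 32-bit floats for the parameters; the 8M figure is itself only an estimate from the weights-file size):

```python
inputs = 8 * 8 * 119          # one value per square per plane
assert inputs == 7616

params = 8_000_000            # rough estimate from the ~80 MB weights text file
size_bytes = params * 4       # assuming 4-byte (32-bit) floats
print(inputs, size_bytes)     # 7616 32000000 -- i.e. about 32 MB of weights
```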

tthsqe12 commented 6 years ago

My favorite board game is hex/y for simplicity. Is there a self-learner project for hex?

tthsqe12 commented 6 years ago

As for needing a better algo instead of asm, I heard that OpenBLAS is so fast at naive matrix multiplication that they can't find a crossover point to switch to Strassen's algo. 😉

lantonov commented 6 years ago

I still think that a lot of input neurons can be saved by encoding the board in a more rational way and getting rid of this sparsity. For example, for White: Pawn = 1, Knight = 3, Bishop = 3.1, Rook = 4.5, Queen = 9, King = 10; for Black, the same but with negative signs. For 8 boards we'll have 64 x 8 = 512 input neurons plus 7 x 8 = 56 L's (as scalars) = 568 neurons.

Sometimes it is required that inputs lie between 0 and 1. Then, for White: Pawn = 0.1, Knight = 0.3, Bishop = 0.31, Rook = 0.45, Queen = 0.9, King = 1.0, and for Black the negatives. I see several advantages of this input:

  1. More informative with less input neurons
  2. Rough material balance of the board is obtained by simply summing the values on it
  3. If we look at the board as B & W image, these values approximate the pixel intensities
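A minimal sketch of the scalar encoding proposed above (the 0-to-1 variant): one signed value per square, so the material balance in point 2 is just the sum over the board. The board representation and helper names are illustrative:

```python
VALUES = {"P": 0.1, "N": 0.3, "B": 0.31, "R": 0.45, "Q": 0.9, "K": 1.0}

def encode_board(board):
    """board: 8x8 list of 'wP'/'bQ'/'' strings -> 8x8 list of floats,
    positive for White, negative for Black, 0.0 for empty squares."""
    enc = [[0.0] * 8 for _ in range(8)]
    for r in range(8):
        for c in range(8):
            sq = board[r][c]
            if sq:
                sign = 1.0 if sq[0] == "w" else -1.0
                enc[r][c] = sign * VALUES[sq[1]]
    return enc

def material_balance(enc):
    """Rough material balance: simply sum the encoded values (point 2)."""
    return sum(sum(row) for row in enc)
```

This yields one 8x8 matrix per history step (512 inputs for 8 steps) instead of 112 sparse binary planes, at the cost of the network having to disentangle piece identity from a single scalar.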
lantonov commented 6 years ago

I had a link to TensorBoard with the detailed architecture of lc0 but I temporarily lost it. I am looking for it. In this link there is an attempt to explain the architecture and parameters of the network though I don't know how accurate that is. IMO, by layering the planes one over the other we have something like a parallelopiped with a section 8x8 and a depth 119 (the number of planes). This would be the input layer. Further we have 20 more layers each with depth 128 and a cross-section of 8x8 (?). We have also some convolution filters of size 3x3, however, details about this are fuzzy to me and I will try to make those clearer.