thomas-maeder / popeye

Popeye is a chess problem solving and testing software with strong support for fairy chess and heterodox genres. For more information cf. topic "Popeye (chess)" on http://en.wikipedia.org/
31 stars 14 forks source link

Initialization stack overflow in Emscripten (wasm) build #356

Open MuTsunTsai opened 1 year ago

MuTsunTsai commented 1 year ago

I'm experimenting with compiling Popeye to WebAssembly and perhaps run it in browsers. Since this repo already includes the emcc toolchain, I suppose it should just work, but after successfully compile to .wasm file, executing the code with any problem aways shows an empty board. For example, with this input (from the examples):

beg
stip h#3
opt nowk
auth Stephen Emmerson
orig Unpublished
tit N.B. minor dual in (b)
pie blac ke8
pie blac hur gd8 ma8 eaf8 swg8
twin move a8 h8
end

The output is:

Popeye asm.js-32Bit v4.87 (16 MB)

          Stephen Emmerson
             Unpublished
       N.B. minor dual in (b)

+---a---b---c---d---e---f---g---h---+
|                                   |
8   .   .   .   .   .   .   .   .   8
|                                   |
7   .   .   .   .   .   .   .   .   7
|                                   |
6   .   .   .   .   .   .   .   .   6
|                                   |
5   .   .   .   .   .   .   .   .   5
|                                   |
4   .   .   .   .   .   .   .   .   4
|                                   |
3   .   .   .   .   .   .   .   .   3
|                                   |
2   .   .   .   .   .   .   .   .   2
|                                   |
1   .   .   .   .   .   .   .   .   1
|                                   |
+---a---b---c---d---e---f---g---h---+
h#3                         0 + 0

a) 

b) 

solution finished. Time = 469329:23:05 h:m:s

And in the console there're 2 instances of error both sides need a king. Also notice that the Time above doesn't make sense as it really ends instantly, although I'm not sure if this is significant.

Since the .wasm file runs this far, I suppose I didn't compile it the wrong way (nor loading the input the wrong way). But just in case, this is what I did:

  1. In Ubuntu, install Emscripten and clone the Popeye repo.
  2. Modify the last line of makefile.defaults to TOOLCHAIN=emcc.
  3. Run make -f makefile.unx in the popeye folder. (I've also tried emmake make -f makefile.unx, with the same result.)
MuTsunTsai commented 1 year ago

OK I'm still not sure why, but downgrading to Emscripten 1.40.1 will then create a WASM build that runs perfectly.

thomas-maeder commented 1 year ago

@MuTsunTsai can we close this issue?

MuTsunTsai commented 1 year ago

@thomas-maeder Well... yeah, I guess it's fine as long as workaround exists. If anyone has the same issue they can at least find a solution here.

MuTsunTsai commented 4 months ago

After familiar myself with the project building and Emscripten options, I finally figured this out. The crux of the problem is that, since version 3.1.27, Emscripten have reduced the default stack size from 5MB to just 64KB, and this turns out to be way less than what is needed by Popeye, so somewhere in the initialization it runs into stack overflow and thus it doesn't process the input further, leading to the error we saw here. By adding -sSTACK_SIZE=8MB to the compile command, Popeye now compiles and run successfully with the latest version of Emscripten.

@JoshuaGreen I later noticed that in toolchains/cross-x86_64-pc-linux/make.incl it even mentioned about making the stack to 16MB! I think you have dealt with this before in #280. Could you tell if 16MB is sufficient for all use cases?

MuTsunTsai commented 3 months ago

From my understanding, requiring such a large stack memory seems rather unusual. This suggests that Popeye heavily relies on recursion instead of iteration. On the long run, we might want to consider modifying those parts to reduce the need of analyzing the stack allocation.

JoshuaGreen commented 3 months ago

Alas, I don't have any great insights here.

It's clear that Popeye can generate some rather deep call stacks. Unfortunately, I can't imagine there's a simple way to bound the depth. If we're concerned here then I suppose we could find functions using relatively large amounts of stack memory and shift their data to the heap, but that would only go so far. Reducing the depth would probably require significant changes to the architecture.

MuTsunTsai commented 3 months ago

@JoshuaGreen Hmm, that sounds unsettling. From what I've heard, this might not be an issue on some platforms (such as Linux) where the stack size may grow to some extend on runtime, but it certainly is a problem with WASM, where the stack size is fixed since compile time. I guess I'll have to leave this issue open until we can do something to control the stack.

Meanwhile, I shall stick with an 8MB stack in my FEN-Tool porting for now, which should be good, I think, since the default stack size was only 5MB and I didn't receive any issue report with that.

JoshuaGreen commented 3 months ago

One random thought.  The debugging infrastructure can already (mostly) track the depth.  I suppose we could leverage it to, say, exit the program if the depth ever gets too large.

thomas-maeder commented 3 months ago

From my understanding, requiring such a large stack memory seems rather unusual. This suggests that Popeye heavily relies on recursion instead of iteration. On the long run, we might want to consider modifying those parts to reduce the need of analyzing the stack allocation.

Yes, Popeye very heavily relies on recursion. I don't see a way to change that. The stack size required is correlated to the number of moves of the stipulation; i.e. a mate in 30 requires roughly twice the stack as a mate in 15.

Am I right to assume that WebAssembly executables mostly (exclusively?) run on hardware that is restricted wrt all kinds of resources, including CPU?

In that case, a limit on stack size is just one of several reasons why one doesn't want to test a complex, say, mate in 30 using a WebAssembly executable.

MuTsunTsai commented 3 months ago

I don't see a way to change that.

Typically it boils down to implementing the stack ourselves in iterations. It is definitely doable, and the only question is if it's worth the trouble.

Am I right to assume that WebAssembly executables mostly (exclusively?) run on hardware that is restricted wrt all kinds of resources, including CPU?

Not exactly. If anything, the only resource limitation imposed on WebAssembly is the memory, where browsers may refuse to allocate a memory larger than 2GB (or 1GB, depending on the device) for it, but that's not quite bad for most use cases. The CPU is not restricted at all, and in principle WebAssembly executables can run nearly as fast as native counterparts, although it may require some fine-tuning, which I haven't been able to achieve with Popeye yet. So for now, the Popeye WASM is still noticably slower than native builds.

JoshuaGreen commented 3 months ago

Typically it boils down to implementing the stack ourselves in iterations. It is definitely doable, and the only question is if it's worth the trouble.

That's true in theory.  In practice, how difficult that would be depends on the nature of the recursion.

I believe an easy case is appendDirTable_recursive in DHT/dht.c.  Indeed, I already have a non-recursive version of that function (that I still need to test).  However, writing that was mainly for fun as I doubt switching to it would provide much real value.

In the overall infrastructure my understanding is that we have bunches of "slices" that connect via infrastructure like stip_traverse_structure_children_pipe in stipulation/structure_traversal.c.  That function kicks off processing at some level, and the function at the bottom of that level calls that function to move to the next slice.  To eliminate the recursion we'd have to somehow return to an earlier point on the stack (via returns, or perhaps longjump) while saving enough to state to be able to return to the earlier computation during "stack unwinding."  In principle we could rearchitect each function to push its state onto some growing heap buffer, but it would take a lot of work to implement that across the board, and there'd be some performance hit from doing so.