pychess / pychess

PyChess - a chess client for Linux/Windows
https://pychess.github.io/
GNU General Public License v3.0
689 stars 183 forks source link

PyChess doesn't start #208

Closed pychess closed 9 years ago

pychess commented 9 years ago

Original issue 208 reported by gbtami 2007-09-15

If i delete engines.xml and ~/.pychess to simulate a new install, and have installed gnuchess and crafty, i get this error, and PyChess can't start. (If i have no gnuchess and crafty, start is OK.)

tamas@tami:~/svn/pychess$ PYTHONPATH=lib ./pychess 2007-09-15 19:11:53 Default Error: Thread <Worker(Thread-5, started daemon)> in threadpool raised following error: Traceback (most recent call last): File "/home/tamas/svn/pychess/lib/pychess/System/ThreadPool.py", line 51, in run self.func() File "/home/tamas/svn/pychess/lib/pychess/System/ThreadPool.py", line 33, in a.func = lambda: func(*args) File "/home/tamas/svn/pychess/lib/pychess/Players/engineNest.py", line 185, in _findOutMore e = self.initEngine (engine, WHITE) File "/home/tamas/svn/pychess/lib/pychess/Players/engineNest.py", line 384, in initEngine return ProtocolEngine( protocol(subprocess, color) ) File "/home/tamas/svn/pychess/lib/pychess/Players/CECPProtocol.py", line 59, in init self.board = Board(setup=True) File "/home/tamas/svn/pychess/lib/pychess/Utils/Board.py", line 24, in init self._applyFen (FEN_START) File "/home/tamas/svn/pychess/lib/pychess/Utils/Board.py", line 34, in _applyFen for cord in iterBits(wpieces[PAWN]): File "/home/tamas/svn/pychess/lib/pychess/Utils/lutils/bitboard.py", line 35, in iterBits return bitsArray0[bitboard >> 48] + \ TypeError: unsubscriptable object

WARNING: Giving 10 minutes for loading engine PyChess 0.8beta1

pychess commented 9 years ago

Original comment 1. posted by lobais 2007-09-17

Strange, I'm unable to reproduce this. Can you try again with 603?

pychess commented 9 years ago

Original comment 2. posted by leogregianin 2007-09-17

With me is happening the same

pychess commented 9 years ago

Original comment 3. posted by gbtami 2007-09-17

Seems that broblem occurs when one process (gnuchess, crafty or pychess) started to create the /tmp/bitboars file, while others reading it before it was finished. Maybe.

pychess commented 9 years ago

Original comment 4. posted by lobais 2007-09-17

Ok, could one of you try to insert " print bitsArray0" between line 33 and 34 in Utils.lutils.bitboard.py? If stuff works correctly it will output quite a lot of data, but just before the error is raised it seams to be something not right. Perhaps None.

If it is None, perhaps can you try to add a Lock arround the iterBits function, so it'll be:

from threading import Lock lock = Lock() def iterBits (bitboard): lock.acquire()
try: ensureBitArraysLoaded () return ... finally: lock.release()

pychess commented 9 years ago

Original comment 5. posted by gbtami 2007-09-17

This doesn't helped, but seems changing in engineNest.py line 185 from e = self.initEngine (engine, WHITE) to e = self.initEngine (engine, BLACK) avoids the race in engine discovering processes, as this not triggers _applyFen in Board init.

With this workaround i get no errors, but one timing/race problem remaining. PyChess sometime not recognise all 3 engines as analyzing, only 2 :(

pychess commented 9 years ago

Original comment 6. posted by lobais 2007-09-18

I don't see how it can be about multiple processes reading/writing to /tmp/bitboards at the same time, by two reasons: It doesn't work to put a lock around iterBits, and even in this case bitsArray0 should still have been initied as a list.

Can you send me a log copy in one of those cases, when 2 rather than 3 engines are discovered?

pychess commented 9 years ago

Original comment 7. posted by gbtami 2007-09-18

On my office PC with an old Celeron (no muliple procs!) i deleted /temp/bitboards and ~/.pychess. In svn 603 i made only above change (WHITE to BLACK), so /tmp/bitboards not playing here. Started pychess, and it dicovered PyChess and gnuchess OK, but Crafty not OK.

pychess commented 9 years ago

Original comment 8. posted by lobais 2007-09-18

I was able to reproduce this on a gutsy tribe 5 livecd. However unlike you I was able to fix it by adding locks arround iterBits. On the other hand, I don't really like locking in a hyper optimized function like this, so I added a filelocking in the ensureBitArraysLoaded function using fcntl.flock.

I committed it as rev 604. I hope we got it. Besides that I think changing the color from WHITE to BLACK is a good idea, as it'll probably speed up loading.

pychess commented 9 years ago

Original comment 9. posted by leogregianin 2007-09-18

Now is running well.

pychess commented 9 years ago

Original comment 10. posted by gbtami 2007-09-18

Confirmed. Your fix solved the original bug.

The other problem still exist (with or without WHITE to BLACK change). PyChess can always discovers all the 3 engines, but randomly the third's features are not copleted: in the attached engines.xml Seems changing the timeout to 3*1000 in CECPProtocol.py line 86 fixes this too.

pychess commented 9 years ago

Original comment 11. posted by lobais 2007-09-18

Does this happen everytime or only when bitboards have to be created? I'm thinking of adding a protover2 attribute in the hardcoded xml. This way we could totally skip the timeout and take the time it takes for engines we know support protover2 eventually.

pychess commented 9 years ago

Original comment 12. posted by gbtami 2007-09-19

Everytime.

pychess commented 9 years ago

Original comment 13. posted by lobais 2007-09-19

Ok. Revision 609 will give 10 secconds to init for engines which are known to support protover 2. I guess this should fix stuff. Please change status if it does.

pychess commented 9 years ago

Original comment 14. posted by gbtami 2007-09-19

With svn 609. Now PyChess was detected as not analyzing.

pychess commented 9 years ago

Original comment 15. posted by gbtami 2007-09-19

Here is another 2 try.

pychess commented 9 years ago

Original comment 16. posted by gbtami 2007-09-20

The above logs was all produced on a P4 3GHz(HT) at home with svn 609. On my office PC with a Celeron 2.66 GHz(no HT) no problem with svn 609! Both runs the same 2.6.17 SMP kernel, and 2.10.14 gtk2 and 2.10.4 pygtk.

pychess commented 9 years ago

Original comment 17. posted by lobais 2007-09-20

Was the last logfile 2007-09-19_21-15-49.log produced on your office PC? It has got the "2007-09-19 21:15:49 Python Debug: feature draw=1 reuse=0 myname='PyChess 0.8beta1' usermove=1 setboard=1 sigterm=1 analyze=1" lines that the others need.

You could try to run the engine on its own (You can read how in the INSTALL) and see if it blocks or something.

Btw. Why do you run an SMP kernel on a non HT processor? Isn't SMP desgined for exactly executing two threads one one kernel with HT?

pychess commented 9 years ago

Original comment 18. posted by gbtami 2007-09-20

All the logs i'v attached in this issue was produced on my home pc.

In the 2007-09-19_21-15-49.log PyChess was recognised as analyzing engine, but crafty and gnuchess not !

As i can summarise, on multi processor/HT pychess randomly recognise engine features.

There is no problem on office pc (Celeron). My distrib ships 1(SMP enabled) kernel only.

pychess commented 9 years ago

Original comment 19. posted by lobais 2007-09-21

I found a dualcore to do some testing on, and I believe I found and fixed a nobrainer in the bitboard locking. Can you try out revision 614.

pychess commented 9 years ago

Original comment 20. posted by lobais 2007-09-21

As an extra stress test, it would be interesting to see "PYTHONPATH=lib/ ./pychess & PYTHONPATH=lib/ ./pychess" run on a multicore. It works fine for me, but on a multicore we might get the same problems on the engines.xml and temp.png files, we currently have on the bitbaords file.

pychess commented 9 years ago

Original comment 21. posted by gbtami 2007-09-21

svn 614 doesn't fix the problem. After you changed in engineNest.py from e = self.initEngine (engine, WHITE) to e = self.initEngine (engine, BLACK) bitboard.py is not playing in this bug anymore, so the problem lies somewhere else :( You can test the problem only without engines.xml, so you have to delete ~/.pychess

pychess commented 9 years ago

Original comment 22. posted by gbtami 2007-09-21

Btw. why don't you use Python's builtin subprocess module?

pychess commented 9 years ago

Original comment 23. posted by lobais 2007-09-21

Hm, I felt like the change in bitboards solved the problem for the dualcore machine on which I tested (by deleting ~/.pychess as you describe). But you are naturally right that bitboards aren't even created before the first game is started, so I'll go and look for another reason.

submodule has no support for pty which is necessary when we want engines to die together with PyChess. GlChess actually copied the PyChess SubProcess code. Earlier when glchess died the engine would continue searching the three with a 100% cpu usage, until the user killed it manually.

pychess commented 9 years ago

Original comment 24. posted by gbtami 2007-09-23

Tested on ubuntu gutsy. Same random result. Then i switched off HT in the BIOS, and installed sjeng too. I got the correct result in 8/10 cases. So i was able to reproduce the bug on a non multiprocessor environment too.

pychess commented 9 years ago

Original comment 25. posted by lobais 2007-10-06

Can you try a few tests again with rev 679? If you still get problems once in a while, try to edit lib/pychess/System/SubProcess.py in line 48, and move the # from initGlc to initPty. Then the glchess way will be used. It uses no pty, but spawns an awfull lot of processes and doesn't always kill them probably when glchess is itself killed.

pychess commented 9 years ago

Original comment 26. posted by gbtami 2007-10-07

With r679 i get the same random result. Changing to use initGlc i get: tamas@tami:~$ p 2007-10-07 14:04:44 Default Error: Thread <Worker(Thread-5, started daemon)> in threadpool raised following error: Traceback (most recent call last): File "/home/tamas/svn/pychess/lib/pychess/System/ThreadPool.py", line 51, in run self.func() File "/home/tamas/svn/pychess/lib/pychess/System/ThreadPool.py", line 33, in a.func = lambda: func(*args) File "/home/tamas/svn/pychess/lib/pychess/Players/engineNest.py", line 243, in _findOutMore e.kill(UNKNOWN_REASON) UnboundLocalError: local variable 'e' referenced before assignment

pychess commented 9 years ago

Original comment 27. posted by gbtami 2007-10-07

It was with r682.

pychess commented 9 years ago

Original comment 28. posted by lobais 2007-10-07

Can you post the log.

pychess commented 9 years ago

Original comment 29. posted by gbtami 2007-10-07

log attached

pychess commented 9 years ago

Original comment 30. posted by gbtami 2007-10-07

next one

pychess commented 9 years ago

Original comment 31. posted by gbtami 2007-10-07

next

pychess commented 9 years ago

Original comment 32. posted by gbtami 2007-10-07

with initPty()

pychess commented 9 years ago

Original comment 33. posted by gbtami 2007-10-07

next

pychess commented 9 years ago

Original comment 34. posted by gbtami 2007-10-07

it's only waiting

pychess commented 9 years ago

Original comment 35. posted by lobais 2007-10-07

Seams to be fixed in rev 687

pychess commented 9 years ago

Original comment 36. posted by gbtami 2007-10-08

It's a phoenix! After i cleared my changes, and made an svn up, tried it again, and it produced random results again. Today i tested on my old offive PC (Celeron), and asked my collegue, to test it on hes Ubuntu Feisty (Celeron too), we got atacched logs. (The 2007-10-08_09-31-50.log contains a "Szegmentációs hiba"="Segmentetion error" too, huh.)

pychess commented 9 years ago

Original comment 37. posted by lobais 2007-10-12

I think 704 solves this for me.

pychess commented 9 years ago

Original comment 38. posted by gbtami 2007-10-22

  1. It's getting better :)
  2. Sometimes glaurung is waiting for something, and pychess gui doesn't start, so i have to press a Ctrl+C to quit. see /home/tamas/2007-10-22_10-05-55.log
  3. Sometimes sjeng doesn't recognised. see /home/tamas/2007-10-22_10-10-03.log
  4. After rebooted to ubuntu gutsy, i get these /home/tamas/2007-10-22_10-22-44.log /home/tamas/2007-10-22_10-26-37.log

All in all, it's not perfect yet :(

pychess commented 9 years ago

Original comment 39. posted by lobais 2007-10-22

Strange, did you notice: 10:05:56 stdout Debug: path 10:07:19 Glaurung Log: stop

and in initPty we have: print "path", self.path, "args", self.args

So glaurung actually hang while printing.

Hm, I'll have to download it to do some testing.

pychess commented 9 years ago

Original comment 40. posted by lobais 2007-10-22

I think there might be a problem when quickly forking a ton of ptys. Try in engineNest.py line 376 to change "1 or 0" to "1 or 2"

pychess commented 9 years ago

Original comment 41. posted by gbtami 2007-10-23

now

pychess commented 9 years ago

Original comment 42. posted by lobais 2007-10-23

Once again I declare this bug for fixed.