Closed GoogleCodeExporter closed 8 years ago
Strange, I'm unable to reproduce this.
Can you try again with 603?
Original comment by lobais
on 17 Sep 2007 at 2:16
With me is happening the same
Original comment by leogregianin@gmail.com
on 17 Sep 2007 at 3:40
Seems that broblem occurs when one process (gnuchess, crafty or pychess)
started to
create the /tmp/bitboars file, while others reading it before it was finished.
Maybe.
Original comment by gbtami
on 17 Sep 2007 at 5:36
Ok, could one of you try to insert " print bitsArray0" between line 33 and
34 in
Utils.lutils.bitboard.py?
If stuff works correctly it will output quite a lot of data, but just before the
error is raised it seams to be something not right. Perhaps None.
If it is None, perhaps can you try to add a Lock arround the iterBits function,
so
it'll be:
from threading import Lock
lock = Lock()
def iterBits (bitboard):
lock.acquire()
try:
ensureBitArraysLoaded ()
return ...
finally:
lock.release()
Original comment by lobais
on 17 Sep 2007 at 5:54
This doesn't helped, but seems changing in engineNest.py line 185
from
e = self.initEngine (engine, WHITE)
to
e = self.initEngine (engine, BLACK)
avoids the race in engine discovering processes, as this not triggers _applyFen
in
Board __init__.
With this workaround i get no errors, but one timing/race problem remaining.
PyChess sometime not recognise all 3 engines as analyzing, only 2 :(
Original comment by gbtami
on 17 Sep 2007 at 7:30
I don't see how it can be about multiple processes reading/writing to
/tmp/bitboards
at the same time, by two reasons: It doesn't work to put a lock around
iterBits, and
even in this case bitsArray0 should still have been initied as a list.
Can you send me a log copy in one of those cases, when 2 rather than 3 engines
are
discovered?
Original comment by lobais
on 18 Sep 2007 at 6:04
On my office PC with an old Celeron (no muliple procs!) i deleted
/temp/bitboards and
~/.pychess.
In svn 603 i made only above change (WHITE to BLACK), so /tmp/bitboards not
playing here.
Started pychess, and it dicovered PyChess and gnuchess OK, but Crafty not OK.
Original comment by gbtami
on 18 Sep 2007 at 8:11
Attachments:
I was able to reproduce this on a gutsy tribe 5 livecd.
However unlike you I was able to fix it by adding locks arround iterBits.
On the other hand, I don't really like locking in a hyper optimized function
like
this, so I added a filelocking in the ensureBitArraysLoaded function using
fcntl.flock.
I committed it as rev 604. I hope we got it.
Besides that I think changing the color from WHITE to BLACK is a good idea, as
it'll
probably speed up loading.
Original comment by lobais
on 18 Sep 2007 at 3:36
Now is running well.
Original comment by leogregianin@gmail.com
on 18 Sep 2007 at 4:18
Confirmed. Your fix solved the original bug.
The other problem still exist (with or without WHITE to BLACK change).
PyChess can always discovers all the 3 engines, but randomly the third's
features are
not copleted: <feature command="analyze" supports="false"/> in the attached
engines.xml
Seems changing the timeout to 3*1000 in CECPProtocol.py line 86 fixes this too.
Original comment by gbtami
on 18 Sep 2007 at 8:34
Does this happen everytime or only when bitboards have to be created?
I'm thinking of adding a protover2 attribute in the hardcoded xml. This way we
could
totally skip the timeout and take the time it takes for engines we know support
protover2 eventually.
Original comment by lobais
on 18 Sep 2007 at 10:18
Everytime.
Original comment by gbtami
on 19 Sep 2007 at 6:42
Ok. Revision 609 will give 10 secconds to init for engines which are known to
support
protover 2.
I guess this should fix stuff. Please change status if it does.
Original comment by lobais
on 19 Sep 2007 at 6:38
With svn 609. Now PyChess was detected as not analyzing.
Original comment by gbtami
on 19 Sep 2007 at 7:09
Attachments:
Here is another 2 try.
Original comment by gbtami
on 19 Sep 2007 at 7:49
Attachments:
The above logs was all produced on a P4 3GHz(HT) at home with svn 609.
On my office PC with a Celeron 2.66 GHz(no HT) no problem with svn 609!
Both runs the same 2.6.17 SMP kernel, and 2.10.14 gtk2 and 2.10.4 pygtk.
Original comment by gbtami
on 20 Sep 2007 at 7:00
Was the last logfile 2007-09-19_21-15-49.log produced on your office PC? It has
got
the "2007-09-19 21:15:49 Python Debug: feature draw=1 reuse=0 myname='PyChess
0.8beta1' usermove=1 setboard=1 sigterm=1 analyze=1" lines that the others need.
You could try to run the engine on its own (You can read how in the INSTALL)
and see
if it blocks or something.
Btw. Why do you run an SMP kernel on a non HT processor? Isn't SMP desgined for
exactly executing two threads one one kernel with HT?
Original comment by lobais
on 20 Sep 2007 at 2:14
All the logs i'v attached in this issue was produced on my home pc.
In the 2007-09-19_21-15-49.log PyChess was recognised as analyzing engine, but
crafty
and gnuchess not !
As i can summarise, on multi processor/HT pychess randomly recognise engine
features.
There is no problem on office pc (Celeron). My distrib ships 1(SMP enabled)
kernel only.
Original comment by gbtami
on 20 Sep 2007 at 5:19
I found a dualcore to do some testing on, and I believe I found and fixed a
nobrainer
in the bitboard locking.
Can you try out revision 614.
Original comment by lobais
on 21 Sep 2007 at 12:58
As an extra stress test, it would be interesting to see "PYTHONPATH=lib/
./pychess &
PYTHONPATH=lib/ ./pychess" run on a multicore. It works fine for me, but on a
multicore we might get the same problems on the engines.xml and temp.png files,
we
currently have on the bitbaords file.
Original comment by lobais
on 21 Sep 2007 at 2:41
svn 614 doesn't fix the problem.
After you changed in engineNest.py from
e = self.initEngine (engine, WHITE)
to
e = self.initEngine (engine, BLACK)
bitboard.py is not playing in this bug anymore, so the problem lies somewhere
else :(
You can test the problem only without engines.xml, so you have to delete
~/.pychess
Original comment by gbtami
on 21 Sep 2007 at 6:25
Btw. why don't you use Python's builtin subprocess module?
Original comment by gbtami
on 21 Sep 2007 at 6:27
Hm, I felt like the change in bitboards solved the problem for the dualcore
machine
on which I tested (by deleting ~/.pychess as you describe).
But you are naturally right that bitboards aren't even created before the first
game
is started, so I'll go and look for another reason.
submodule has no support for pty which is necessary when we want engines to die
together with PyChess.
GlChess actually copied the PyChess SubProcess code. Earlier when glchess died
the
engine would continue searching the three with a 100% cpu usage, until the user
killed it manually.
Original comment by lobais
on 21 Sep 2007 at 6:51
Tested on ubuntu gutsy. Same random result.
Then i switched off HT in the BIOS, and installed sjeng too.
I got the correct result in 8/10 cases. So i was able to reproduce the bug on a
non
multiprocessor environment too.
Original comment by gbtami
on 23 Sep 2007 at 11:08
Can you try a few tests again with rev 679?
If you still get problems once in a while, try to edit
lib/pychess/System/SubProcess.py in line 48, and move the # from initGlc to
initPty.
Then the glchess way will be used. It uses no pty, but spawns an awfull lot of
processes and doesn't always kill them probably when glchess is itself killed.
Original comment by lobais
on 6 Oct 2007 at 10:11
With r679 i get the same random result.
Changing to use initGlc i get:
tamas@tami:~$ p
2007-10-07 14:04:44 Default Error: Thread <Worker(Thread-5, started daemon)> in
threadpool raised following error:
Traceback (most recent call last):
File "/home/tamas/svn/pychess/lib/pychess/System/ThreadPool.py", line 51, in run
self.func()
File "/home/tamas/svn/pychess/lib/pychess/System/ThreadPool.py", line 33, in <lambda>
a.func = lambda: func(*args)
File "/home/tamas/svn/pychess/lib/pychess/Players/engineNest.py", line 243, in
_findOutMore
e.kill(UNKNOWN_REASON)
UnboundLocalError: local variable 'e' referenced before assignment
Original comment by gbtami
on 7 Oct 2007 at 12:06
It was with r682.
Original comment by gbtami
on 7 Oct 2007 at 12:13
Can you post the log.
Original comment by lobais
on 7 Oct 2007 at 2:01
with initPty()
Original comment by gbtami
on 7 Oct 2007 at 3:09
Attachments:
it's only waiting
Original comment by gbtami
on 7 Oct 2007 at 3:34
Attachments:
Seams to be fixed in rev 687
Original comment by lobais
on 7 Oct 2007 at 5:13
It's a phoenix! After i cleared my changes, and made an svn up, tried it again,
and
it produced random results again. Today i tested on my old offive PC (Celeron),
and
asked my collegue, to test it on hes Ubuntu Feisty (Celeron too), we got
atacched logs.
(The 2007-10-08_09-31-50.log contains a "Szegmentációs hiba"="Segmentetion
error"
too, huh.)
Original comment by gbtami
on 8 Oct 2007 at 8:05
Attachments:
I think 704 solves this for me.
Original comment by lobais
on 12 Oct 2007 at 11:53
1. It's getting better :)
2. Sometimes glaurung is waiting for something, and pychess gui doesn't start,
so
i have to press a Ctrl+C to quit. see /home/tamas/2007-10-22_10-05-55.log
3. Sometimes sjeng doesn't recognised. see /home/tamas/2007-10-22_10-10-03.log
4. After rebooted to ubuntu gutsy, i get these
/home/tamas/2007-10-22_10-22-44.log
/home/tamas/2007-10-22_10-26-37.log
All in all, it's not perfect yet :(
Original comment by gbtami
on 22 Oct 2007 at 8:54
Attachments:
Strange, did you notice:
10:05:56 stdout Debug: path
10:07:19 Glaurung Log: stop
and in initPty we have:
print "path", self.path, "args", self.args
So glaurung actually hang while printing.
Hm, I'll have to download it to do some testing.
Original comment by lobais
on 22 Oct 2007 at 11:31
I think there might be a problem when quickly forking a ton of ptys.
Try in engineNest.py line 376 to change "1 or 0" to "1 or 2"
Original comment by lobais
on 22 Oct 2007 at 4:09
Once again I declare this bug for fixed.
Original comment by lobais
on 23 Oct 2007 at 6:30
Original issue reported on code.google.com by
gbtami
on 15 Sep 2007 at 5:16