Closed: alemirone closed this issue 1 year ago
Hi. Which Python version did you use? Which fabio version did you use? Which mpi4py? Did you try with Python3?
No problem here, on Debian8, Python2 of the system, Python3 of the system, custom Python3.5 with fresh lib from pypi.
If you need help, it would be useful for us to get access to your environment.
Hello Valentin, it happens on the infiniband cluster: the hib-something nodes. The environment is accessible through OAR: http://wikiserv.esrf.fr/software/index.php/Main_Page
There is no special environment, however; it is just the Python from the cluster's Debian.
Reproduced using:

```
ssh -X rnice
oarsub -q ib -I
python script.py
```

which crashes with:

```
[hib2-1508:49216] *** Process received signal ***
[hib2-1508:49216] Signal: Segmentation fault (11)
[hib2-1508:49216] Signal code: Address not mapped (1)
[hib2-1508:49216] Failing at address: 0x3141208
```
Running it under gdb results in a deadlock or an infinite loop.
Well, I tried removing things from fabio in a virtualenv.
I did not find anything that really works: I still get a deadlock or segfault even after removing most of the fabio modules. So, no idea what is going on.
But using the latest mpi4py (3.0) seems to fix the problem. The one on your system is 1.3.1.
A recent version of mpi4py fixes the issue... so the problem was in mpi4py rather than in fabio.
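The practical fix is therefore to upgrade mpi4py to 3.0 or later. As a sanity check, a script can verify which mpi4py version is installed before relying on it; a minimal sketch (Python 3.8+ for `importlib.metadata`, and assuming from this thread that 3.0 is the first fixed release):

```python
from importlib.metadata import version, PackageNotFoundError


def mpi4py_is_recent(minimum=(3, 0)):
    """Return True if the installed mpi4py is at least `minimum`,
    False if it is older, None if mpi4py is not installed at all."""
    try:
        v = version("mpi4py")
    except PackageNotFoundError:
        return None
    # Compare only the leading numeric components, e.g. "1.3.1" -> (1, 3)
    parts = tuple(int(p) for p in v.split(".")[:2])
    return parts >= minimum


print(mpi4py_is_recent())
```

On the cluster described above this would report `False` for the system's 1.3.1 and `True` after upgrading.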
The problem appears on the infiniband nodes. It consists in the script (below) hanging or crashing; more rarely it works. If you import mpi4py before fabio, the problem disappears.
There is also another symptom, which is otherwise normal behaviour: mpi4py prints a warning complaining that somebody is doing a fork (it is done by `sub.Popen(args=string.split(comando, " "), stdout=sub.PIPE, stderr=sub.PIPE)`). In principle the spawned process ends immediately, before the MPI library starts working. The spawn is necessary to get information for the other parts of the program from which this short script is extracted, so I thought it should not create a problem if it is executed at the very beginning of the program. What happens seems to indicate that fabio is doing something under the hood.
```python
import sys
import string
import os
import fabio
import mpi4py.MPI as MPI
import subprocess as sub

comando = 'taskset -cp %d' % (os.getpid())
print(" EXECUTING COMMAND ", comando)
p = sub.Popen(args=string.split(comando, " "), stdout=sub.PIPE, stderr=sub.PIPE)
print(" WAITING ")
cpuset_string, errors = p.communicate()
print cpuset_string, errors
```
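For reference, the subprocess step of that script can be sketched in Python 3 syntax, where `shlex.split` replaces the removed `string.split`. In this sketch a hypothetical `echo` command stands in for `taskset -cp <pid>` (which is Linux-only), so it says nothing about the fabio/mpi4py interaction itself:

```python
import os
import shlex
import subprocess as sub

# Python 3 port of the spawn step; "echo" is a stand-in for the original
# "taskset -cp <pid>" command, which only exists on Linux.
comando = "echo cpuset-for-pid-%d" % os.getpid()
print(" EXECUTING COMMAND ", comando)
p = sub.Popen(args=shlex.split(comando), stdout=sub.PIPE, stderr=sub.PIPE)
print(" WAITING ")
cpuset_string, errors = p.communicate()
# stdout/stderr come back as bytes under Python 3
print(cpuset_string, errors)
```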