MuellerSeb closed this issue 4 years ago
After some bug tracking, I think the problem is in this line: https://github.com/thouska/spotpy/blob/269a5a7435f1e45d7ad90bb32d4ed9df89f77943/spotpy/parallel/mpi.py#L200
where `self.comm.Iprobe(source=i+1, tag=tag.answer)` never evaluates to true.
Maybe this is related to this: https://groups.google.com/forum/#!topic/mpi4py/RiK8Fhd3LIU
But I've run out of ideas at this point.
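When a non-blocking probe never returns true, the polling loop spins forever and the hang gives no hint about which worker is silent. As a debugging aid, here is a minimal sketch of a probe loop with a timeout. The helper `wait_for_answer` and the `FakeComm` stand-in are hypothetical names made up for illustration and are not part of spotpy; the only mpi4py call assumed is `Comm.Iprobe(source=..., tag=...)`.

```python
import time


def wait_for_answer(comm, source, tag, timeout=10.0, poll_interval=0.01):
    """Poll comm.Iprobe until a message is pending or the timeout expires.

    Returns True if a message from `source` with `tag` is waiting,
    False if nothing arrived within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if comm.Iprobe(source=source, tag=tag):
            return True
        time.sleep(poll_interval)
    return False


class FakeComm:
    """Hypothetical stand-in communicator for trying the helper without MPI.

    Iprobe reports a pending message only after `ready_after` calls.
    """

    def __init__(self, ready_after):
        self.calls = 0
        self.ready_after = ready_after

    def Iprobe(self, source, tag):
        self.calls += 1
        return self.calls > self.ready_after
```

With a real communicator, a `False` result could be logged together with the rank that stayed silent, which turns a silent hang into a reportable symptom.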
Hi Sebastian, sorry for the long silence - vacation period. We "fixed" some SCE-UA bugs in the last version; I have to check the changes together with @thouska, who is still out of office. Can you check whether another sampler (e.g. ROPE or LHS) shows the same problem? Just to make sure it is in the SCE-UA implementation (which is tricky) and not a general `parallel='mpi'` problem.
@philippkraft : Thanks for the reply. I checked the FAST routine, which worked as expected.
Something new on this topic? Cheers, Sebastian
Hi Sebastian, unfortunately, there is not much new on this topic. At least I can confirm your error description. I am on it and will let you know here as soon as it is fixed. Sorry that it is taking so long... Based on your report, we are also working on testing the MPI implementation on Travis (#231), so that such errors can, hopefully, be avoided in the future.
OK, it should be fixed now. The new design of the _RunStatistic class in _algorithm.py, introduced in spotpy version 1.5.0, was not picklable under mpi4py. This caused the hang you described after the burn-in phase. I removed the use of the _RunStatistic class while spotpy is running on CPU slaves, which fixes the problem (at least in my MPI environment). The change might lead to slightly longer runtimes at the end of the sampling (this will be fixed), but for now it is at least running again.
PS: If you want to test this, the corresponding new version (1.5.3) of spotpy is available on pypi.
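To illustrate the kind of failure described above: mpi4py's lowercase communication methods (`send`, `recv`, `bcast`, ...) serialize Python objects with pickle, so an object that cannot be pickled cannot be passed between ranks. The sketch below uses a hypothetical `RunStatistic` class that is NOT spotpy's actual `_RunStatistic`; the lambda attribute is just one assumed way an object can become unpicklable, and the thread does not say what the real cause was.

```python
import pickle


class RunStatistic:
    """Hypothetical stand-in, not spotpy's real _RunStatistic class."""

    def __init__(self):
        self.best = float("inf")
        # A lambda stored on the instance cannot be serialized by pickle.
        self.compare = lambda a, b: a < b


def is_picklable(obj):
    """Return True if obj can be serialized with the stdlib pickle module."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


if __name__ == "__main__":
    print("class with lambda:", is_picklable(RunStatistic()))
    print("plain dict:", is_picklable({"best": 1.0}))
```

A check like `is_picklable` can be run on anything that is about to be handed to a worker, before starting an MPI job.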
I installed spotpy 1.5.4 and now I am getting the following error:
File "/usr/local/lib/python3.6/dist-packages/spotpy/__init__.py", line 41, in <module>
from . import unittests
ImportError: cannot import name 'unittests'
The submodule `unittests` is missing from the package. This is due to this line in setup.py:
https://github.com/thouska/spotpy/blob/0d550741d6d5e882e119e1c7ca140b4be8ffa644/setup.py#L16
You should use this instead:
packages=find_packages(exclude=["tests*", "docs*"])
with this import on the first line:
from setuptools import setup, find_packages
After commenting out `from . import unittests`, it now works.
Maybe you could move the unittests folder to a top-level folder named tests, as mentioned in the exclude pattern, which is a common layout. Then you would have to adapt the .travis.yml file. I don't think the unit tests need to be in the package when there is a separate example folder.
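The effect of the suggested `find_packages` call can be checked on a throwaway directory tree. The following is a minimal sketch, assuming setuptools is installed; the helper names `make_layout` and `discover` and the folder layout are made up for illustration and only mimic a package with top-level `tests` and `docs` folders.

```python
import tempfile
from pathlib import Path

from setuptools import find_packages


def make_layout(root):
    """Create a small, hypothetical project tree with empty packages."""
    for pkg in ("spotpy", "spotpy/parallel", "tests", "docs"):
        directory = Path(root) / pkg
        directory.mkdir(parents=True)
        (directory / "__init__.py").write_text("")


def discover(root):
    """List the packages setuptools would ship, skipping tests and docs."""
    return sorted(find_packages(where=str(root), exclude=["tests*", "docs*"]))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        make_layout(tmp)
        print(discover(tmp))
```

Subpackages are picked up automatically, so nothing inside the package can be forgotten in setup.py, while the top-level tests and docs folders stay out of the distribution.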
I had similar problems, but I just saw that @thouska has updated the package. I have not tested the newest version yet. :D I will do it now. :D
Many thanks @MuellerSeb for testing everything right away and for describing in such detail how to fix the new problems. As you recommended, I removed the unittests import, renamed the unittests folder to tests and moved the whole thing to the top level. I like the new structure and think it makes total sense. As @hpsone found out faster than I could answer here: there is a new version on PyPI containing the fix.
Sorry for my rushed comment. I meant to say that I had not tested it yet. Now I have tested it, and it is not working for me. It may be a mistake in my model, but my MPI is working properly, as I tested it with Telemac2d. What could the error be? Anyway, @thouska, thank you very much for your help. Best regards, Htun
@hpsone : maybe you have to give some details on your problem to get an answer.
@MuellerSeb Thank you so much. I am not quite sure what the error is, but I ran it using "mpc" instead of "mpi" and it worked. Anyway, I will try again, but the cause is probably my insufficient knowledge.
I guess this issue is solved; if not, feel free to reopen.
Hey there,
from spotpy 1.5.0 on, SCE optimization with MPI gets stuck after the burn-in phase. Here is a minimal example:
Running with
Gives the following output:
And from there on, nothing more happens. With `parallel="seq"` it takes about 5 seconds to finish. Do you know what the problem could be? I've got mpi4py 3.0.2 installed and I am using Python 3.6.8. With spotpy 1.4.6 everything works; from 1.5.0 on, the behavior described above occurs. Cheers, Sebastian