thouska / spotpy

A Statistical Parameter Optimization Tool
https://spotpy.readthedocs.io/en/latest/
MIT License

Dream sampler. Parallel computation #266

Open baramousa opened 3 years ago

baramousa commented 3 years ago

Hi, this is not really an issue, just a question: which version of DREAM does this package implement? Is it the basic DREAM, DREAM(ZS), or MT-DREAM(ZS)? I am asking because I want to know whether parallel computation is possible. As far as I know, basic DREAM can only be run sequentially, while the others can be run in parallel.

Thanks @thouska

thouska commented 3 years ago

Hi @baramousa The DREAM algorithm version implemented in spotpy corresponds to Algorithm 6 as presented in this publication: https://www.sciencedirect.com/science/article/pii/S1364815215300396?casa_token=gCl00Qy8ymsAAAAA:BcW90XS8GyI2Rwi7sJnunxAUOAhfQMz9eEHTWSbjgvPflnUxF5DI7cm3qq1OzXro01_bdf3Pyz4 So it can be run in parallel. If you want, you can check out this example, which is set up to run n=4 chains: https://github.com/thouska/spotpy/blob/master/spotpy/examples/tutorial_dream_hymod.py

Changing the parallel keyword to 'mpi' and handing this setting to the sampler will make spotpy start each of the chain runs on an individual CPU core. Some details about this are given here: https://github.com/thouska/spotpy/blob/master/spotpy/examples/tutorial_parallel_computing_hymod.py
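Conceptually, each chain's proposal is evaluated on its own worker. A toy sketch of that idea with plain `multiprocessing` (the function and proposal values are hypothetical stand-ins, not the spotpy internals):

```python
from multiprocessing import Pool

def run_model(params):
    # Stand-in for one chain's model evaluation (hypothetical toy function)
    return sum(p * p for p in params)

if __name__ == "__main__":
    # One proposal per chain, evaluated on its own worker process
    proposals = [[1, 2], [3, 4], [5, 6], [7, 8]]
    with Pool(processes=4) as pool:
        likes = pool.map(run_model, proposals)
    print(likes)  # [5, 25, 61, 113]
```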

baramousa commented 3 years ago

Thanks for the quick reply. Just another question: let's say I want to implement the hymod example below. https://github.com/thouska/spotpy/blob/master/spotpy/examples/tutorial_dream_hymod.py

If I want to run it in parallel, then I need to set parallel equal to "mpi" for a Linux machine and to "mpc" for a Windows machine. Am I getting it right?

thouska commented 3 years ago

Yes, that's correct.
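A small platform switch keeps one script portable across both machines (a hypothetical helper of my own, not part of spotpy):

```python
import sys

def pick_parallel_backend():
    # 'mpi' needs an MPI launcher (typical on Linux clusters),
    # while 'mpc' uses multiprocessing and also works on Windows
    return 'mpi' if sys.platform.startswith('linux') else 'mpc'

parallel = pick_parallel_backend()
```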

baramousa commented 3 years ago

Ok, thanks a lot. I will try it and give my feedback.

baramousa commented 3 years ago

Hi @thouska. I tried to run your hymod_dream example in parallel. It seems to run, but then I get a warning. First, this is the code:

```python
import numpy as np
import spotpy
import matplotlib.pyplot as plt
from spotpy.likelihoods import gaussianLikelihoodMeasErrorOut as GausianLike
from spotpy.analyser import plot_parameter_trace
from spotpy.analyser import plot_posterior_parameter_histogram
import sys

if __name__ == "__main__":
    parallel = 'mpc'
    from spotpy.examples.spot_setup_hymod_unix import spot_setup
    spot_setup = spot_setup(GausianLike)
    sampler = spotpy.algorithms.dream(spot_setup, dbname='DREAM_hymod',
                                      parallel=parallel, dbformat='csv')
    rep = 5000
    nChains = 4
    convergence_limit = 1.2
    nCr = 3
    eps = 10e-6
    runs_after_convergence = 100
    acceptance_test_option = 6
    r_hat = sampler.sample(rep, nChains, nCr, eps, convergence_limit)
    results = spotpy.analyser.load_csv_results('DREAM_hymod')
```

Then I get this warning:

```
Convergence rates = 1.5744 4.8378 1.4476 1.3106 1.5791
1003 of 5000, maximal objective function=-8270.54, time remaining: 00:04:34
Acceptance rates [%] = 15.08 13.89 11.51 25.
Convergence rates = 1.5756 5.1658 1.4241 1.3066 1.5492
1021 of 5000, maximal objective function=-8270.54, time remaining: 00:04:36
Acceptance rates [%] = 15.12 13.95 11.63 25.19
Convergence rates = 1.5518 5.793 1.4008 1.298 1.5164
```

```
IndexError                                Traceback (most recent call last)
<ipython-input> in <module>
     11 runs_after_convergence = 100
     12 acceptance_test_option = 6
---> 13 r_hat = sampler.sample(rep, nChains, nCr, eps, convergence_limit)
     14 results = spotpy.analyser.load_csv_results('DREAM_hymod')
     15

c:\users\albaraalmawazreh\appdata\local\programs\python\python37\lib\site-packages\spotpy\algorithms\dream.py in sample(self, repetitions, nChains, nCr, eps, convergence_limit, runs_after_convergence, acceptance_test_option)
    274         while self.iter < self.repetitions:
    275             param_generator = ((curChain,self.get_new_proposal_vector(curChain,newN,nrN)) for curChain in range(int(self.nChains)))
--> 276             for cChain,par,sim in self.repeat(param_generator):
    277                 pCr = np.random.randint(0,nCr)
    278                 ids=[]

c:\users\albaraalmawazreh\appdata\local\programs\python\python37\lib\site-packages\spotpy\parallel\mproc.py in __call__(self, jobs)
     52     def __call__(self, jobs):
     53         results = self.pool.imap(self.f, jobs)
---> 54         for i in results:
     55             yield i

c:\users\albaraalmawazreh\appdata\local\programs\python\python37\lib\site-packages\multiprocess\pool.py in next(self, timeout)
    746         if success:
    747             return value
--> 748         raise value
    749
    750     __next__ = next  # XXX

IndexError: list index out of range
```
thouska commented 3 years ago

Hi @baramousa thank you for your message and the detailed error description. I can confirm an error there and will look into this together with @philippkraft. Sorry for any inconvenience this may cause you. I will keep you posted about the progress.

thouska commented 3 years ago

Hi @baramousa, it turns out to be quite a task to solve this issue. We will work on this at #268 and also on our local machines. It might take a while and I cannot guarantee final success at the moment. Meanwhile, would 'mpi' parallelization be a solution for you? This should work fine :)

baramousa commented 3 years ago

Hi @thouska. Sorry for the late reply. I downloaded Anaconda, which comes with Python 3.8 or newer, and tried the parallelisation of dream on Windows, and it seems to work. However, my issue now is that my model writes input and output data as text files, and for the parallelisation to work effectively, each chain should have its own directory where it reads and writes those files. My question is whether there is a way to extract the id/number of the currently running chains, so I can use it in my model to create a directory for each of them. Since I am also trying to use SCE-UA, would you suggest a way to do the same with it? Thanks in advance :)

thouska commented 3 years ago

Hi @baramousa, ok, I have not tested with the newest Anaconda yet; it would be great if that solves the problem! Regarding the parallel writing/reading, you are perfectly right. One needs to make sure that this is done individually for each core. I wrote a short example for that, which you can find here.

Basically under 'mpi' you can access the cpu_id this way:

```python
cpu_id = str(int(os.environ['OMPI_COMM_WORLD_RANK']))
```

Under 'mpc' it is done like this:

```python
cpu_id = str(os.getpid())
```

I would recommend working with these instead of using the chain_id (in the case of dream) or complex_id (in the case of sce-ua), as the above example works independently of the choice of algorithm in spotpy.
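The two cases above can be combined into one helper that also creates the per-core working directory (a sketch of my own; the helper names and the directory naming scheme are assumptions, not spotpy API):

```python
import os

def get_cpu_id():
    # Under 'mpi', Open MPI exports each process's rank;
    # under 'mpc', every worker is a separate process, so the PID is unique
    if 'OMPI_COMM_WORLD_RANK' in os.environ:
        return str(int(os.environ['OMPI_COMM_WORLD_RANK']))
    return str(os.getpid())

def make_worker_dir(base='run'):
    # One read/write directory per core, e.g. run_12345
    path = base + '_' + get_cpu_id()
    os.makedirs(path, exist_ok=True)
    return path
```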

baramousa commented 3 years ago

Hi @thouska, thanks for your answer. Now it works: input and output files are written and read in individual directories corresponding to the core name. However, the csv summary file, which should contain the results of all simulations, now only holds the very last simulations carried out by each chain for dream, and no data at all for sceua. The simulations run and the summary is shown in the console, but the csv files are not written properly. Can you tell where the problem is?

I am guessing it has to do with this part of _algorithm.py:

```python
def save(self, like, randompar, simulations, chains=1):
    # Initialize the database if no run was performed so far
    self._init_database(like, randompar, simulations)
    # Test if like and the save threshold are float/list and compare accordingly
    if self.__is_list_type(like) and self.__is_list_type(self.save_threshold):
        if all(i > j for i, j in zip(like, self.save_threshold)):  # Compares list/list
            self.datawriter.save(like, randompar, simulations, chains=chains)
    if (not self.__is_list_type(like)) and (not self.__is_list_type(self.save_threshold)):
        if like > self.save_threshold:  # Compares float/float
            self.datawriter.save(like, randompar, simulations, chains=chains)
    if self.__is_list_type(like) and (not self.__is_list_type(self.save_threshold)):
        if like[0] > self.save_threshold:  # Compares list/float
            self.datawriter.save(like, randompar, simulations, chains=chains)
    if (not self.__is_list_type(like)) and self.__is_list_type(self.save_threshold):  # Compares float/list
        if (like > self.save_threshold).all:
```
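For reference, the four comparison branches in that method can be condensed into one self-contained function (my own simplification for illustration, not the spotpy source; it uses plain lists where spotpy may use numpy arrays):

```python
def passes_threshold(like, threshold):
    # True if the (list or float) likelihood beats the (list or float)
    # save threshold, mirroring the four cases in _algorithm.py's save()
    like_is_list = isinstance(like, (list, tuple))
    thr_is_list = isinstance(threshold, (list, tuple))
    if like_is_list and thr_is_list:       # list/list
        return all(i > j for i, j in zip(like, threshold))
    if like_is_list and not thr_is_list:   # list/float
        return like[0] > threshold
    if not like_is_list and thr_is_list:   # float/list
        return all(like > j for j in threshold)
    return like > threshold                # float/float
```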
thouska commented 3 years ago

Hi @baramousa, thank you for the update! And indeed, the broken file was the point where I got stuck at #268. To be honest, I did not fully understand why this did not work, as the results are internally perfectly fine, but were not in the final output file.

However, I looked into this again, played around a lot, and can finally come up with a fix (see commit above). Basically, I changed line 53 in mproc.py

from:

```python
results = self.pool.imap(self.f, jobs)
```

into:

```python
results = self.pool.map(self.f, jobs)
```

Now it works fine for me, at least in 90% of the cases. From time to time the header is broken, but the rest should be fine. @baramousa: Could you test it for your case and give your feedback here?
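The practical difference between the two calls: `imap` yields results lazily while workers are still running, whereas `map` blocks until every job has finished and returns one complete list. A toy comparison using the standard `multiprocessing` pool (not the pathos pool from mproc.py):

```python
from multiprocessing import Pool

def square(x):
    return x * x

if __name__ == "__main__":
    with Pool(processes=2) as pool:
        eager = pool.map(square, range(4))         # blocks until all jobs are done
        lazy = list(pool.imap(square, range(4)))   # iterator, consumed one by one
    print(eager, lazy)  # [0, 1, 4, 9] [0, 1, 4, 9]
```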

baramousa commented 3 years ago

Hi @thouska, indeed, when I change the dbformat to 'ram', the results seem fine. I tried your solution now: it worked with the sceua algorithm, but dream still has the same problem, where only the last runs are saved in the csv file. On the other hand, mpi on a Linux machine seems to work.

thouska commented 3 years ago

Hi @baramousa sorry for the late response, but at least I can come up with good news, I hope :) I worked on the issue in #268. You were right: somehow only the dream algorithm did not work properly under the pathos multiprocessing settings. This was due to too many pools being generated during the Markov chains. I tried to fix it, but in the end I had the feeling that this is a problem in the pathos package. So I changed the package to joblib. With that, the parallelization works with dream on my computer. Could you test it too? I changed tutorial_dream_hymod.py so that it directly uses multiprocessing.
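For reference, the joblib pattern that replaces a pool-based dispatch looks roughly like this (a generic sketch under my own job/function names, not the actual spotpy code):

```python
from joblib import Parallel, delayed

def run_job(job):
    # Stand-in for evaluating one chain's (chain id, proposal) pair
    chain_id, value = job
    return chain_id, value * 2

jobs = [(0, 10), (1, 20), (2, 30), (3, 40)]
# Workers evaluate the jobs; results come back as one complete, ordered list
results = Parallel(n_jobs=2)(delayed(run_job)(job) for job in jobs)
print(results)  # [(0, 20), (1, 40), (2, 60), (3, 80)]
```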