multithreading / multiprocessing for optimization

Did some experiments regarding https://tespy.readthedocs.io/en/main/tutorials_examples.html#thermal-power-plant-efficiency-optimization Using the given optimizer I noticed only 3% of my 16 core CPU were utilized. So I tried to apply an optimizer capable of parallel function evaluation - but noticed strange results / exceptions. The following code can be used to simulate the effect:

 def test():

    class calc (threading.Thread):

        def __init__(self, threadID):
            threading.Thread.__init__(self)
            self.threadID = threadID
            self.model = PowerPlant()

        def run(self):
            print ("Starting " + str(self.threadID))
            for i in range(100):
                try:
                    print(self.threadID, i, self.model.calculate_efficiency([40.0,1.0]))
                except Exception as ex:
                    pass

    for i in range(32):
        calc(i).start()

You see errors like

13:24:46-ERROR-Singularity in jacobian matrix, calculation aborted! Make sure your network does not have any linear dependencies in the parametrisation. Other reasons might be -> given temperature with given pressure in two phase region, try setting enthalpy instead or provide accurate starting value for pressure. -> given logarithmic temperature differences or kA-values for heat exchangers, -> support better starting values. -> bad starting value for fuel mass flow of combustion chamber, provide small (near to zero, but not zero) starting value.

As you can see, I create a separate model instance for each thread, but they seem not to be independent. Why is that so? Is there anything I can do (setting a configuration parameter) to get this working?

For optimization for bigger models it would be nice if multi threaded execution would be supported. Exceptions I can filter out, but unfortunately sometimes just wrong values are returned. So I could "optimize" the powerplant to nearly 100% efficiency, but using the resulting x-value later showed a "normal" efficiency.

My OS is Linux Mint 20.2 based on Ubuntu, python is anaconda for python 3.8.

Tried a workaround:

Using 'memorise_fluid_properties=False' in the network config

class PowerPlant():

    def __init__(self):
        self.nw = Network(
            fluids=['BICUBIC::water'],
            p_unit='bar', T_unit='C', h_unit='kJ / kg',
            iterinfo=False, memorise_fluid_properties=False)

solved the problem not completely, I still got strange values. Additionally using thread local models in the fitness function:

        def __init__(self):
...
            self.local = threading.local()

        def get_model(self):
            try:
                return self.local.model
            except AttributeError:
                return self.new_model()

        def new_model(self):
            self.local.model = PowerPlant()
            return self.local.model

        def efficiency(self, x):   
            try: 
                eff = self.get_model().calculate_efficiency(x)    
                if not np.isfinite(eff):
                    return 0
            except Exception as ex:
                return 0  
            return eff

        def fitness(self, x):
            y = -self.efficiency(x)
...

I still get strange outliers. May be the usage of static values like

# create dict for tespy fluids
TESPyFluid.fluids = {}

generally is a bad idea. Instead you could define fluids as local class member of TESPyFluid (self.fluids = {}). Same with the Memorise dictionaries which should be members of Memorise. Static variables have strange side effects in case of parallel execution. See for instance https://volosoft.com/blog/Prefer-Singleton-Pattern-over-Static-Class this issue is not related to Python, it is valid for all programming languages.

Now I only can recreate the model for each function evaluation, which slows down things significantly. My CPU could handle about 600 model evaluations per second, if singletons instead of statics were used. And there are much bigger CPUs avaiable already.

Update;

Found out that recreating the model in case of any error helps - together with using thread local models:

      def get_model(self):
            try:
                return self.local.model
            except AttributeError:
                return self.new_model()

        def recreate_model(self):
            self.local.model = PowerPlant()

        def efficiency(self, x):   
            try: 
                eff = self.get_model().calculate_efficiency(x)    
                if not np.isfinite(eff):
                    self.recreate_model()
                    return 0
                return eff
            except Exception as ex:
                self.recreate_model()
                return 0

So it seems that at some time an error occurs, and if you recreate the model in this case it helps. The

                if not np.isfinite(eff):
                    self.recreate_model()

is required. Seems that sometimes a nan value is computed which leaves the model in "bad shape". Not sure this really has to do with multi threading. I could reproduce this issue also with single threaded optimizers, but not with pg.ihs from your example.

Now I have uploded a python script to actually reproduce the issue:

Execute https://github.com/dietmarwo/fast-cma-es/blob/master/examples/powerplant.py after commenting out https://github.com/dietmarwo/fast-cma-es/blob/61335c0278a7fcd9f241caedd5e7c003c01a987b/examples/powerplant.py#L260 which recreates the model in case of an error.

See the corresponding tutorial how to do parallel optmization with tespy: https://github.com/dietmarwo/fast-cma-es/blob/master/tutorials/PowerPlant.adoc

I noticed that BLAS parallelization as configured by anaconda actually harms the performance of tespy simulations. So I do now:

        def fitness(self, x):
            with threadpoolctl.threadpool_limits(limits=1, user_api="blas"):
                f1 = 1 / self.model.calculate_efficiency(x)

for pygmo fitness which both speeds up the simulation and reduces CPU consumption. This setting is essential for parallel fitness evaluation, fcmaes does this automatically.

Dear @dietmarwo,

thank you very much for reaching out, your investigations and suggestions are highly appreciated! I did not yet have the time to go through the code in detail, since I am currently working on TESPy in my free time only. Also, I do not have a lot of experience with multi-processing, so I will have a couple of questions, I guess.

Generally, implementing parallel processing might require some refactoring changes in the back-end. However, these might be benefitial for further developing the software in any case. There are some parts, that I might have set up differently, if I restarted now. If you'd be interested to support or advise, please let me know.

I will come back to this in a couple of weeks (will be in holidays :)) and I am looking forward to your improvement and example. Maybe we could link that from the documentation as well. Also, I would like to invite you to the user-meeting we are planning (https://tespy.readthedocs.io/en/main/regular_meeting.html).

Best regards

Francesco

Hi Francesco, in fact I finally got the parallelism working (by using thread local models). I described the remaining issue here https://github.com/dietmarwo/fast-cma-es/blob/master/tutorials/PowerPlant.adoc Sometimes the model gets corrupted - even single threaded. I get nan as return value and after that strange efficiencies which are impossible high. Can be mitigated by recreating the model after receiving nan. So finally I think it is a minor issue now, which nevertheless should be resolved.
Refactoring would be nice, but as parallelism works now it is probably not at the highest prio. I will try to participate at the meeting, but I am not really a user. I maintain fcmaes, a kind of "competitor" to pygmo. Dario (the owner of pygmo) and me probably disagree what works better :-). I am mostly interested in having real world application examples for fcmaes, so may be we can cooperate in producing a more complex TESPy optimization example, may be involving more constraints and multiple objectives. Since paralellism works I don't see an issue in the performance of TESPy.

oemof / tespy

multithreading / multiprocessing for optimization #343