open-ideas / StROBe

Python module for stochastic residential occupancy behavior for both building and district energy simulations.

Large amount of time spent on cPickle.load() #15

Open bramvdh91 opened 6 years ago

bramvdh91 commented 6 years ago

Profiling example.py shows that a lot of time is spent loading cPickle objects while compiling the output .txt files for a feeder. Usually the user is not interested in the pickle files, which in addition take up a lot of disk space.

I would like to check whether the code can be sped up by directly writing only the variables of interest.

[profiling screenshot showing the time spent in cPickle.load()]
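(For reference, this kind of profile can be reproduced with the standard-library cProfile module; a minimal sketch, assuming example.py is run from the repository root:)

```python
import cProfile
import pstats

# Run the example script under the profiler and store the raw stats.
cProfile.run(open('example.py').read(), 'example.prof')

# Print the 20 entries with the highest cumulative time; cPickle.load shows
# up near the top when it dominates the run.
pstats.Stats('example.prof').sort_stats('cumulative').print_stats(20)
```

The same can be done from the command line with `python -m cProfile -s cumtime example.py`.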

cprotopa commented 6 years ago

One problem is that the pickled files are re-loaded for every variable. Calling the output function only once for all variables would already reduce the time. I once applied this in a local version; I could look for it, but perhaps you can easily implement it yourself. I'm not sure whether there are conflicts with other parts of the code, though, in case you want to add a way to specify which variables you actually want as output.
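Roughly, the idea is something like this (file names, variable names and attributes below are only illustrative, not the actual StROBe code):

```python
# Sketch: load each pickled house once and append its data to every output
# column in the same pass, instead of re-loading the pickle per variable.
import cPickle
import numpy as np

variables = ['P', 'Q', 'mDHW']                           # hypothetical output variables
house_files = ['household_%d.p' % i for i in range(5)]   # hypothetical pickle names
columns = {var: [] for var in variables}

for path in house_files:
    with open(path, 'rb') as f:
        house = cPickle.load(f)                          # one load per house, not per variable
    for var in variables:
        columns[var].append(getattr(house, var))

# One .txt file per variable, one column of data per house.
for var in variables:
    np.savetxt(var + '.txt', np.column_stack(columns[var]))
```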

xavfa commented 4 years ago

Hi, I am not sure I understand. I have split the output function (in feeder) into two: one for the data aggregation, which saves the data into a self.dat attribute of the fee object and is called in place of hou.pickle(), and another one that writes the ASCII files directly and is called in place of output(). I didn't find any other use of pickle. For the 5 buildings in the example this saves less than 4 seconds on my laptop (32 GB RAM); the overall process takes 86 seconds with the above modifications. I guess these modifications are the same as the ones you mentioned above, but since everything is part of the fee object I am not sure whether there are impacts elsewhere.
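In rough form the split looks like this (a sketch only: self.dat, the method names and the variable list are illustrative, not the real StROBe attributes):

```python
import numpy as np

class Feeder(object):
    def __init__(self):
        self.dat = {}                        # variable name -> list of per-house arrays

    def aggregate(self, house):
        # Called right after each house is simulated, in place of hou.pickle().
        for var in ('P', 'Q', 'mDHW'):       # hypothetical variables of interest
            self.dat.setdefault(var, []).append(getattr(house, var))

    def write(self):
        # Called once at the end, in place of the pickle-based output().
        for var, cols in self.dat.items():
            np.savetxt(var + '.txt', np.column_stack(cols))
```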

cprotopa commented 4 years ago

I'm not sure what you mean exactly, but here is how the feeder model currently works: all houses are generated and saved as pickled files. Afterwards, all houses are loaded one by one and the output files required for Modelica IDEAS simulations are created (one file per variable, each file containing one column of data per house). Previously, the pickled files were reloaded separately for every variable, but this has already been fixed.

Skipping the pickling entirely is possible, in the way you describe it: each house that is created is not saved to a file but kept in a variable, each newly simulated house is added to that variable, and at the end you write everything out. I have experimented with this, and it is indeed faster for a small number of houses. If you don't care about saving all results of the individual houses, it's a good approach. HOWEVER, you risk running into memory problems as soon as you go to feeders above ~150-200 houses (depending on your PC, of course). Since it doesn't scale well with size, we haven't implemented it, so that nobody risks running into issues.
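To put a rough number on the scaling concern, here is a back-of-the-envelope estimate of the footprint of the all-in-memory approach (every figure below is an assumption, not a measurement; adjust to your actual time resolution and the number of profiles kept per house):

```python
# Assumed: one year at 1-minute resolution, float64 values, ~10 stored
# profiles per house.
minutes_per_year = 365 * 24 * 60        # ~525,600 time steps
profiles_per_house = 10                 # hypothetical number of arrays per house
bytes_per_value = 8                     # float64

per_house = minutes_per_year * profiles_per_house * bytes_per_value
for n_houses in (5, 150, 500):
    print('%4d houses: ~%.1f GB' % (n_houses, n_houses * per_house / 1e9))
```

Under those assumptions a single house is on the order of tens of MB, so a feeder of a few hundred houses quickly reaches several GB of RAM.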