jeffgortmaker / pyblp

BLP Demand Estimation with Python
https://pyblp.readthedocs.io
MIT License

Question about parallel in pyblp #118

Closed chaowangcw closed 2 years ago

chaowangcw commented 2 years ago

I tried pyblp.parallel with four processes for problem.solve. It took about four times as long as running the simple nested logit problem directly, without the parallel context. I am not sure what makes it slower. Here is the relevant code, including the with block:

import pyblp

# `problem` is the nested logit pyblp.Problem constructed earlier (not shown)
optimizer = pyblp.Optimization('bfgs', {'gtol': 1e-5})
fixedpoint_algo = pyblp.Iteration('squarem', {'atol': 1e-8})
rho_initial_uniform = 0.8
with pyblp.parallel(4):
    results = problem.solve(
        rho=rho_initial_uniform,
        rho_bounds=(0, 1),
        method='1s',
        check_optimality='gradient',
        optimization=optimizer,
        iteration=fixedpoint_algo,
    )

chaowangcw commented 2 years ago

Thanks for your awesome project!

jeffgortmaker commented 2 years ago

Here's the key text from the docs:

Importantly, multiprocessing will only improve speed if gains from parallelization outweigh overhead from serializing and passing data between processes. For example, if computation for a single market is very fast and there is a lot of data in each market that must be serialized and passed between processes, using multiprocessing may reduce overall speed.

My guess is that for you, during estimation, computation for a single market is fairly fast, so the overhead from passing market-specific data to the different processes dominates, and you actually get a slowdown.
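As a rough back-of-the-envelope illustration of that trade-off (a toy model, not something taken from the pyblp docs), suppose there are $T$ markets, each needing $c$ seconds of computation and $s$ seconds of serialization and transfer, and suppose the transfer work is handled one market at a time while computation is split evenly over $P$ processes. All four symbols and both simplifications are assumptions made for illustration:

```latex
t_{\text{serial}} = T c, \qquad
t_{\text{parallel}} \approx \frac{T c}{P} + T s, \qquad
t_{\text{parallel}} < t_{\text{serial}} \iff s < c \left(1 - \frac{1}{P}\right)
```

With $P = 4$, this toy model says parallelization only pays off when $s < 0.75\,c$. A fourfold slowdown would correspond to $c/4 + s = 4c$, that is, $s = 3.75\,c$, which is plausible when per-market computation is nearly free.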

Parallelization in this way will typically only speed things up if computation for a single market is very slow, for example if you have a lot of random coefficients and hence a high-dimensional integral to approximate, or if the contraction takes a long time. Since you're working with a nested logit model where you don't have a numerical integral and don't have a contraction, market computation will be fast.
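To get a feel for how the two costs compare, here is a minimal sketch that times a cheap closed-form computation for one market against a pickle round trip of that market's data. It does not use pyblp at all: `cheap_market_work`, `time_one_market`, and the `market` dictionary are hypothetical stand-ins, and it assumes data moves between processes via pickle, as it does by default in Python's multiprocessing module.

```python
import pickle
import time

import numpy as np


def cheap_market_work(shares):
    """Hypothetical stand-in for fast per-market work (closed form, no integral)."""
    outside_share = 1.0 - shares.sum()
    return np.log(shares) - np.log(outside_share)


def time_one_market(market, work):
    """Return (compute seconds, pickle round-trip seconds, payload bytes)."""
    start = time.perf_counter()
    result = work(market["shares"])
    compute_seconds = time.perf_counter() - start

    start = time.perf_counter()
    payload = pickle.dumps(market)      # data sent to a worker process
    pickle.loads(payload)               # worker rebuilds the data
    pickle.loads(pickle.dumps(result))  # result is sent back
    transfer_seconds = time.perf_counter() - start
    return compute_seconds, transfer_seconds, len(payload)


rng = np.random.default_rng(0)
market = {
    # 50 inside-good shares summing to less than one (the dropped draw is the outside good)
    "shares": rng.dirichlet(np.ones(51))[:50],
    # Hypothetical bulky market-level data that must travel with the market
    "instruments": rng.normal(size=(50, 2000)),
}
compute_seconds, transfer_seconds, payload_bytes = time_one_market(market, cheap_market_work)
print(f"compute: {compute_seconds:.6f}s, transfer: {transfer_seconds:.6f}s, "
      f"payload: {payload_bytes} bytes")
```

The exact timings depend on the machine, so none are quoted here. The point is only that when the work function is this cheap, the round trip through pickle can easily cost more than the computation it is meant to offload.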

chaowangcw commented 2 years ago

Got it. Thank you very much!