mggg / GerryChain

Use MCMC to analyze districting plans and gerrymanders
https://mggg.github.io/GerryChain/

GerryChain multiprocessing #333

Closed pucuz closed 1 year ago

pucuz commented 4 years ago

I'm wondering if there is any opportunity or possibility to add multiprocessing to GerryChain?

msarahan commented 4 years ago

I'm not sure there's much opportunity, but I'm also not really an expert on MCMC, nor on this project. The chain itself is inherently stateful and serial. Each step in the chain must satisfy a number of constraints - this may be an opportunity for parallelism. The evaluation of each constraint could be a separate process, or perhaps the fitness of many possible steps could be evaluated in parallel, with the first one to complete successfully "winning". I don't think you'd get very good scaling, and the process overhead may actually make it slower.
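To make that idea concrete, here is a rough, untested sketch of checking constraints in separate processes and rejecting as soon as one fails (the constraint functions and `partition` object are hypothetical placeholders):

```python
# Sketch of the "evaluate constraints in parallel" idea. Each submit()
# pickles the partition to ship it to a worker process, which is real
# overhead for large graphs.
from concurrent.futures import ProcessPoolExecutor, as_completed

def satisfies_all(partition, constraints):
    """Return True only if every constraint accepts the partition."""
    with ProcessPoolExecutor() as pool:
        futures = [pool.submit(check, partition) for check in constraints]
        for future in as_completed(futures):
            if not future.result():
                return False  # one failure is enough to reject the step
    return True
```

Note that the per-step pickling cost in a sketch like this could easily dominate, which is part of why I doubt the scaling.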

Instead, I think the best scaling is to evaluate different seed partitions (maps) in parallel. This will scale perfectly, as long as you have enough partitions to keep your cores busy.
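For example, a minimal sketch of that pattern (untested; the graph file, seed assignment columns, and step count are hypothetical placeholders):

```python
# Run independent chains from different seed plans, one process each.
from multiprocessing import Pool

from gerrychain import Graph, MarkovChain, Partition
from gerrychain.accept import always_accept
from gerrychain.constraints import single_flip_contiguous
from gerrychain.proposals import propose_random_flip

def run_chain(seed_column):
    # Build everything inside the worker so that nothing large has to be
    # pickled across the process boundary.
    graph = Graph.from_json("my_graph.json")
    initial = Partition(graph, assignment=seed_column)
    chain = MarkovChain(
        proposal=propose_random_flip,
        constraints=[single_flip_contiguous],
        accept=always_accept,
        initial_state=initial,
        total_steps=10_000,
    )
    # Return a summary statistic per step rather than whole partitions;
    # cut_edges is one of Partition's default updaters.
    return [len(partition["cut_edges"]) for partition in chain]

if __name__ == "__main__":
    seed_columns = ["SEED_A", "SEED_B", "SEED_C", "SEED_D"]  # hypothetical
    with Pool(processes=4) as pool:
        results = pool.map(run_chain, seed_columns)
```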

Additionally, GerryChain could be sped up by compiling the loop-heavy code. This could be done in many ways, including Cython, Numba, or more dramatic rewrites in C/C++/Rust/etc. with Python interfaces.
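As a toy (non-GerryChain) illustration of what Numba buys you: decorating a loop-heavy numeric function compiles it to native code on first call.

```python
# Toy example only: the first call triggers compilation, subsequent calls
# run the loop at native speed instead of through the interpreter.
import numpy as np
from numba import njit

@njit
def total_boundary_length(edge_lengths, is_cut_edge):
    total = 0.0
    for i in range(edge_lengths.shape[0]):
        if is_cut_edge[i]:
            total += edge_lengths[i]
    return total

lengths = np.random.rand(1_000_000)
cuts = np.random.rand(1_000_000) > 0.5
print(total_boundary_length(lengths, cuts))
```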

EDIT: This is a gut-feel evaluation, and I have not done any profiling. Take it with a huge grain of salt.

pucuz commented 4 years ago

@msarahan Thanks for the recommendations. I agree that running different seed plans in parallel is probably the low hanging fruit. I'll also look into compiling some of my code to see if that helps.

I think it would be good for the maintainers to look into multiprocessing wherever possible, since this is such computation-heavy code.

msarahan commented 4 years ago

Just keep in mind that in Python, you pay a very heavy price for multiprocessing: each process has its own memory space, and you have to serialize/deserialize to communicate across processes. There are certainly more efficient ways to do that, and many or all of the compilation options have better ways to use threads instead of processes, sharing memory space and using compute more efficiently (because they can work around the GIL). Naive multiprocessing in Python is almost certainly a dead end. I hope this saves you some time.
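You can see that serialization tax for yourself with a toy measurement (nothing GerryChain-specific here):

```python
# Time how long it takes merely to pickle a large dict, before any useful
# work happens; this cost is paid on every cross-process hand-off.
import pickle
import time

big = {i: list(range(100)) for i in range(100_000)}
start = time.perf_counter()
blob = pickle.dumps(big)
elapsed = time.perf_counter() - start
print(f"pickling alone took {elapsed:.2f}s for {len(blob) / 1e6:.1f} MB")
```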

NitinBhasneria commented 4 years ago

@msarahan what approaches to parallelization other than multiprocessing and multithreading do you have in mind?

msarahan commented 4 years ago

Don't parallelize anything until you are sure that is the right way to improve performance. Use a profiler to identify slow sections. https://github.com/rkern/line_profiler has the nicest info, but I usually use https://pypi.org/project/pytest-profiling/ with my test suite to profile specific functionality.
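If you want to start without installing anything, the standard library gives a coarser but similar picture (a sketch; `hot_loop` is a stand-in for whatever you actually want to measure):

```python
# Profile an arbitrary entry point with cProfile and print the ten
# functions with the highest cumulative time.
import cProfile
import pstats

def hot_loop():
    return sum(i * i for i in range(1_000_000))

cProfile.run("hot_loop()", "chain.prof")
pstats.Stats("chain.prof").sort_stats("cumulative").print_stats(10)
```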

Algorithmic improvements and static typing/elimination of bounds checks (both of which come from a Python compiler like Numba or Cython) generally yield much greater gains than simple parallelism, and often with less work than trying to do fine-grained parallelism of particular sections in Python.

NitinBhasneria commented 4 years ago

@msarahan thanks for your guidance. I will surely try this.

NitinBhasneria commented 4 years ago

One more thing: how do you think we should start with the code and divide the work of implementing these things? @msarahan

pucuz commented 4 years ago

I guess I don't really see how I can improve things in my script. It's looping through the steps of the chain that takes most of the time. My script gathers a lot of information within the loop into dictionaries and lists, which are written out to CSV files after the chain loop completes. Any suggestions for best practices for doing that sort of thing would be helpful.
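For reference, the pattern I'm considering is to stream each step's statistics straight to disk rather than accumulating everything in memory (a sketch; assumes an already configured `chain` and the default `cut_edges` updater, and the columns are hypothetical):

```python
# Write one CSV row per chain step instead of building up large dicts and
# lists and dumping them after the loop. Assumes `chain` is an already
# configured MarkovChain.
import csv

with open("chain_stats.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["step", "num_cut_edges"])  # hypothetical columns
    for step, partition in enumerate(chain):
        writer.writerow([step, len(partition["cut_edges"])])
```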

InnovativeInventor commented 2 years ago

Parallelization is totally possible, and one can see pretty big improvements by parallelizing ReCom. However, I don't think it's feasible (or easy) to do in Python without incurring a massive overhead that negates the performance gains. See https://github.com/pjrule/frcw.rs, which is wicked fast (~5s for 100k steps)! I also have a wait-free, entirely equivalent (i.e. it samples from the same distribution) algorithm for ReCom in the works (frcw has some locking).

We're working on interoperability (via pcompress) between GerryChain Julia, GerryChain.NET, and frcw (work in progress). You can already interop between GerryChain Julia and Python, though it's somewhat experimental and not at all ergonomic. I believe @jenni-niels is putting the finishing touches on interop between GerryChain.NET and GerryChain Python.

pjrule commented 1 year ago

As @InnovativeInventor notes above, we've determined that multithreaded chains/optimizers are best reserved for GerryChain's high-performance siblings going forward. Thanks for the discussion!