Open ddale opened 9 years ago
I don't have a windows machine to test on, so I defer to Darren on all things windows (insofar as the performance on Linux and Mac is not impacted!).
On Nov 25, 2014, at 7:26 AM, Darren Dale notifications@github.com wrote:
Multiprocessing of paintGrid in indexer.py currently includes a workaround that writes pickle files from which the worker Processes on Windows load their state (sketched after the list below). This approach was used to provide a relatively quick fix while still using multiprocessing.Pool.map. However, this implementation produces some undesirable behavior on Windows:
Spinning up the workers is IO bound, as all workers attempt to load state from the pickle file.
Spinning up the workers results in a huge spike in memory consumption, thought to be caused by the unpickling process.
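For context, a minimal sketch of the kind of pickle-file handoff described above, assuming a Pool initializer that reloads state from disk; the file name and the _load_state / paint_grid_one helpers are hypothetical illustrations, not the actual indexer.py code:

```python
# Hypothetical illustration of the pickle-file workaround (not the real code).
import pickle
from multiprocessing import Pool

STATE_FILE = 'paintgrid_state.pkl'  # hypothetical file name
_state = None


def _load_state():
    # Runs once in each worker at spin-up: every worker hits the disk and
    # unpickles its own copy of the state, hence the IO and memory spikes.
    global _state
    with open(STATE_FILE, 'rb') as f:
        _state = pickle.load(f)


def paint_grid_one(quaternion, state):
    # hypothetical stand-in for the real per-quaternion computation
    return quaternion


def _worker(quaternion):
    return paint_grid_one(quaternion, _state)


def run(quaternions, state, ncpus):
    with open(STATE_FILE, 'wb') as f:
        pickle.dump(state, f)
    pool = Pool(ncpus, initializer=_load_state)
    try:
        return pool.map(_worker, quaternions)
    finally:
        pool.close()
        pool.join()
```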
An alternative implementation already exists in hexrd.fitgrains. The approach works as follows (a combined sketch appears after the list):
Create a Worker class that implements the multiprocessing.Process interface, but is not a subclass of that class. This will be used when multiprocessing is disabled, for example during profiling. The worker exits when the queue is empty.
Create a WorkerMP class that subclasses Worker and multiprocessing.Process. Basically, all this class needs to do is also call Process.__init__ to enable multiprocessing.
Create a multiprocessing.JoinableQueue and populate it with the job-specific information, which tends to be very small (for example, a single quaternion).
Pack all of the contextual data into a dictionary, to be passed to the individual workers during instantiation.
Create a managed list (via multiprocessing.Manager) to hold the results.
Start the multiprocessing workers sequentially:
    for i in range(n_cpus):
        w = Worker(queue, results, params)
        w.start()

Each worker begins processing immediately; it's possible processing may even complete before all workers have been spun up.
Wait until the results list is complete, updating progress bars based on its length.
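A minimal sketch of the pattern described above, assuming Python 3; the process_one helper and the run_all driver are hypothetical stand-ins, not the actual hexrd.fitgrains code:

```python
import multiprocessing as mp
import time
from queue import Empty


def process_one(job, params):
    # hypothetical stand-in for the real per-job computation
    return job


class Worker(object):
    """Implements the pieces of the multiprocessing.Process interface that we
    use, without subclassing it, so the same code can run serially (e.g. when
    multiprocessing is disabled for profiling)."""

    def __init__(self, jobs, results, params):
        self._jobs = jobs        # JoinableQueue of small per-job data
        self._results = results  # managed list shared with the parent
        self._params = params    # dict of contextual data

    def run(self):
        while True:
            try:
                job = self._jobs.get(False)
            except Empty:
                break            # worker exits when the queue is empty
            self._results.append(process_one(job, self._params))
            self._jobs.task_done()

    def start(self):
        self.run()               # serial case: run in the calling process

    def join(self):
        pass


class WorkerMP(Worker, mp.Process):
    """Essentially all this class does is also call Process.__init__."""

    def __init__(self, jobs, results, params):
        Worker.__init__(self, jobs, results, params)
        mp.Process.__init__(self)

    # Restore the real Process methods that Worker shadows in the MRO.
    start = mp.Process.start
    join = mp.Process.join


def run_all(data, params, ncpus):
    jobs = mp.JoinableQueue()
    for item in data:            # job-specific info, e.g. one quaternion
        jobs.put(item)

    manager = mp.Manager()
    results = manager.list()

    cls = WorkerMP if ncpus > 1 else Worker
    workers = []
    for _ in range(ncpus):
        w = cls(jobs, results, params)
        w.start()                # each worker begins consuming immediately
        workers.append(w)

    # Wait for the results list to fill, updating a progress bar from len().
    while len(results) < len(data):
        time.sleep(0.1)

    for w in workers:
        w.join()
    return list(results)
```

Note that on Windows the driver would need to run under an if __name__ == '__main__': guard, and the params dict must be picklable, since each WorkerMP instance is pickled when the child process is spawned.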
Improvements to be made:
Refactor this multiprocessing approach into a separate module containing abstract base classes to avoid code duplication.
Implement a custom map function that is called with a Worker class (not an instance), the contextual information, the number of CPUs, a list of data to iterate over, and a progress callback. It creates a queue to pass to the workers, creates a managed list to hold the results, spins up the workers sequentially so they begin processing, and then enters a loop to report progress until processing is complete. The function returns the list of results (see the sketch after this list).
Consider breaking this into a custom Pool class with a map method. This implementation would be cleaner: Pool would be instantiated by passing the Worker class, the contextual data dict, and the number of CPUs, and map would be called with the list of data over which to iterate and the callback. The problem is that for smaller datasets, much of the processing time appears to be consumed by spinning up the Workers themselves, so we want each worker to begin processing immediately rather than waiting until the entire pool is ready. We should time the initialization step, though; perhaps it is not such a big issue.
Convert paintGrid multiprocessing to use this approach.
Refactor fitgrains multiprocessing to use this new approach.
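A rough sketch of what the proposed map function and the optional Pool wrapper could look like, assuming the Worker interface from the sketch above; worker_map and WorkerPool are hypothetical names, not an existing API:

```python
import multiprocessing as mp
import time


def worker_map(worker_class, params, ncpus, data, progress=None):
    """Run worker_class over `data` on `ncpus` workers, reporting progress."""
    jobs = mp.JoinableQueue()
    for item in data:
        jobs.put(item)

    manager = mp.Manager()
    results = manager.list()

    # Spin up the workers sequentially; each begins consuming the queue as
    # soon as it is started, rather than waiting for the whole pool.
    workers = []
    for _ in range(ncpus):
        w = worker_class(jobs, results, params)
        w.start()
        workers.append(w)

    # Report progress until processing is complete.
    total = len(data)
    while len(results) < total:
        if progress is not None:
            progress(len(results), total)
        time.sleep(0.25)

    for w in workers:
        w.join()
    return list(results)


class WorkerPool(object):
    """Thin wrapper providing the Pool-with-map interface discussed above."""

    def __init__(self, worker_class, params, ncpus):
        self._worker_class = worker_class
        self._params = params
        self._ncpus = ncpus

    def map(self, data, progress=None):
        return worker_map(self._worker_class, self._params, self._ncpus,
                          data, progress)
```

Because each worker starts consuming as soon as start() returns, the Pool wrapper keeps the immediate-start behavior while still offering the cleaner interface; timing the worker start-up would show whether that distinction matters in practice.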