inspirehep / beard

Bibliographic Entity Automatic Recognition and Disambiguation

blocking: multiprocessing #84

Closed jacenkow closed 8 years ago

jacenkow commented 8 years ago

Signed-off-by: Grzegorz Jacenków grzegorz.jacenkow@cern.ch

jacenkow commented 8 years ago

@glouppe

MSusik commented 8 years ago

Obviously, some code could be factored out of _single_fit and _parallel_fit, but that is a minor issue.

So, what about the approach that Gilles suggested earlier?

if self.n_jobs == 1:
    from Queue import Queue
    data_queue = Queue()
    result_queue = Queue()
    _parallel_fit(...)
else:
    try:
        from multiprocessing import SimpleQueue
    except ImportError:
        from multiprocessing.queues import SimpleQueue
    data_queue = SimpleQueue()
    result_queue = SimpleQueue()
    # The rest of the code
# Even more code

I think Queue should only pass a reference to the data, not copy it, so memory consumption shouldn't be a problem.
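As a quick sanity check on the claim above, here is a minimal sketch (not project code; the payload is a hypothetical stand-in for the fit data) showing that a plain `queue.Queue` hands over references rather than copies of enqueued objects:

```python
# Sketch: queue.Queue stores references, not copies.
from queue import Queue

q = Queue()
payload = {"clusters": [1, 2, 3]}  # hypothetical stand-in for the fit data
q.put(payload)
received = q.get()

# The dequeued object is the very same object, not a copy.
assert received is payload
```

This only holds for in-process queues; multiprocessing queues pickle items across process boundaries, so there the data is effectively copied.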

jacenkow commented 8 years ago

@MSusik _parallel_fit gets an empty queue (with no data), thus goes into an infinite loop.
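To make the failure mode above concrete, here is a minimal sketch (illustrative names, not beard's actual code): `Queue.get()` blocks until an item arrives, and with `n_jobs == 1` the producer would run in the same thread that is now blocked, so nothing ever arrives. A timeout is used here so the example terminates instead of hanging:

```python
# Sketch: why running the worker loop in the calling thread with an
# empty queue hangs. A bare q.get() would block forever; the timeout
# makes the problem observable without freezing the interpreter.
from queue import Queue, Empty

data_queue = Queue()

def worker_loop(q):
    try:
        return q.get(timeout=0.1)  # would block indefinitely without timeout
    except Empty:
        return None

assert worker_loop(data_queue) is None  # no producer ran, nothing to fetch
```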

MSusik commented 8 years ago

@jacenkow You're right. It obviously runs in the same thread :stuck_out_tongue_closed_eyes:

jacenkow commented 8 years ago

@MSusik haha, I did it exactly the same way you did, and to be honest all the credit for spotting the problem goes to Gilles :dancer:

MSusik commented 8 years ago

Side note: you should still be able to implement this solution with Python threading, which doesn't spawn any processes.

EDIT: but this will copy the data, so it's probably not worth implementing!
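For reference, a minimal sketch of the threading variant mentioned above (illustrative names, not beard's API): the consumer runs in a background thread, so the main thread can feed the queue without the single-thread deadlock discussed earlier. A `None` sentinel tells the consumer there is no more work:

```python
# Sketch of the threading variant: consumer in a background thread,
# producer in the main thread, communicating via in-process queues.
from queue import Queue
from threading import Thread

data_queue = Queue()
result_queue = Queue()

def consumer():
    while True:
        item = data_queue.get()
        if item is None:  # sentinel: no more work
            break
        result_queue.put(item * 2)  # stand-in for the real fit step

t = Thread(target=consumer)
t.start()
for x in [1, 2, 3]:
    data_queue.put(x)
data_queue.put(None)
t.join()

results = [result_queue.get() for _ in range(3)]
assert results == [2, 4, 6]
```

Note that threads in CPython share the GIL, so this avoids copying but does not buy CPU parallelism for the fit itself.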