wellcometrust / reach

Wellcome tool to parse references scraped from policy documents using machine learning
MIT License

Investigate parallelisation of yielding structured references #507

Open lizgzil opened 4 years ago

lizgzil commented 4 years ago

In `split_reach/extracter/extract_refs_task.py` we set `pool_map = map` for use in `yield_structured_references`. However, if we used `Pool` from `multiprocessing` instead, i.e.

```python
from multiprocessing import Pool

pool = Pool(num_workers)
pool_map = pool.map  # drop-in replacement for the built-in map
```

then we could potentially speed up this task. However, in the past we found that using `num_workers > 1` actually slowed things down, so it is worth investigating how this behaves now to decide whether it's worth implementing.
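A minimal sketch of how that investigation could be set up, assuming `yield_structured_references` simply applies a per-reference parsing function through whatever `pool_map` it is given. Only the idea of injecting `pool_map` comes from this issue; the signature shown is assumed, and `structure_reference`, `get_pool_map` and `time_run` are hypothetical names for illustration. The per-reference work is a synthetic busy loop, so the printed timings say nothing about the real extractor; the real parsing function and a realistic batch of references would need to be swapped in.

```python
import time
from contextlib import contextmanager
from multiprocessing import Pool


def structure_reference(raw_ref):
    """Hypothetical stand-in for the real per-reference parsing step.

    The busy loop simulates CPU-bound work; the real cost per reference
    is unknown here and is exactly what the benchmark should measure.
    """
    checksum = 0
    for i in range(2_000):
        checksum = (checksum + i * len(raw_ref)) % 9_973
    title, _, year = raw_ref.rpartition(",")
    return {"title": title.strip(), "year": int(year), "checksum": checksum}


def yield_structured_references(raw_refs, pool_map=map):
    """Yield structured references using whichever map is injected.

    Assumed signature: only the pool_map injection comes from the issue.
    """
    yield from pool_map(structure_reference, raw_refs)


@contextmanager
def get_pool_map(num_workers):
    """Hypothetical helper: built-in map for one worker, else Pool.map."""
    if num_workers <= 1:
        yield map
    else:
        # Pool start-up happens here, so it is included in any timing around it.
        with Pool(num_workers) as pool:
            yield pool.map


def time_run(raw_refs, num_workers):
    """Return (seconds, results) for one end-to-end run with num_workers."""
    start = time.perf_counter()
    with get_pool_map(num_workers) as pool_map:
        # Consume inside the with block: the pool is terminated on exit.
        results = list(yield_structured_references(raw_refs, pool_map))
    return time.perf_counter() - start, results


if __name__ == "__main__":
    refs = [f"Policy document {i}, {2000 + i % 20}" for i in range(2_000)]
    baseline_s, baseline = time_run(refs, 1)
    print(f"num_workers=1: {baseline_s:.2f}s")
    for n in (2, 4):
        seconds, results = time_run(refs, n)
        assert results == baseline  # parallel output must match serial output
        print(f"num_workers={n}: {seconds:.2f}s")
```

The timing deliberately includes pool start-up, since the real task would pay that cost too. Possible reasons a process pool can end up slower than a plain `map` include process start-up, pickling every reference and result between processes, and per-item work too small to amortise that overhead. `Pool.map` also accepts a `chunksize` argument that batches items per worker and may be worth varying alongside `num_workers`.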