mila-iqia / fuel

A data pipeline framework for machine learning
MIT License

Allow data server to use divide-and-conquer #93

Open bartvm opened 9 years ago

bartvm commented 9 years ago

@yaoli was interested in using a divide-and-conquer approach to preprocessing, as is used in @dwf's ImageNet PR (https://github.com/bartvm/fuel/pull/68). With that code, I think it should be relatively easy to update the preprocessing server to (optionally) use multiple workers as well.

yaoli commented 9 years ago

You guys are doing something really fancy, way beyond my league to grasp.

On Thu, Apr 30, 2015 at 3:54 PM, Bart van Merriënboer < notifications@github.com> wrote:

@yaoli was interested in using a divide-and-conquer approach to preprocessing, as is used in @dwf's ImageNet PR (#68, https://github.com/bartvm/fuel/pull/68). With that code, I think it should be relatively easy to update the preprocessing server to (optionally) use multiple workers as well.


dwf commented 9 years ago

If we were to use it in that context we'd probably have to harden it a little bit using request/reply on ROUTERs and DEALERs and bears (oh my). The current setup has a race condition which matters in general but doesn't matter for beefy workers that actually take a solid amount of time.
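To illustrate the request/reply idea mentioned above, here is a minimal sketch in which standard-library queues and threads stand in for ZeroMQ ROUTER/DEALER sockets. It is not Fuel's code and not the code from the ImageNet PR; the names `worker`, `dispatch`, `requests`, and `inboxes` are hypothetical. The point it shows is that a task is only sent in reply to a worker that has announced it is ready, so no work is handed out before a worker exists to receive it. Whether that is the exact race referred to above is an assumption.

```python
import queue
import threading


def worker(worker_id, requests, inbox, preprocess):
    """Ask for work, process it, and return the result with the next request."""
    result = None
    while True:
        # Request: announce readiness and hand back the previous result, if any.
        requests.put((worker_id, result))
        # Reply: the dispatcher answers this worker specifically.
        task = inbox.get()
        if task is None:  # Stop signal: no more work.
            return
        index, item = task
        result = (index, preprocess(item))


def dispatch(items, preprocess, n_workers=4):
    """Send items to workers only when they ask; return results in input order."""
    requests = queue.Queue()  # shared channel, workers -> dispatcher
    inboxes = [queue.Queue() for _ in range(n_workers)]  # dispatcher -> one worker
    threads = [
        threading.Thread(target=worker, args=(i, requests, inboxes[i], preprocess))
        for i in range(n_workers)
    ]
    for t in threads:
        t.start()

    pending = list(enumerate(items))[::-1]  # pop() from the end keeps input order
    results = [None] * len(pending)
    active = n_workers
    while active:
        worker_id, result = requests.get()
        if result is not None:
            index, value = result
            results[index] = value
        if pending:
            inboxes[worker_id].put(pending.pop())
        else:
            inboxes[worker_id].put(None)  # reply with the stop signal
            active -= 1

    for t in threads:
        t.join()
    return results
```

Calling `dispatch(range(10), lambda x: x * x, n_workers=3)` returns the squared values in input order, regardless of which worker handled which item. The sketch does not handle a `preprocess` call that raises; a hardened version would need to report failures back to the dispatcher instead of letting a worker die silently.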

dwf commented 9 years ago

I'm tossing the relevant code onto https://github.com/dwf/fuel/tree/imagenet_old as I'm moving forward without it, doing something much simpler that doesn't need it.