uqfoundation / pathos

parallel graph management and execution in heterogeneous computing
http://pathos.rtfd.io

Different results from map and imap #165

Closed Stack-Attack closed 2 years ago

Stack-Attack commented 5 years ago

Hello,

I'm having a very strange issue on a project I'm working on, which involves training a bunch of sklearn classifiers on a per-pixel basis. My implementation using imap works as expected, but the results using map are completely off. I'm really not sure where the issue is coming from, as I can't reproduce it without using a real sklearn model.

y and X are 2d numpy arrays.

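from sklearn.dummy import DummyClassifier

def per_pixel_train(y, X, model, index, dummy=DummyClassifier()):
    # Fit the per-pixel model; fall back to the dummy classifier if fitting fails.
    try:
        result = model.fit(X, y)
    except ValueError:
        result = dummy.fit(X, y)
    # fit() returns the fitted estimator, so return whichever one was trained.
    return [index, result]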

# Using map (results are wrong):
self.classifiers = pool.map(per_pixel_train, y, repeat(X), repeat(DecisionTreeClassifier()), range(len(y)))

# Using imap (results are as expected):
self.classifiers = list(pool.imap(per_pixel_train, y, repeat(X), repeat(DecisionTreeClassifier()), range(len(y))))

The resulting image from the map method is completely wrong, whereas imap works fine.

mmckerns commented 5 years ago

@Stack-Attack: It's difficult to diagnose without having a minimal working example. It's recommended that you see if you can reproduce the issue with, say, numpy arrays only, or a simple sklearn model.

If I had to guess... there is a known multiprocessing "issue" that has to do with random numbers and numpy arrays. Basically, if you don't pass a different seed to each of the processes in a map, every process is seeded with the same value, so the workers all draw the same "random" values. It's possible that this is the root of your issue.
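As a rough illustration of passing a different seed with each task, here is a minimal sketch. It is not taken from your code: `noisy_sum`, `rows`, and `seeds` are hypothetical names, and it uses the standard library's `random.Random` only to stay dependency-free. With numpy, the same pattern would build a generator inside the worker with `numpy.random.default_rng(seed)`.

```python
import random

def noisy_sum(values, seed):
    # Build a private generator from the seed that travels with the task,
    # instead of relying on global random state inside the worker.
    rng = random.Random(seed)
    return sum(v + rng.random() for v in values)

rows = [[1, 2, 3]] * 4       # four identical tasks (hypothetical data)
seeds = range(len(rows))     # one distinct seed per task (an assumption)

# The builtin map stands in for the pool here; pool.map(noisy_sum, rows, seeds)
# would take its arguments in the same shape as the pool.map call above.
results = list(map(noisy_sum, rows, seeds))
```

Because each task carries its own seed, identical inputs still produce different draws, and rerunning with the same seeds reproduces the same results regardless of which process handles which task.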

mmckerns commented 2 years ago

I'm going to assume your issue is now resolved. If not, please reopen and continue this thread.