modin-project / modin

Modin: Scale your Pandas workflows by changing a single line of code
http://modin.readthedocs.io
Apache License 2.0

AttributeError: type object 'Series' has no attribute '__array_prepare__' #885

Anylee2142 closed this issue 4 years ago

Anylee2142 commented 4 years ago

System information

Describe the problem

I encountered this issue while testing my code. The code calls pd.Series().value_counts(), which eventually ends up in __array_prepare__ of Modin's pd.Series. It looks like that method delegates to the same method in vanilla pandas, but I couldn't find a corresponding one there. Meanwhile, changing it to

    def __array_prepare__(self, result, context=None):  # pragma: no cover
        return result

solves the problem, though I'm not sure it is a proper fix.
May I ask what you intended __array_prepare__ to do, so that this issue can be fixed properly?

Source code / logs

Ran 1 test in 0.044s

FAILED (errors=1)

    Error
    Traceback (most recent call last):
      File "/usr/lib/python3.6/unittest/case.py", line 59, in testPartExecutor
        yield
      File "/usr/lib/python3.6/unittest/case.py", line 605, in run
        testMethod()
      File "/home/ej/github/MDLP/discretization/test/test_mdlp.py", line 23, in test_customentropy
        self.assertGreaterEqual(ent(freq, base=2), entropy(P, base=2))
      File "/home/ej/github/MDLP/discretization/mdlp.py", line 19, in ent
        return -np.sum(log_(v=pi, base=base) * pi)
      File "/home/ej/github/MDLP/discretization/mdlp.py", line 7, in log_
        return np.log(v) / np.log(base)
      File "/home/ej/github/MDLP/venv/lib/python3.6/site-packages/modin/pandas/series.py", line 135, in __array_prepare__
        pandas.Series.__array_prepare__, result, context=context
    AttributeError: type object 'Series' has no attribute '__array_prepare__'

Anylee2142 commented 4 years ago

This actually looks similar to NumPy's __array_wrap__ and __array_prepare__. Would fixing it along those lines work?

devin-petersohn commented 4 years ago

Hi @Anylee2142, thanks for posting!

In pandas, it looks like __array_prepare__ was dropped in 0.25.x, and we haven't dropped it in Modin yet. When I ran your example locally, I did not hit the issue on value_counts. Looking at the traceback you provided, the error occurs on the self.assertGreaterEqual(ent(freq, base=2), entropy(P, base=2)) line, because the ent function eventually calls np.log on the Modin Series. When np.<something> is called on a Modin Series, it goes through this code path to __array_prepare__. You can reproduce this with np.log(freq).

You have posted a valid temporary workaround for this, provided there's no extra need for the object to be a Series.

It should be removed anyway. I will take a look to see what replaces it in the pandas codebase, or whether it can just be dropped. Thanks again for posting!
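
For anyone reading later, here is a minimal sketch of the failure mode described in this thread, written in plain Python so it runs without Modin or pandas installed. The class names UpstreamSeries, WrapperSeries and PatchedSeries and the helper call_hook are hypothetical stand-ins, not real library code: UpstreamSeries plays the role of a pandas.Series that no longer defines __array_prepare__, and WrapperSeries plays the role of a Modin Series whose hook still delegates to it. The real Modin method is more involved than this, so treat it only as an illustration of why the attribute lookup fails and why the pass-through workaround avoids it.

```python
# Hypothetical stand-ins only; none of these classes come from pandas or Modin.


class UpstreamSeries:
    """Plays the role of an upstream Series that no longer has __array_prepare__."""


class WrapperSeries:
    """Plays the role of a wrapper whose hook still delegates upstream."""

    def __array_prepare__(self, result, context=None):
        # Looking up the attribute on the upstream class raises AttributeError
        # before anything is actually called.
        return UpstreamSeries.__array_prepare__(self, result, context=context)


class PatchedSeries(WrapperSeries):
    """Same wrapper with the pass-through workaround from this thread."""

    def __array_prepare__(self, result, context=None):
        # Return the result unchanged instead of delegating upstream.
        return result


def call_hook(series, result):
    """Invoke the hook directly and return either its result or the error text."""
    try:
        return series.__array_prepare__(result)
    except AttributeError as exc:
        return f"AttributeError: {exc}"


broken = call_hook(WrapperSeries(), [1.0, 2.0])
patched = call_hook(PatchedSeries(), [1.0, 2.0])
print(broken)
print(patched)
```

The unpatched wrapper reports an AttributeError naming the missing attribute on the upstream class, which has the same shape as the last line of the traceback above. The patched wrapper hands the result back untouched, which is why the workaround is only suitable when the caller does not need a Series back.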