vaexio / vaex

Out-of-Core hybrid Apache Arrow/NumPy DataFrame for Python, ML, visualization and exploration of big tabular data at a billion rows per second 🚀
https://vaex.io
MIT License

vaex.from_pandas fails with "AttributeError: module 'pandas.core.arrays' has no attribute 'integer'" #766

Closed: chaltik closed this issue 4 years ago

chaltik commented 4 years ago

(py3.6-tsse) wtisim@ip-10-0-0-38:~/wtisim$ ipython
Python 3.6.6 |Anaconda, Inc.| (default, Jun 28 2018, 17:14:51)
Type 'copyright', 'credits' or 'license' for more information
IPython 6.4.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: import vaex

In [2]: import pandas as pd

In [3]: import numpy as np

In [4]: test_df = pd.DataFrame(np.random.randn(10000000,10))

In [5]: vaex_df = vaex.from_pandas(test_df)

AttributeError                            Traceback (most recent call last)
in <module>()
----> 1 vaex_df = vaex.from_pandas(test_df)

~/wtisim/.venv/py3.6-tsse/lib/python3.6/site-packages/vaex/__init__.py in from_pandas(df, name, copy_index, index_name)
    400                 print("Giving up column %s, error: %r" % (name, e))
    401     for name in df.columns:
--> 402         add(name, df[name])
    403     if copy_index:
    404         add(index_name, df.index)

~/wtisim/.venv/py3.6-tsse/lib/python3.6/site-packages/vaex/__init__.py in add(name, column)
    388     def add(name, column):
    389         values = column.values
--> 390         if isinstance(values, pd.core.arrays.integer.IntegerArray):
    391             values = np.ma.array(values._data, mask=values._mask)
    392         try:

AttributeError: module 'pandas.core.arrays' has no attribute 'integer'

In [7]: pd.__version__
Out[7]: '0.23.3'

In [8]: np.__version__
Out[8]: '1.18.4'

In [9]: vaex.__version__
Out[9]: {'vaex': '3.0.0', 'vaex-core': '2.0.0', 'vaex-viz': '0.4.0', 'vaex-hdf5': '0.6.0', 'vaex-server': '0.3.0', 'vaex-astro': '0.7.0', 'vaex-jupyter': '0.5.0', 'vaex-ml': '0.9.0', 'vaex-arrow': '0.5.0'}
JovanVeljanoski commented 4 years ago

Hi,

This is actually the same issue as in #608

So in your example, if you "name" the columns of the pandas DataFrame, things should go smoothly:

test_df = pd.DataFrame(np.random.randn(10000000,10), columns=[f'col{i}' for i in range(10)])
df = vaex.from_pandas(test_df)
df
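If the DataFrame already exists, the integer labels pandas assigns by default (0, 1, 2, ...) can also be turned into strings after the fact rather than at construction time. A minimal sketch using pandas' DataFrame.add_prefix; the col prefix just mirrors the names above, and the smaller shape only keeps the example light. As itoledoc's reply further down shows, this addresses the column-name problem from #608, not the pandas-version error itself.

```python
import numpy as np
import pandas as pd

# Without explicit names, pandas labels the columns 0, 1, ..., 9 (integers)
test_df = pd.DataFrame(np.random.randn(1000, 10))

# add_prefix returns a new DataFrame whose labels are strings: "col0", ..., "col9"
named_df = test_df.add_prefix('col')

# named_df is what would then be handed to vaex.from_pandas(named_df)
```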
chaltik commented 4 years ago

Thanks. This came up while trying to read a 2D NumPy array into vaex without finding a direct way to do it :) (from_arrays only takes 1D arrays)
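Since from_arrays takes one 1-D array per column, a 2-D array can be split into named columns first and the resulting dict unpacked as vaex.from_arrays(**columns), skipping the pandas round trip. A minimal sketch; columns_from_2d and the col prefix are hypothetical names, and only NumPy is needed to run it.

```python
import numpy as np


def columns_from_2d(array, prefix="col"):
    """Split a 2-D array of shape (n_rows, n_cols) into named 1-D columns.

    Returns {"col0": ..., "col1": ..., ...}, suitable for vaex.from_arrays(**columns).
    """
    array = np.asarray(array)
    if array.ndim != 2:
        raise ValueError(f"expected a 2-D array, got {array.ndim} dimension(s)")
    # Column-major (Fortran) layout stores each column as one contiguous block,
    # so array[:, i] below is a contiguous view instead of a strided one.
    array = np.asfortranarray(array)
    return {f"{prefix}{i}": array[:, i] for i in range(array.shape[1])}


data = np.random.randn(1000, 10)
columns = columns_from_2d(data)
# With vaex installed:
# import vaex
# df = vaex.from_arrays(**columns)
```

For input in NumPy's default row-major layout, np.asfortranarray costs one full copy up front (roughly 800 MB for a 10,000,000 × 10 float64 array). Passing data[:, i] directly avoids that copy but yields strided views; how efficiently a given vaex version handles those is not covered in this thread.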

itoledoc commented 4 years ago

Hi,

I think this is related to the version of pandas being used. I'm having the same issue in a project where I'm forced to use pandas 0.23.4, and the solution you propose doesn't work either: it fails with the same error. I ran exactly the code from your example and still get the error:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-55-cf6ccc83be51> in <module>
      1 test_df = pd.DataFrame(np.random.randn(10000000,10), columns=[f'col{i}' for i in range(10)])
----> 2 df = vaex.from_pandas(test_df)
      3 df

/data/dataiku/dss_data/code-envs/python/py36_weather/lib/python3.6/site-packages/vaex/__init__.py in from_pandas(df, name, copy_index, index_name)
    400                 print("Giving up column %s, error: %r" % (name, e))
    401     for name in df.columns:
--> 402         add(name, df[name])
    403     if copy_index:
    404         add(index_name, df.index)

/data/dataiku/dss_data/code-envs/python/py36_weather/lib/python3.6/site-packages/vaex/__init__.py in add(name, column)
    388     def add(name, column):
    389         values = column.values
--> 390         if isinstance(values, pd.core.arrays.integer.IntegerArray):
    391             values = np.ma.array(values._data, mask=values._mask)
    392         try:

AttributeError: module 'pandas.core.arrays' has no attribute 'integer'

And indeed, in pandas 0.23.4 the module 'pandas.core.arrays' does not have an 'integer' attribute yet; IntegerArray, pandas' nullable integer array, was only added in pandas 0.24.

maartenbreddels commented 4 years ago

Yeah, I think we actually require pandas 0.24. We should change that in our requirements or fix this.
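One way to make the requirement explicit is a pin such as pandas>=0.24 in the package's install requirements; another is a guard that fails with a readable message instead of the AttributeError above. A minimal sketch of the latter using only the standard library; check_pandas_version and version_tuple are hypothetical helpers, not part of vaex, and the parser deliberately ignores pre-release and local tags such as rc1 or +g1a2b.

```python
import re

MIN_PANDAS = (0, 24, 0)  # first pandas release that ships IntegerArray


def version_tuple(version):
    """Turn '0.23.4' into (0, 23, 4); anything after major.minor[.patch] is ignored."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    major, minor, patch = match.groups()
    return (int(major), int(minor), int(patch or 0))


def check_pandas_version(version):
    """Raise a readable error when pandas is older than MIN_PANDAS."""
    if version_tuple(version) < MIN_PANDAS:
        raise ImportError(
            f"pandas {version} is too old; pandas >= 0.24 is required "
            "(pandas.core.arrays.integer does not exist before 0.24)"
        )
```

In practice it would be called as check_pandas_version(pandas.__version__) before the first from_pandas call.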

I think a workaround for now would be to copy/paste the from_pandas function from https://github.com/vaexio/vaex/blob/d7c32e046dd3da6eaf773221bd74bdeed2127ab2/packages/vaex-core/vaex/__init__.py#L371 into your own code

and take out the IntegerArray check at https://github.com/vaexio/vaex/blob/d7c32e046dd3da6eaf773221bd74bdeed2127ab2/packages/vaex-core/vaex/__init__.py#L390, which is the line that fails in the tracebacks above.
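Rather than deleting the check, the "fix this" option could keep it but only run it when pandas actually provides IntegerArray. A minimal sketch, assuming IntegerArray is importable from pandas.arrays (its public location since pandas 0.24); column_values is a hypothetical stand-in for lines 389-391 of the nested add helper, and _data/_mask are private pandas attributes reused exactly as the original line 391 does.

```python
import numpy as np
import pandas as pd

try:
    # pandas >= 0.24: nullable integer arrays exist and are public here
    from pandas.arrays import IntegerArray
except ImportError:
    # Older pandas (e.g. 0.23.x) has no nullable integer arrays at all
    IntegerArray = None


def column_values(column):
    """Return a column's values, turning nullable-integer data into a masked array."""
    values = column.values
    if IntegerArray is not None and isinstance(values, IntegerArray):
        # Same conversion as the original code: raw data plus its missing-value mask
        values = np.ma.array(values._data, mask=values._mask)
    return values
```

Feature detection like this avoids parsing version strings: on pandas 0.23 the isinstance test is simply skipped, and on newer pandas nullable-integer columns still become masked arrays.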