vaexio / vaex

Out-of-Core hybrid Apache Arrow/NumPy DataFrame for Python, ML, visualization and exploration of big tabular data at a billion rows per second 🚀
https://vaex.io
MIT License

[BUG-REPORT] Dataframes with no columns raise errors for various operations #2094

Open honno opened 2 years ago

honno commented 2 years ago

I'm able to create dataframes with zero columns, but displaying one (which calls its repr) raises the following error:

>>> import vaex
>>> df = vaex.from_dict({})
>>> df
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File ".../vaex/packages/vaex-core/vaex/dataframe.py", line 4221, in __repr__
    return self._head_and_tail_table(format='plain')
  File ".../vaex/packages/vaex-core/vaex/dataframe.py", line 3961, in _head_and_tail_table
    if N <= n:
TypeError: '<=' not supported between instances of 'NoneType' and 'int'

I'm not too familiar with Vaex, but I imagine this type of bug, where code assumes at least one column, will pop up in various operations. For example, df.concat(df) also raises, although maybe that operation is nonsensical in the first place (interestingly, pandas.concat([pd.DataFrame({}), pd.DataFrame({})]) works).
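
For comparison, the pandas behaviour mentioned above can be checked with a short snippet. This is a minimal sketch that assumes only that pandas is installed; the variable names are illustrative.

```python
import pandas as pd

# Two dataframes with no columns (and therefore no rows).
left = pd.DataFrame({})
right = pd.DataFrame({})

# pandas accepts the concatenation and returns another empty dataframe.
combined = pd.concat([left, right])

print(combined.shape)
print(combined.empty)
```

The result is still a dataframe with zero rows and zero columns, so pandas treats the zero-column case as valid input rather than an error.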

Also, such dataframes cannot interoperate with from_dataframe from https://github.com/pandas-dev/pandas/pull/46141:

>>> from pandas.api.exchange import from_dataframe
>>> from_dataframe(df)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File ".../pandas/core/exchange/from_dataframe.py", line 57, in from_dataframe
    return _from_dataframe(df.__dataframe__(allow_copy=allow_copy))
  File ".../pandas/core/exchange/from_dataframe.py", line 77, in _from_dataframe
    for chunk in df.get_chunks():
  File ".../vaex/packages/vaex-core/vaex/dataframe_protocol.py", line 750, in get_chunks
    n_chunks = n_chunks if n_chunks is not None else self.num_chunks()
  File ".../vaex/packages/vaex-core/vaex/dataframe_protocol.py", line 712, in num_chunks
    if isinstance(self.get_column(0)._col.values, pa.ChunkedArray):
  File ".../vaex/packages/vaex-core/vaex/dataframe_protocol.py", line 721, in get_column
    return _VaexColumn(self._df[:, i], allow_copy=self._allow_copy)
  File ".../vaex/packages/vaex-core/vaex/dataframe.py", line 5355, in __getitem__
    df = df[item[0]]
  File ".../vaex/packages/vaex-core/vaex/dataframe.py", line 5371, in __getitem__
    stop = stop or len(self)
TypeError: 'NoneType' object cannot be interpreted as an integer
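
Until this is fixed, a caller could guard against the zero-column case before handing the dataframe to pandas. The following is only a sketch: `from_dataframe_or_empty` is a hypothetical helper, it assumes a released pandas where the module is named `pandas.api.interchange` (the traceback above uses the development name `pandas.api.exchange`), and it relies on the interchange protocol's `num_columns()` method. Whether Vaex's own `num_columns()` succeeds on a zero-column dataframe has not been verified here.

```python
import pandas as pd
from pandas.api.interchange import from_dataframe


def from_dataframe_or_empty(df):
    """Convert `df` to pandas, short-circuiting when it has no columns.

    Hypothetical helper for illustration; not part of pandas or Vaex.
    """
    # num_columns() is defined by the dataframe interchange protocol.
    if df.__dataframe__().num_columns() == 0:
        return pd.DataFrame()
    return from_dataframe(df)


# Demonstrated with pandas dataframes so the snippet runs without Vaex.
empty_result = from_dataframe_or_empty(pd.DataFrame({}))
filled_result = from_dataframe_or_empty(pd.DataFrame({"a": [1, 2]}))
```

The guard avoids iterating over chunks at all when there is nothing to convert, which is the step where the traceback above fails.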

I searched around and couldn't figure out whether such dataframes are supported by Vaex in the first place. I have no use case for them myself; it's just that they are valid in other dataframe libraries (like pandas). If they're not supported, the constructors should possibly raise a ValueError when asked to create a zero-column dataframe.
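
If zero-column dataframes turn out to be unsupported, the validation suggested above might look roughly like this. It is a sketch only: `require_columns` is a hypothetical helper, not an existing Vaex function, and where such a check would live inside Vaex's constructors is left open.

```python
from collections.abc import Mapping


def require_columns(data: Mapping) -> Mapping:
    """Return `data` unchanged, or raise ValueError if it defines no columns.

    Hypothetical helper for illustration; it is not part of Vaex.
    """
    if len(data) == 0:
        raise ValueError("cannot create a dataframe with zero columns")
    return data


# Intended use (assumes vaex is installed):
#     df = vaex.from_dict(require_columns(data))
```

Failing at construction time would surface the problem immediately, instead of as a TypeError deep inside __repr__ or the interchange code.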

Vaex was built locally from source (upstream master) on Ubuntu 20.04.

cvanelteren commented 4 months ago

Same issue!