vaexio / vaex

Out-of-Core hybrid Apache Arrow/NumPy DataFrame for Python, ML, visualization and exploration of big tabular data at a billion rows per second 🚀
https://vaex.io
MIT License

[BUG-REPORT] Column names prefixed with double underscores (`"__"`) don't show in `df.get_column_names()` #2165

Status: Closed (honno closed this 2 years ago)

honno commented 2 years ago
>>> import numpy as np
>>> import vaex
>>> df = vaex.from_items(("__", np.asarray([0])))
>>> df.get_column_names()
[]

The same seems to happen for any name that starts with a double underscore:

>>> df = vaex.from_items(("__foo", np.asarray([0])))
>>> df.get_column_names()
[]

I don't have a use case for such names; I just found this while testing the interchange protocol on #2150. There, fetching a column by position fails:

>>> df = vaex.from_items(("__", np.asarray([0])))
>>> interchange_df = df.__dataframe__()
>>> interchange_col = interchange_df.get_column(0)
File .../vaex-core/vaex/dataframe_protocol.py:740, in _VaexDataFrame.get_column(self, i)
    739 def get_column(self, i: int) -> _VaexColumn:
--> 740     return _VaexColumn(self._df[:, i], allow_copy=self._allow_copy)
File .../vaex/packages/vaex-core/vaex/dataframe.py:5379, in DataFrame.__getitem__(self, item)
   5377 if len(item) > 1:
   5378     if isinstance(item[1], int):
-> 5379         name = self.get_column_names()[item[1]]
   5380         return df[name]
   5381     elif isinstance(item[1], slice):
IndexError: list index out of range
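
The error follows directly from the frames in the traceback: `__getitem__` looks up the positional index in the list returned by `get_column_names()`, and that list is empty here. A minimal plain-Python sketch of the same failure, with no vaex involved (the empty list stands in for the result shown earlier, and `name_at` is a hypothetical helper, not a vaex function):

```python
# Stand-in for the result of df.get_column_names() in the report above,
# which came back empty for a frame whose only column is named "__".
visible_names = []


def name_at(names, position):
    """Hypothetical helper mirroring `self.get_column_names()[item[1]]`."""
    return names[position]


try:
    name_at(visible_names, 0)
    outcome = "ok"
except IndexError as exc:
    # Indexing an empty list by position always raises IndexError.
    outcome = f"IndexError: {exc}"

print(outcome)  # IndexError: list index out of range
```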

vaex was built locally from source (upstream master) on Ubuntu 20.04.

JovanVeljanoski commented 2 years ago

Columns whose names start with "__" have a special meaning in vaex and are hidden by default. Users should not create such columns unless they really know what they are doing.

You can still access them by passing hidden=True:

>>> df = vaex.from_items(("__foo", np.asarray([0])))
>>> df.get_column_names(hidden=True)
['__foo']
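
To make the convention concrete, here is a rough sketch of a name filter that behaves the way this thread describes. It is an illustration only, written in plain Python; `column_names` and `HIDDEN_PREFIX` are made-up names, and this is not vaex's actual implementation.

```python
HIDDEN_PREFIX = "__"  # the prefix this thread says marks a hidden column


def column_names(all_names, hidden=False):
    """Illustrative filter: drop hidden names unless hidden=True is passed."""
    return [
        name
        for name in all_names
        if hidden or not name.startswith(HIDDEN_PREFIX)
    ]


names = ["x", "__foo", "__"]
print(column_names(names))               # ['x']
print(column_names(names, hidden=True))  # ['x', '__foo', '__']
```

Under this reading, a frame whose only column is `"__foo"` reports no visible columns, which matches the empty list in the original report.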