markfairbanks / tidypolars

Tidy interface to polars
http://tidypolars.readthedocs.io
MIT License

`drop` with error `RuntimeError: Any(NotFound("^x.*$"))` #141

Closed ztsweet closed 2 years ago

ztsweet commented 2 years ago
import sys
import tidypolars as tp
sys.version
# '3.9.7 (default, Sep 16 2021, 13:09:58) \n[GCC 7.5.0]'
tp.__version__
# '0.2.1'
## error
df = tp.Tibble(x1 = range(3), x2 = range(3), y=range(3), z = range(3))
df.drop([tp.starts_with('x'), 'z'])
df.drop()
`
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
/tmp/ipykernel_12815/866601321.py in <module>
----> 1 df.drop(tp.starts_with('x'))

~/miniconda3/envs/py39/lib/python3.9/site-packages/polars/eager/frame.py in drop(self, name)
   2253             return df
   2254 
-> 2255         return wrap_df(self._df.drop(name))
   2256 
   2257     def drop_in_place(self, name: str) -> "pl.Series":

RuntimeError: Any(NotFound("^x.*$"))
`
markfairbanks commented 2 years ago

This works for me as well.

import tidypolars as tp

df = tp.Tibble(x1 = range(3), x2 = range(3), y=range(3), z = range(3))

df.drop([tp.starts_with('x'), 'z'])

┌─────┐
│ y   │
│ --- │
│ i64 │
╞═════╡
│ 0   │
├╌╌╌╌╌┤
│ 1   │
├╌╌╌╌╌┤
│ 2   │
└─────┘

@mjkarlsen - Does this work for you?

mjkarlsen commented 2 years ago

@markfairbanks - here is what is interesting to me...

It works when I use the tidypolars development environment and it works when I create a new environment with a fresh pip install of tidypolars.

Success in Dev Env

import tidypolars as tp
import sys 
print(sys.version)

df = tp.Tibble(x1 = range(3), x2 = range(3), y = range(3), z = range(3))
print(df.drop([tp.starts_with('x'), 'z']))

3.9.7 (default, Sep 16 2021, 13:09:58) 
[GCC 7.5.0]
shape: (3, 1)
┌─────┐
│ y   │
│ --- │
│ i64 │
╞═════╡
│ 0   │
├╌╌╌╌╌┤
│ 1   │
├╌╌╌╌╌┤
│ 2   │
└─────┘

Success in New Env

import tidypolars as tp
import sys
print(sys.version)

df = tp.Tibble(x1 = range(3), x2 = range(3), y = range(3), z = range(3))
df.drop([tp.starts_with('x'), 'z'])

3.9.7 (default, Sep 16 2021, 13:09:58) 
[GCC 7.5.0]
shape: (3, 1)
┌─────┐
│ y   │
│ --- │
│ i64 │
╞═════╡
│ 0   │
├╌╌╌╌╌┤
│ 1   │
├╌╌╌╌╌┤
│ 2   │
└─────┘
markfairbanks commented 2 years ago

@ztsweet - a few questions

markfairbanks commented 2 years ago

@ztsweet - I'm going to close this for now. We can keep troubleshooting here and see if we can get it to work. I think you might just need to start with a clean environment.
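
When comparing a working environment with a failing one, it can help to print the installed package versions from inside the same interpreter that raises the error. The sketch below is only an illustration: `installed_version` is a hypothetical helper name, and it relies on the standard-library `importlib.metadata` module (Python 3.8+).

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Report the two packages discussed in this thread.
for package in ("tidypolars", "polars"):
    print(package, installed_version(package))
```

Running this in both the notebook kernel and the terminal shows whether the two are actually using the same environment.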

ztsweet commented 2 years ago

new environment

pip list
`
Package           Version
----------------- ---------
backcall          0.2.0
certifi           2021.10.8
datatable         1.0.0
debugpy           1.5.1
decorator         5.1.0
entrypoints       0.3
greenlet          1.1.2
ipykernel         6.5.0
ipython           7.28.0
jedi              0.18.0
jupyter-client    7.0.6
jupyter-core      4.9.1
matplotlib-inline 0.1.3
nest-asyncio      1.5.1
numpy             1.21.2
pandas            1.3.3
parso             0.8.2
pexpect           4.8.0
pickleshare       0.7.5
pip               21.2.4
polars            0.10.20
prompt-toolkit    3.0.20
ptyprocess        0.7.0
pyarrow           6.0.0
Pygments          2.10.0
PyMySQL           1.0.2
python-dateutil   2.8.2
pytz              2021.3
pyzmq             22.3.0
setuptools        58.0.4
six               1.16.0
SQLAlchemy        1.4.25
tidypolars        0.2.1
tornado           6.1
traitlets         5.1.0
wcwidth           0.2.5
wheel             0.37.0
`

system version

`
Linux bi-test 5.4.0-62-generic #70~18.04.1-Ubuntu SMP Tue Jan 12 17:18:00 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
`

When I create two variables with the same name, the error occurs

import tidypolars as tp
##
df = tp.Tibble(x = range(3),x1=range(3), y = range(3, 6), z = ['a', 'a', 'b'])
type(df)
`
tidypolars.tidypolars.Tibble
`
df.drop([tp.starts_with('x'), 'z']) # runs normally
##
df = tp.Tibble(x1 = range(3), x2 = range(3), y=range(3), z = range(3))
type(df)
`
polars.eager.frame.DataFrame
`
df.drop([tp.starts_with('x'), 'z'])
`
...
RuntimeError: Any(NotFound("^x.*$"))

`
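
The error message suggests that the selector reached polars as the literal string `^x.*$` and was looked up as a column name. As a rough illustration of what has to happen before a drop can succeed, the sketch below expands regex selectors into concrete column names in plain Python. It is not the tidypolars implementation: `starts_with` and `resolve_columns` here are hypothetical stand-ins written for this example.

```python
import re


def starts_with(prefix):
    # Hypothetical helper: build a regex such as "^x.*$" for a column prefix.
    return f"^{re.escape(prefix)}.*$"


def resolve_columns(all_columns, selectors):
    """Expand each selector into the concrete column names it matches."""
    resolved = []
    for selector in selectors:
        if selector in all_columns:
            # An exact column name needs no expansion.
            matches = [selector]
        else:
            # Otherwise treat the selector as a regex over the column names.
            pattern = re.compile(selector)
            matches = [name for name in all_columns if pattern.search(name)]
        if not matches:
            raise KeyError(f"no column matches {selector!r}")
        resolved.extend(name for name in matches if name not in resolved)
    return resolved


columns = ["x1", "x2", "y", "z"]
to_drop = resolve_columns(columns, [starts_with("x"), "z"])
remaining = [name for name in columns if name not in to_drop]
print(to_drop)    # ['x1', 'x2', 'z']
print(remaining)  # ['y']
```

If the expansion step is skipped, the pattern itself is treated as a column name, which matches the `NotFound("^x.*$")` message above.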

I love the 'tidy-mode' way of processing big data @markfairbanks

markfairbanks commented 2 years ago

@ztsweet can you test this one more time with tidypolars v0.2.4?

I was able to reproduce the error, and I think I have it fixed now.

ztsweet commented 2 years ago

It's my pleasure @markfairbanks

ztsweet commented 2 years ago

@markfairbanks I ran the code with tidypolars v0.2.4 and the error did not occur. However, running the code in JupyterLab raises another error, AttributeError: height not found, while running it in IPython does not. I don't know what the reason is.

IPython and Jupyter versions

`
ipykernel      6.5.0
ipython        7.29.0
jupyter-client 7.0.6
jupyter-core   4.9.1
`

The code works perfectly in ipython

import tidypolars as tp
df = tp.Tibble(x = range(3),x1=range(3), y = range(3, 6), z = ['a', 'a', 'b'])
df.drop([tp.starts_with('x'), 'z'])
df = tp.Tibble(x1 = range(3), x2 = range(3), y=range(3), z = range(3))
df.drop([tp.starts_with('x'), 'z'])

The code produces the correct result in JupyterLab, but raises another error: AttributeError: height not found

import tidypolars as tp
df = tp.Tibble(x = range(3),x1=range(3), y = range(3, 6), z = ['a', 'a', 'b'])  # runs normally
print(df) # runs normally
df.head() # raises an extra error
`
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
~/miniconda3/envs/tidy39/lib/python3.9/site-packages/polars/eager/frame.py in __getattr__(self, item)
    906         try:
--> 907             return pl.eager.series.wrap_s(self._df.column(item))
    908         except RuntimeError:

RuntimeError: Any(NotFound("height"))

During handling of the above exception, another exception occurred:

AttributeError                            Traceback (most recent call last)
~/miniconda3/envs/tidy39/lib/python3.9/site-packages/IPython/core/formatters.py in __call__(self, obj)
    343             method = get_real_method(obj, self.print_method)
    344             if method is not None:
--> 345                 return method()
    346             return None
    347         else:

~/miniconda3/envs/tidy39/lib/python3.9/site-packages/polars/eager/frame.py in _repr_html_(self)
   1166         max_cols = int(os.environ.get("POLARS_FMT_MAX_COLS", default=75))
   1167         max_rows = int(os.environ.get("POLARS_FMT_MAX_ROWS", default=25))
-> 1168         return "\n".join(NotebookFormatter(self, max_cols, max_rows).render())
   1169 
   1170     def to_series(self, index: int = 0) -> "pl.Series":

~/miniconda3/envs/tidy39/lib/python3.9/site-packages/polars/_html.py in __init__(self, df, max_cols, max_rows)
     48         self.row_idx: Iterable[int]
     49         self.col_idx: Iterable[int]
---> 50         if max_rows < df.height:
     51             self.row_idx = (
     52                 list(range(0, max_rows // 2))

~/miniconda3/envs/tidy39/lib/python3.9/site-packages/polars/eager/frame.py in __getattr__(self, item)
    907             return pl.eager.series.wrap_s(self._df.column(item))
    908         except RuntimeError:
--> 909             raise AttributeError(f"{item} not found")
    910 
    911     def __iter__(self) -> Iterator[Any]:

AttributeError: height not found
shape: (3, 4)
┌─────┬─────┬─────┬─────┐
│ x   ┆ x1  ┆ y   ┆ z   │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i64 ┆ str │
╞═════╪═════╪═════╪═════╡
│ 0   ┆ 0   ┆ 3   ┆ a   │
├╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌┤
│ 1   ┆ 1   ┆ 4   ┆ a   │
├╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌┤
│ 2   ┆ 2   ┆ 5   ┆ b   │
└─────┴─────┴─────┴─────┘

`
df.drop(['x', 'y']) # the same error occurs
`
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
~/miniconda3/envs/tidy39/lib/python3.9/site-packages/polars/eager/frame.py in __getattr__(self, item)
    906         try:
--> 907             return pl.eager.series.wrap_s(self._df.column(item))
    908         except RuntimeError:

RuntimeError: Any(NotFound("height"))

During handling of the above exception, another exception occurred:

AttributeError                            Traceback (most recent call last)
~/miniconda3/envs/tidy39/lib/python3.9/site-packages/IPython/core/formatters.py in __call__(self, obj)
    343             method = get_real_method(obj, self.print_method)
    344             if method is not None:
--> 345                 return method()
    346             return None
    347         else:

~/miniconda3/envs/tidy39/lib/python3.9/site-packages/polars/eager/frame.py in _repr_html_(self)
   1166         max_cols = int(os.environ.get("POLARS_FMT_MAX_COLS", default=75))
   1167         max_rows = int(os.environ.get("POLARS_FMT_MAX_ROWS", default=25))
-> 1168         return "\n".join(NotebookFormatter(self, max_cols, max_rows).render())
   1169 
   1170     def to_series(self, index: int = 0) -> "pl.Series":

~/miniconda3/envs/tidy39/lib/python3.9/site-packages/polars/_html.py in __init__(self, df, max_cols, max_rows)
     48         self.row_idx: Iterable[int]
     49         self.col_idx: Iterable[int]
---> 50         if max_rows < df.height:
     51             self.row_idx = (
     52                 list(range(0, max_rows // 2))

~/miniconda3/envs/tidy39/lib/python3.9/site-packages/polars/eager/frame.py in __getattr__(self, item)
    907             return pl.eager.series.wrap_s(self._df.column(item))
    908         except RuntimeError:
--> 909             raise AttributeError(f"{item} not found")
    910 
    911     def __iter__(self) -> Iterator[Any]:

AttributeError: height not found
shape: (3, 2)
┌─────┬─────┐
│ x1  ┆ z   │
│ --- ┆ --- │
│ i64 ┆ str │
╞═════╪═════╡
│ 0   ┆ a   │
├╌╌╌╌╌┼╌╌╌╌╌┤
│ 1   ┆ a   │
├╌╌╌╌╌┼╌╌╌╌╌┤
│ 2   ┆ b   │
└─────┴─────┘
`
markfairbanks commented 2 years ago

@ztsweet - Everything should work now in v0.2.5. I needed to define a special print method for use in jupyter.

Thanks for catching all of these! If you run into anything else let me know.
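
For readers wondering why the error appeared only in JupyterLab: a notebook front end typically asks an object for `_repr_html_()` to render it, while `print()` and the IPython terminal use `__repr__`. The sketch below illustrates that split with toy classes. `Frame` and `Tibble` here are hypothetical stand-ins, not the polars or tidypolars classes, and the `<pre>` fallback is just one possible way to define a notebook-specific print method.

```python
import html


class Frame:
    """Toy stand-in for a data frame class (hypothetical, not polars)."""

    def __init__(self, **columns):
        self.columns = {name: list(values) for name, values in columns.items()}

    def __repr__(self):
        # Used by print() and by the plain IPython terminal.
        return f"Frame(columns={list(self.columns)})"

    def _repr_html_(self):
        # Used by notebook front ends; fails on purpose to mimic the thread.
        raise AttributeError("height not found")


class Tibble(Frame):
    """Toy subclass that supplies its own notebook representation."""

    def _repr_html_(self):
        # Reuse the text representation, escaped and wrapped in <pre>.
        return f"<pre>{html.escape(repr(self))}</pre>"


df = Tibble(x=range(3), y=range(3))
print(df)                 # text path, works for both classes
print(df._repr_html_())   # notebook path, works because Tibble overrides it
```

With only the base class, the text path still works while the notebook path raises, which mirrors the behaviour reported above, where the table was printed alongside the traceback.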