holoviz / panel

Panel: The powerful data exploration & web app framework for Python
https://panel.holoviz.org
BSD 3-Clause "New" or "Revised" License
4.58k stars 499 forks source link

Regression in Tabulator.selected_dataframe for panel>0.12.1 #2907

Closed grelston closed 3 weeks ago

grelston commented 2 years ago

See my Tabulator-regression repo for the MRE, conda environments, and animated gif screenshots.

ALL software version info

OS:

conda environment:

```yml name: panel-0.12.4 channels: - pyviz - conda-forge - defaults dependencies: - argon2-cffi=21.1.0 - async_generator=1.10 - attrs=21.2.0 - backcall=0.2.0 - backports=1.0 - backports.functools_lru_cache=1.6.4 - bleach=4.1.0 - bokeh=2.4.1 - brotlipy=0.7.0 - ca-certificates=2021.10.8 - certifi=2021.10.8 - cffi=1.15.0 - chardet=4.0.0 - colorama=0.4.4 - cryptography=35.0.0 - debugpy=1.5.1 - decorator=5.1.0 - defusedxml=0.7.1 - entrypoints=0.3 - et_xmlfile=1.0.1 - freetype=2.10.4 - idna=2.10 - importlib-metadata=4.8.1 - importlib_resources=5.4.0 - intel-openmp=2021.4.0 - ipykernel=6.4.2 - ipython=7.29.0 - ipython_genutils=0.2.0 - jbig=2.1 - jedi=0.18.0 - jinja2=3.0.2 - jpeg=9d - jsonschema=4.2.1 - jupyter_client=7.0.6 - jupyter_core=4.9.1 - jupyterlab_pygments=0.1.2 - lcms2=2.12 - lerc=3.0 - libblas=3.9.0 - libcblas=3.9.0 - libdeflate=1.8 - liblapack=3.9.0 - libpng=1.6.37 - libsodium=1.0.18 - libtiff=4.3.0 - libzlib=1.2.11 - lz4-c=1.9.3 - markdown=3.3.4 - markupsafe=2.0.1 - matplotlib-inline=0.1.3 - mistune=0.8.4 - mkl=2021.4.0 - nbclient=0.5.4 - nbconvert=6.2.0 - nbformat=5.1.3 - nest-asyncio=1.5.1 - notebook=6.4.5 - numpy=1.21.4 - olefile=0.46 - openjpeg=2.4.0 - openpyxl=3.0.9 - openssl=1.1.1l - packaging=21.0 - pandas=1.3.4 - pandoc=2.16.1 - pandocfilters=1.5.0 - panel=0.12.4 - param=1.12.0 - parso=0.8.2 - pickleshare=0.7.5 - pillow=8.3.2 - pip=21.3.1 - prometheus_client=0.12.0 - prompt-toolkit=3.0.22 - pycparser=2.21 - pyct=0.4.8 - pyct-core=0.4.8 - pygments=2.10.0 - pyopenssl=21.0.0 - pyparsing=3.0.5 - pyrsistent=0.18.0 - pysocks=1.7.1 - python=3.9.7 - python-dateutil=2.8.2 - python_abi=3.9 - pytz=2021.3 - pyviz_comms=2.1.0 - pywin32=302 - pywinpty=1.1.5 - pyyaml=6.0 - pyzmq=22.3.0 - requests=2.25.1 - send2trash=1.8.0 - setuptools=58.5.3 - six=1.16.0 - sqlite=3.36.0 - tbb=2021.4.0 - terminado=0.12.1 - testpath=0.5.0 - tk=8.6.11 - tornado=6.1 - tqdm=4.62.3 - traitlets=5.1.1 - typing_extensions=3.10.0.2 - tzdata=2021e - ucrt=10.0.20348.0 - urllib3=1.26.7 - vc=14.2 - vs2015_runtime=14.29.30037 - wcwidth=0.2.5 - webencodings=0.5.1 - wheel=0.37.0 - win_inet_pton=1.1.0 - winpty=0.4.3 - xz=5.2.5 - yaml=0.2.5 - zeromq=4.3.4 - zipp=3.6.0 - zlib=1.2.11 - zstd=1.5.0 prefix: C:\Users\User\.conda\envs\panel-0.12.4 ```

Description of expected behavior and the observed behavior

Expected: When filtering a table and selecting rows from the filtered subset, the Tabulator.selected_dataframe returns those selected rows for panel=0.12.1.

Observed: For panel>=0.12.2 it throws an IndexError.

Complete, minimal, self-contained example code that reproduces the issue

See the Tabulator-regression.ipynb notebook (nbviewer or GitHub), which is based on the example in the Function based filtering section of the Tabulator page of the Reference Gallery.

Stack traceback and/or browser JavaScript console output

```python-traceback --------------------------------------------------------------------------- IndexError Traceback (most recent call last) ~\.conda\envs\panel-0.12.4\lib\site-packages\pandas\core\indexing.py in _get_list_axis(self, key, axis) 1529 try: -> 1530 return self.obj._take_with_is_copy(key, axis=axis) 1531 except IndexError as err: ~\.conda\envs\panel-0.12.4\lib\site-packages\pandas\core\generic.py in _take_with_is_copy(self, indices, axis) 3627 """ -> 3628 result = self.take(indices=indices, axis=axis) 3629 # Maybe set copy if we didn't actually change the index. ~\.conda\envs\panel-0.12.4\lib\site-packages\pandas\core\generic.py in take(self, indices, axis, is_copy, **kwargs) 3614 -> 3615 new_data = self._mgr.take( 3616 indices, axis=self._get_block_manager_axis(axis), verify=True ~\.conda\envs\panel-0.12.4\lib\site-packages\pandas\core\internals\managers.py in take(self, indexer, axis, verify) 861 n = self.shape[axis] --> 862 indexer = maybe_convert_indices(indexer, n, verify=verify) 863 ~\.conda\envs\panel-0.12.4\lib\site-packages\pandas\core\indexers.py in maybe_convert_indices(indices, n, verify) 291 if mask.any(): --> 292 raise IndexError("indices are out-of-bounds") 293 return indices IndexError: indices are out-of-bounds The above exception was the direct cause of the following exception: IndexError Traceback (most recent call last) ~\AppData\Local\Temp/ipykernel_172632/1318812073.py in 1 # This fails for panel>0.12.1: 2 if demo: ----> 3 fill_tbl2(True) ~\AppData\Local\Temp/ipykernel_172632/1200927601.py in fill_tbl2(event) 4 df_new = ( 5 df2.append( ----> 6 other=tbl.selected_dataframe, 7 ignore_index=True 8 ) ~\.conda\envs\panel-0.12.4\lib\site-packages\panel\widgets\tables.py in selected_dataframe(self) 546 if not self.selection: 547 return self._processed --> 548 return self._processed.iloc[self.selection] 549 550 ~\.conda\envs\panel-0.12.4\lib\site-packages\pandas\core\indexing.py in __getitem__(self, key) 929 930 maybe_callable = com.apply_if_callable(key, self.obj) --> 931 return self._getitem_axis(maybe_callable, axis=axis) 932 933 def _is_scalar_access(self, key: tuple): ~\.conda\envs\panel-0.12.4\lib\site-packages\pandas\core\indexing.py in _getitem_axis(self, key, axis) 1555 # a list of integers 1556 elif is_list_like_indexer(key): -> 1557 return self._get_list_axis(key, axis=axis) 1558 1559 # a single integer ~\.conda\envs\panel-0.12.4\lib\site-packages\pandas\core\indexing.py in _get_list_axis(self, key, axis) 1531 except IndexError as err: 1532 # re-raise with different error message -> 1533 raise IndexError("positional indexers are out-of-bounds") from err 1534 1535 def _getitem_axis(self, key, axis: int): IndexError: positional indexers are out-of-bounds ```

Screenshots or screencasts of the bug in action

Succeeds for panel=0.12.1

Copying selected filtered rows works for panel=0.12.1

Fails for panel>=0.12.2

Copying selected filtered rows fails for panel>=0.12.2

grelston commented 2 years ago

Selecting and copying filtered rows succeeds for panel=0.12.1

add_rows-works

grelston commented 2 years ago

Selecting and copying filtered rows fails for panel>=0.12.2

add_rows-fails

grelston commented 2 years ago

Issue #2505 reports the same IndexError but gives no further details.

maximlt commented 2 years ago

Hi @grelston! Thanks for the effort you put in reporting that issue, that's fantastic.

I've been able to reproduce the bug with the last dev release of panel. Basically:

  1. Select some rows
  2. Filter the data so that some selected rows are no longer displayed => At this stage .selection still contains the indices of the selected rows
  3. Call .selected_dataframe which is going to do an .iloc with .selection on ._processed, except that ._processed contains the filter dataframe
grelston commented 2 years ago

Thanks for looking at this @maximlt and @philippjfr.

I just uploaded the Tabulator-regression notebook into JupyterLab on binder with panel master (panel=0.13.0a20+g137c8733-dirty) and I see the same IndexErrors:

[snip]
IndexError: indices are out-of-bounds

The above exception was the direct cause of the following exception:

[snip]
IndexError: positional indexers are out-of-bounds

Do we reopen this issue? Or shall I open a new issue and refer to this one?

maximlt commented 2 years ago

I could reproduce this bug but the important factor apparently is that pagination has to be set to 'remote'. Thanks @grelston, reopening the issue then.

maximlt commented 2 years ago

It's not exactly the same issue, to reproduce:

  1. Filter the data
  2. Select a row
  3. Access the selected_dataframe property.
grelston commented 2 years ago

Thanks for reopening this, @maximlt.

I've just tested the pagination='local' parameter (with panel=0.13.0a31) and I still get the same stack trace. I've pushed the rerun notebook to my Tabulator-regression repo.

maximlt commented 2 years ago

There's some strange behavior going on. When I run all the cells of your notebook (Run all cells command) with either 'local' or 'pagination' for remote I also get the IndexError. However, if I run the cells manually, step by step, I don't.

If, instead of running the whole notebook, I manually try to reproduce the bug (as you did in the videos above), I only get the IndexError with pagination='remote'.

I can't explain the strange behavior I observed in the notebook without any further investigation I can only say that I expect this kind of approach, i.e. setting programmatically filters and selection, to be pretty rare. Also the notebook is using a private method _filter_dataframe().

grelston commented 2 years ago

I've looked at this again in more detail, comparing 2 states of 3 variables (experimental, not code):

variable value 1 value 2
panel= 0.12.1 0.13.0a31
pagination= 'remote' 'local'
UI driver Cell > Run All GUI / manual / mouse

There are in fact 2 bugs / behaviour changes that are showing up under different circumstances:

  1. GUI interaction problem
  2. Code to replicate the GUI interaction problem

I used the private _filter_dataframe() method because there was no current_view property in panel=0.12.1 and I wanted to achieve in code the same "select-all-visible" behaviour shown when selecting the header checkbox in the GUI.

1 GUI interaction problem

First, the GUI interaction problem as reported and illustrated when I first raised this issue.

1.1 panel=0.12.1

With panel=0.12.1, all 4 combinations of pagination and "UI driver" generate the desired outcome (5 rows in tbl2, with code=['00', '01', '03', '07', '08']). However, there are differences in the visible selection state of the first table in the GUI (tbl).

With pagination='remote', the GUI shows that all 3 filtered rows ('03', '07', '08') are selected; checking tbl.selection also shows [3, 7, 8] (see updated notebook and this screenshot in the Tabulator-regression repo). This is true for both methods of driving the UI ("notebook > Cell > Run All" and GUI/manual/mouse).

With pagination='local', the GUI shows that none of the 3 filtered rows are selected after driving the UI with the "notebook > Cell > Run All" (see this screenshot). Checking tbl.selection after GUI/manual/mouse filter-select shows [0, 1, 2] (see this screenshot).

1.2 panel=0.13.0a31

I'm going to have to come back to this later, sorry.

maximlt commented 2 years ago

@grelston I'm working on a fix to this issue. There are a few inconsistencies I'd like to fix:

When this is fixed I think we can try to run your notebook again and see if there are remaining issues.

grelston commented 2 years ago

Thanks again for working on this @maximlt.

Yes, I have noticed inconsistencies in the selection indices and was thinking of raising it as an issue. Always returning the (appropriately sliced and sorted) original DataFrame.index sounds like the right thing to do.

I'll update the notebook to use current_view instead of _filter_dataframe().

grelston commented 2 years ago

For panel=0.13.0a31 with pagination='remote', after manually filtering for descr=='cut' and selecting all, I see this in the notebook:

tbl.selection
[3, 7, 8]

tbl.current_view
  | code | descr -- | -- | -- 3 | 03 | Cut To The Chase 7 | 07 | A Cut Above 8 | 08 | Cut The Mustard
tbl.selected_dataframe
IndexError: indices are out-of-bounds

Pdb (via %debug) shows the problem to be at line 656 of panel/widgets/tables.py in selected_dataframe(). At this point :

ipdb> self._processed
  code             descr
3   03  Cut To The Chase
7   07       A Cut Above
8   08   Cut The Mustard
ipdb> self.selection
[3, 7, 8]
ipdb> self._processed.iloc[self.selection]
*** IndexError: positional indexers are out-of-bounds

Changing the iloc (integer-based) indexer to the loc (label-based) indexer here gives the expected result:

ipdb> self._processed.loc[self.selection]
  code             descr
3   03  Cut To The Chase
7   07       A Cut Above
8   08   Cut The Mustard
grelston commented 2 years ago

For panel=0.13.0a31 with pagination='local', after manually filtering for descr=='cut' and selecting all, I see this in the notebook instead:

tbl.selection
[0, 1, 2]

Which explains why the iloc integer-based indexer gives the expected result (incorrectly, I think) in this case.

philippjfr commented 1 year ago

@maximlt Can you check on the status of this?

orangeSi commented 1 year ago

when I am using panel v1.1.0 + pagination=remove , still get the error, but change to local fix this~

contang0 commented 5 months ago

Is a fix for this on the horizon? I'm using tabulator with very large tables and pagination='local' will not work for me.

philippjfr commented 3 weeks ago

I believe this has now been (largely) fixed. Sorry for the long, long delay here. Part of the confusion is that we need to make it very clear that all indexes are meant to be integer indexes into the original dataframe. This is not yet the case in this instance as the selected values are indexes into the filtered dataframe.