mwouts / itables

Pandas DataFrames as Interactive DataTables
https://mwouts.github.io/itables/
MIT License
803 stars 58 forks

Implement a Jupyter Widget for ITables #319

Closed mwouts closed 1 month ago

mwouts commented 2 months ago

I have used AnyWidget to provide the widget, as suggested at https://github.com/mwouts/itables/issues/267#issuecomment-2343712189

Closes #267 Closes #250

TODO

github-actions[bot] commented 2 months ago

Thank you for making this pull request.

Did you know? You can try it on Binder.

Also, the version of ITables developed in this PR can be installed with pip:

pip install git+https://github.com/mwouts/itables.git@try_anywidget

(this requires nodejs, see more at Developing ITables)

codecov-commenter commented 2 months ago

Codecov Report

Attention: Patch coverage is 80.43478% with 36 lines in your changes missing coverage. Please review.

Project coverage is 93.65%. Comparing base (20546b2) to head (a177e4a). Report is 1 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/itables/javascript.py | 67.39% | 15 Missing :warning: |
| src/itables/widget/__init__.py | 90.47% | 8 Missing :warning: |
| tests/sample_python_apps/itables_in_a_shiny_app.py | 41.66% | 7 Missing :warning: |
| src/itables/shiny.py | 73.91% | 6 Missing :warning: |

Additional details and impacted files

```diff
@@            Coverage Diff             @@
##             main     #319      +/-   ##
==========================================
- Coverage   95.88%   93.65%   -2.24%
==========================================
  Files          27       28       +1
  Lines        1191     1339     +148
==========================================
+ Hits         1142     1254     +112
- Misses         49       85      +36
```

| [Flag](https://app.codecov.io/gh/mwouts/itables/pull/319/flags?src=pr&el=flags&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Marc+Wouts) | Coverage Δ | |
|---|---|---|
| [](https://app.codecov.io/gh/mwouts/itables/pull/319/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Marc+Wouts) | `93.65% <80.43%> (-2.24%)` | :arrow_down: |

Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Marc+Wouts#carryforward-flags-in-the-pull-request-comment) to find out more.


jgunstone commented 2 months ago

just tried to run through the docs (on binder and locally) and got this error:

import ipywidgets as widgets

from itables import show
from itables.sample_dfs import get_dict_of_test_dfs

sample_dfs = get_dict_of_test_dfs()

def use_show_in_interactive_output(table_name: str):
    show(
        sample_dfs[table_name],
        caption=table_name,
        style="table-layout:auto;width:auto;float:left;caption-side:bottom",
    )

table_selector = widgets.Dropdown(options=sample_dfs.keys(), value="int_float_str")
out = widgets.interactive_output(
    use_show_in_interactive_output, {"table_name": table_selector}
)

widgets.VBox([table_selector, out])
full stack trace

```
---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
Cell In[1], line 6
      3 from itables import show
      4 from itables.sample_dfs import get_dict_of_test_dfs
----> 6 sample_dfs = get_dict_of_test_dfs()
      9 def use_show_in_interactive_output(table_name: str):
     10     show(
     11         sample_dfs[table_name],
     12         caption=table_name,
     13         style="table-layout:auto;width:auto;float:left;caption-side:bottom",
     14     )

File ~/miniforge3/envs/complexapps-2024/lib/python3.11/site-packages/itables/sample_dfs.py:202, in get_dict_of_test_dfs(N, M, polars)
    103 def get_dict_of_test_dfs(N=100, M=100, polars=False):
    104     NM_values = np.reshape(np.linspace(start=0.0, stop=1.0, num=N * M), (N, M))
    106     test_dfs = {
    107         "empty": pd.DataFrame(dtype=float),
    108         "no_rows": pd.DataFrame(dtype=float, columns=["a"]),
    109         "no_columns": pd.DataFrame(dtype=float, index=["a"]),
    110         "no_rows_one_column": pd.DataFrame([1.0], index=["a"], columns=["a"]).iloc[:0],
    111         "no_columns_one_row": pd.DataFrame([1.0], index=["a"], columns=["a"]).iloc[
    112             :, :0
    113         ],
    114         "bool": pd.DataFrame(
    115             [[True, True, False, False], [True, False, True, False]],
    116             columns=list("abcd"),
    117         ),
    118         "nullable_boolean": pd.DataFrame(
    119             [
    120                 [True, True, False, None],
    121                 [True, False, None, False],
    122                 [None, False, True, False],
    123             ],
    124             columns=list("abcd"),
    125             dtype="bool" if PANDAS_VERSION_MAJOR == 0 else "boolean",
    126         ),
    127         "int": pd.DataFrame(
    128             [[-1, 2, -3, 4, -5], [6, -7, 8, -9, 10]], columns=list("abcde")
    129         ),
    130         "nullable_int": pd.DataFrame(
    131             [[-1, 2, -3], [4, -5, 6], [None, 7, None]],
    132             columns=list("abc"),
    133             dtype="Int64",
    134         ),
    135         "float": pd.DataFrame(
    136             {
    137                 "int": [0.0, 1],
    138                 "inf": [np.inf, -np.inf],
    139                 "nan": [np.nan, -np.nan],
    140                 "math": [math.pi, math.e],
    141             }
    142         ),
    143         "str": pd.DataFrame(
    144             {
    145                 "text_column": ["some", "text"],
    146                 "very_long_text_column": ["a " + "very " * 12 + "long text"] * 2,
    147             }
    148         ),
    149         "time": pd.DataFrame(
    150             {
    151                 "datetime": [datetime(2000, 1, 1), datetime(2001, 1, 1), pd.NaT],
    152                 "timestamp": [
    153                     pd.NaT,
    154                     datetime(2000, 1, 1, 18, 55, 33),
    155                     datetime(
    156                         2001,
    157                         1,
    158                         1,
    159                         18,
    160                         55,
    161                         55,
    162                         456654,
    163                         tzinfo=None if pytz is None else pytz.timezone("US/Eastern"),
    164                     ),
    165                 ],
    166                 "timedelta": [
    167                     timedelta(days=2),
    168                     timedelta(seconds=50),
    169                     pd.NaT - datetime(2000, 1, 1),
    170                 ],
    171             }
    172         ),
    173         "date_range": pd.DataFrame(
    174             {"timestamps": pd.date_range("now", periods=5, freq="s")}
    175         ),
    176         "ordered_categories": pd.DataFrame(
    177             {"int": np.arange(4)},
    178             index=pd.CategoricalIndex(
    179                 ["first", "second", "third", "fourth"],
    180                 categories=["first", "second", "third", "fourth"],
    181                 ordered=True,
    182                 name="categorical_index",
    183             ),
    184         ),
    185         "ordered_categories_in_multiindex": pd.DataFrame(
    186             {"int": np.arange(4), "integer_index": np.arange(4)},
    187             index=pd.CategoricalIndex(
    188                 ["first", "second", "third", "fourth"],
    189                 categories=["first", "second", "third", "fourth"],
    190                 ordered=True,
    191                 name="categorical_index",
    192             ),
    193         ).set_index("integer_index", append=True),
    194         "object": pd.DataFrame(
    195             {"dict": [{"a": 1}, {"b": 2, "c": 3}], "list": [["a"], [1, 2]]}
    196         ),
    197         "multiindex": pd.DataFrame(
    198             np.arange(16).reshape((4, 4)),
    199             columns=pd.MultiIndex.from_product((["A", "B"], [1, 2])),
    200             index=pd.MultiIndex.from_product((["C", "D"], [3, 4])),
    201         ),
--> 202         "countries": get_countries(),
    203         "capital": get_countries().set_index(["region", "country"])[["capital"]],
    204         "complex_index": get_df_complex_index(),
    205         "int_float_str": pd.DataFrame(
    206             {
    207                 "int": range(N),
    208                 "float": np.linspace(5.0, 0.0, N),
    209                 "str": [
    210                     letter for letter, _ in zip(cycle(string.ascii_lowercase), range(N))
    211                 ],
    212             }
    213         ),
    214         "wide": pd.DataFrame(
    215             NM_values,
    216             index=["row_{}".format(i) for i in range(N)],
    217             columns=["column_{}".format(j) for j in range(M)],
    218         ),
    219         "long_column_names": pd.DataFrame(
    220             {
    221                 "short name": [0] * 5,
    222                 "very " * 5 + "long name": [0] * 5,
    223                 "very " * 10 + "long name": [1] * 5,
    224                 "very " * 20 + "long name": [2] * 5,
    225                 "nospacein" + "very" * 50 + "longname": [3] * 5,
    226                 "nospacein" + "very" * 100 + "longname": [3] * 5,
    227             }
    228         ),
    229         "sorted_index": pd.DataFrame(
    230             {"i": [0, 1, 2], "x": [0.0, 1.0, 2.0], "y": [0.0, 0.1, 0.2]}
    231         ).set_index(["i"]),
    232         "reverse_sorted_index": pd.DataFrame(
    233             {"i": [2, 1, 0], "x": [0.0, 1.0, 2.0], "y": [0.0, 0.1, 0.2]}
    234         ).set_index(["i"]),
    235         "sorted_multiindex": pd.DataFrame(
    236             {"i": [0, 1, 2], "j": [3, 4, 5], "x": [0.0, 1.0, 2.0], "y": [0.0, 0.1, 0.2]}
    237         ).set_index(["i", "j"]),
    238         "unsorted_index": pd.DataFrame(
    239             {"i": [0, 2, 1], "x": [0.0, 1.0, 2.0], "y": [0.0, 0.1, 0.2]}
    240         ).set_index(["i"]),
    241         "duplicated_columns": pd.DataFrame(
    242             np.arange(4, 8).reshape((2, 2)),
    243             columns=pd.Index(["A", "A"]),
    244             index=pd.MultiIndex.from_arrays(
    245                 np.arange(4).reshape((2, 2)), names=["A", "A"]
    246             ),
    247         ),
    248         "named_column_index": pd.DataFrame({"a": [1]}).rename_axis("columns", axis=1),
    249         "big_integers": pd.DataFrame(
    250             {
    251                 "bigint": [
    252                     1234567890123456789,
    253                     2345678901234567890,
    254                     3456789012345678901,
    255                 ],
    256                 "expected": [
    257                     "1234567890123456789",
    258                     "2345678901234567890",
    259                     "3456789012345678901",
    260                 ],
    261             }
    262         ),
    263     }
    265     if polars:
    266         import polars as pl

File ~/miniforge3/envs/complexapps-2024/lib/python3.11/site-packages/itables/sample_dfs.py:43, in get_countries(html)
     40 def get_countries(html=True):
     41     """A Pandas DataFrame with the world countries (from the world bank data)
     42     Flags are loaded from https://flagpedia.net/"""
---> 43     df = pd.read_csv(find_package_file("samples/countries.csv"))
     44     df = df.rename(columns={"capitalCity": "capital", "name": "country"})
     45     df["iso2Code"] = df["iso2Code"].fillna("NA")  # Namibia

File ~/miniforge3/envs/complexapps-2024/lib/python3.11/site-packages/pandas/io/parsers/readers.py:1026, in read_csv(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, skipfooter, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, date_format, dayfirst, cache_dates, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, doublequote, escapechar, comment, encoding, encoding_errors, dialect, on_bad_lines, delim_whitespace, low_memory, memory_map, float_precision, storage_options, dtype_backend)
   1013 kwds_defaults = _refine_defaults_read(
   1014     dialect,
   1015     delimiter,
   (...)
   1022     dtype_backend=dtype_backend,
   1023 )
   1024 kwds.update(kwds_defaults)
-> 1026 return _read(filepath_or_buffer, kwds)

File ~/miniforge3/envs/complexapps-2024/lib/python3.11/site-packages/pandas/io/parsers/readers.py:620, in _read(filepath_or_buffer, kwds)
    617 _validate_names(kwds.get("names", None))
    619 # Create the parser.
--> 620 parser = TextFileReader(filepath_or_buffer, **kwds)
    622 if chunksize or iterator:
    623     return parser

File ~/miniforge3/envs/complexapps-2024/lib/python3.11/site-packages/pandas/io/parsers/readers.py:1620, in TextFileReader.__init__(self, f, engine, **kwds)
   1617     self.options["has_index_names"] = kwds["has_index_names"]
   1619 self.handles: IOHandles | None = None
-> 1620 self._engine = self._make_engine(f, self.engine)

File ~/miniforge3/envs/complexapps-2024/lib/python3.11/site-packages/pandas/io/parsers/readers.py:1880, in TextFileReader._make_engine(self, f, engine)
   1878     if "b" not in mode:
   1879         mode += "b"
-> 1880 self.handles = get_handle(
   1881     f,
   1882     mode,
   1883     encoding=self.options.get("encoding", None),
   1884     compression=self.options.get("compression", None),
   1885     memory_map=self.options.get("memory_map", False),
   1886     is_text=is_text,
   1887     errors=self.options.get("encoding_errors", "strict"),
   1888     storage_options=self.options.get("storage_options", None),
   1889 )
   1890 assert self.handles is not None
   1891 f = self.handles.handle

File ~/miniforge3/envs/complexapps-2024/lib/python3.11/site-packages/pandas/io/common.py:873, in get_handle(path_or_buf, mode, encoding, compression, memory_map, is_text, errors, storage_options)
    868 elif isinstance(handle, str):
    869     # Check whether the filename is to be opened in binary mode.
    870     # Binary mode does not support 'encoding' and 'newline'.
    871     if ioargs.encoding and "b" not in ioargs.mode:
    872         # Encoding
--> 873         handle = open(
    874             handle,
    875             ioargs.mode,
    876             encoding=ioargs.encoding,
    877             errors=errors,
    878             newline="",
    879         )
    880     else:
    881         # Binary mode
    882         handle = open(handle, ioargs.mode)

FileNotFoundError: [Errno 2] No such file or directory: '/home/jovyan/miniforge3/envs/complexapps-2024/lib/python3.11/site-packages/itables/samples/countries.csv'
```

mwouts commented 2 months ago

just tried to run through the docs (on binder and locally) and got this error:

(...)
FileNotFoundError: [Errno 2] No such file or directory: '/home/jovyan/miniforge3/envs/complexapps-2024/lib/python3.11/site-packages/itables/samples/countries.csv'

Thanks for giving it a go, and sorry about that - my attempt to simplify the pyproject.toml didn't go as expected...

This should be fixed now, at least I have seen this notebook run on Binder: https://mybinder.org/v2/gh/mwouts/itables/try_anywidget?urlpath=lab/tree/docs/ipywidgets.md. Let me know what you think! Thanks

jgunstone commented 2 months ago

thanks @mwouts for the fix - generally looks and works great, I can see the ability to provide bits of interaction whilst keeping the look and feel of itables could be v useful.

just had a quick play and have a few comments to address as you see fit:

you can copy the markdown below into your ipywidgets.md file for more info


## JG Comments

```{code-cell} ipython3
import ipywidgets as widgets
from itables import show
from itables.sample_dfs import get_dict_of_test_dfs
from itables.widget import ITable

sample_dfs = get_dict_of_test_dfs()
name = "ordered_categories"
df = sample_dfs[name]

table = ITable()

table = ITable(
    df,
    caption=name,
    select=True,
    style="table-layout:auto;width:auto;float:left",
)
table

# view traits
table.traits()
columns = [c["title"] for c in table.dt_args["columns"]]
columns

# `data` is the trait name for the widget value
# as a big ipywidgets user, I'd advocate using `value` instead
# as that then becomes consistent with all the other ipywidgets
table.data  # would a list of records be possible?
# one record per row, keyed by column title (keys first, then values)
[dict(zip(columns, d)) for d in table.data]

# not possible to set the table trait value... which would be nice
table.data = [["a", 0], ["b", 1], ["c", 2], ["d", 3]]
table = ITable(df)
table

# used this to check that `selected_rows` doesn't care about search - it doesn't - which is great
name1 = "countries"
df1 = sample_dfs[name1]
table1 = ITable(
    df1,
    caption=name1,
    select=True,
    style="table-layout:auto;width:auto;float:left",
)
table1
table1.selected_rows
```

mwouts commented 2 months ago

Hi @jgunstone , thank you so much for your feedback, that's really helpful!

I personally think that value would be a better trait name than data as it matches the ipywidgets lib. (that said, ipydatagrid uses data for dataframes so there is already a precedent for that)

Well, interesting that you mention that! I was seeing data and dt_args as internal traits, and I don't really expect users to modify them. Instead, I was thinking that you would use the update method to update the data and the dt_args by passing the dataframe and the usual options directly (if you don't mind, can you give it a try and let me know what you think?)

Internally the update method transforms df into the appropriate list of rows, defines the columns, and increases destroy_and_recreate to refresh the table (refreshing on data or dt_args separately causes issues when the column definitions don't match the row length).

https://github.com/mwouts/itables/blob/d8c25a0707f036053b2b948947289556f031c2db/packages/itables_anywidget/js/widget.ts#L94-L96
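
A minimal sketch of the refresh pattern described above, in plain Python rather than the actual widget code. `TableState`, the signature of its `update` method and the row-length check are assumptions made for illustration; only the names `data`, `dt_args` and `destroy_and_recreate` come from the discussion. The idea is to change both pieces of state first and then bump a single counter, so that a front end watching the counter re-renders once, with columns and rows that agree.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class TableState:
    """Hypothetical stand-in for the widget's synced state (not the ITables API)."""

    data: list = field(default_factory=list)
    dt_args: dict = field(default_factory=dict)
    destroy_and_recreate: int = 0  # a front end would watch this counter

    def update(self, columns: list, rows: list) -> None:
        # Validate first, so that a mismatch never reaches the front end.
        for row in rows:
            if len(row) != len(columns):
                raise ValueError(
                    f"row has {len(row)} cells but {len(columns)} columns are defined"
                )
        # Set both pieces of state, then signal one refresh.
        self.data = [list(row) for row in rows]
        self.dt_args = {**self.dt_args, "columns": [{"title": c} for c in columns]}
        self.destroy_and_recreate += 1


state = TableState()
state.update(["letter", "int"], [["a", 0], ["b", 1]])
```

Observing `data` and `dt_args` separately would allow a render in between the two assignments, which is the mismatch the comment above warns about.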

At the very least I should make that more explicit in the documentation. I can also move the traits that I don't think people should use to underscore names as you suggest, that's a good point! Actually, the traits that I would like to expose are the following:

At some point I plan to make the tables editable (https://github.com/mwouts/itables/issues/243, will require a subscription to datatables' editor), but until then I don't want to expose data directly (it's not a one-to-one conversion of df, etc). I'd be curious to take a look at how you can define setters and getters - setting df for instance would be ideal and possibly more idiomatic than update.

Also thanks for pointing out ipydatagrid! At first sight we seem to have the same approach re passing the data as a DataFrame through the first argument of the widget. Do you see people interacting directly with the data attribute maybe?

jgunstone commented 1 month ago

Hi hi - this is how they do the setter / getter in ipydatagrid: https://github.com/jupyter-widgets/ipydatagrid/blob/f7fab2945d89063eaa647fb7e9f94cc1c140d7bb/ipydatagrid/datagrid.py#L465-L492

and then the trait is _data - maybe you could do a similar thing by putting the code in your update method into the setter? I think it would be nice to interact with data in this way.
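
A rough sketch of that getter / setter idea, using only the standard library. `RecordsTable` and its `records` property are hypothetical names, not taken from ITables or ipydatagrid: the public property accepts a friendly representation, and the setter converts it into the private `_data` rows that a front end would consume.

```python
from typing import Any


class RecordsTable:
    """Hypothetical example class; ITables and ipydatagrid differ in the details."""

    def __init__(self, records: list):
        self._columns: list = []
        self._data: list = []
        self.records = records  # goes through the setter below

    @property
    def records(self) -> list:
        # Rebuild the user-facing records from the internal row lists.
        return [dict(zip(self._columns, row)) for row in self._data]

    @records.setter
    def records(self, value: list) -> None:
        # Column order follows the first record; missing keys become None.
        columns = list(value[0]) if value else []
        self._columns = columns
        self._data = [[rec.get(c) for c in columns] for rec in value]


table = RecordsTable([{"letter": "a", "int": 0}])
table.records = [{"letter": "b", "int": 1}, {"letter": "c", "int": 2}]
```

With this shape the user never touches `_data`, and any conversion logic that currently sits in an `update` method could live in the setter instead.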

just playing with dt_args and getting a little confused (though tbh I haven't done loads of customisation stuff with itables so not super familiar generally) -

what vars would typically be passed to dt_args, and how are they distinct from what would just be passed as **kwargs?

import pandas as pd
import itables.options as opt
from itables import init_notebook_mode, show
from itables.widget import ITable
from itables.sample_dfs import get_countries

df = get_countries(html=False)
init_notebook_mode(all_interactive=True)

show(df, classes="display nowrap compact")
ITable(df, classes="display nowrap compact")
# ^ this works the same as show which is nice from a user perspective. 

ITable(df, dt_args=dict(classes="display nowrap compact"))
# ^ this doesn't do anything.... 
mwouts commented 1 month ago

The Jupyter Widget is now part of ITables v2.2. See https://mwouts.github.io/itables/ipywidgets.html for the documentation.

Thank you @jgunstone for your feedback on the widget, it has been very helpful. Since our last chat I have made sure that only the traits that the user can modify directly are public. I have also added a df property and setter to let the user modify the underlying dataframe more easily - examples are available in the documentation.

Re your last question about dt_args, that's an internal distinction that I make between the arguments that are passed to the JavaScript DataTable constructor, and the other ones (e.g. caption, style, classes, selected rows...). As a user you don't need to make that distinction.
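
To illustrate the distinction, here is a small sketch of how flat keyword options could be split into two groups. `split_options` and `NON_DATATABLE_OPTIONS` are hypothetical names, and the set below only lists the examples named in the comment above; the real list in ITables may be longer or spelled differently.

```python
from typing import Any

# Assumed set, built from the examples in the comment above.
NON_DATATABLE_OPTIONS = {"caption", "style", "classes", "selected_rows"}


def split_options(**kwargs: Any) -> tuple:
    """Return (dt_args, other_args) for a flat set of keyword options."""
    dt_args = {k: v for k, v in kwargs.items() if k not in NON_DATATABLE_OPTIONS}
    other_args = {k: v for k, v in kwargs.items() if k in NON_DATATABLE_OPTIONS}
    return dt_args, other_args


dt_args, other_args = split_options(
    classes="display nowrap compact", caption="countries", paging=False
)
```

Under this reading, `classes` belongs to the second group and `paging` (a DataTables option) to the first, which is why passing everything as plain keyword arguments is the simplest thing for a user to do.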