pola-rs / nodejs-polars

nodejs front-end of polars
https://pola-rs.github.io/nodejs-polars/
MIT License

better html repr for `DataFrame` #250

Open oscar6echo opened 1 month ago

oscar6echo commented 1 month ago

This is less a feature request than a question:

image

image

But for nodejs-polars only the first 50 rows are shown, with no indication of the df shape.
Is this on purpose or a shortcut?

Suggestion: It would help users if the py and js displays matched, both in print/console.log and in Jupyter.

universalmind303 commented 1 month ago

It does look like _repr_html_ on the python side has quite a bit more logic than the JS side.

It should be pretty easy to copy over the python html logic.

Bidek56 commented 1 month ago

> I cannot reproduce the dataframe display shown in the README (and below), which is the same as in python-polars. Is it still possible? If not, why not?

It works fine for me using nodejs-polars v0.14.0. Can you please show your code and the output?

The deno-kernel Jupyter notebook shows only the first 50 rows so that large outputs do not crash the browser. The limit is configurable via process.env.POLARS_FMT_MAX_ROWS; this was discussed during the PR review.
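
As an illustration of the row limit mentioned above, here is a minimal sketch of how a renderer might read it. The helper name maxRowsFromEnv is hypothetical and is not part of nodejs-polars; only the variable name POLARS_FMT_MAX_ROWS and the default of 50 come from this thread.

```javascript
// Hypothetical helper for illustration; not the nodejs-polars implementation.
// Environment variables are always strings, so the value is parsed first.
function maxRowsFromEnv(env, fallback = 50) {
  const parsed = Number.parseInt(env.POLARS_FMT_MAX_ROWS ?? "", 10);
  // Fall back to the default when the variable is unset or not a number.
  return Number.isNaN(parsed) ? fallback : parsed;
}

// Set the limit before the DataFrame is displayed.
process.env.POLARS_FMT_MAX_ROWS = "200";
console.log(maxRowsFromEnv(process.env)); // prints 200
```

The variable presumably has to be set before the output cell is rendered for the new limit to take effect.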

oscar6echo commented 1 month ago

> > I cannot reproduce the dataframe display shown in the README (and below), which is the same as in python-polars. Is it still possible? If not, why not?
>
> It works fine for me using nodejs-polars v0.14.0. Can you please show your code and the output?

Here is the output from notebook, jupyter console and terminal.

1/ notebook
image

2/ jupyter console
image

3/ terminal
image

In none of these cases can I reproduce the nice display shown in the README, which happens to be similar to that of the python version print(df).
What should I do to get it?

Bidek56 commented 1 month ago

I am not using Deno, but it works fine from the bun command line.

oscar6echo commented 1 month ago

Ok, here is what I get with the bun repl:

image

So the output contains the nice display shown in the README, but it is not quite the same.
I find it a bit disconcerting that such basic use is not reproducible in either the deno or the bun repl.

oscar6echo commented 1 month ago

> The deno-kernel Jupyter notebook shows only the first 50 rows so that large outputs do not crash the browser. The limit is configurable via process.env.POLARS_FMT_MAX_ROWS; this was discussed during the PR review.

> It does look like _repr_html_ on the python side has quite a bit more logic than the JS side.

Indeed. Please compare with the python version (arguably the reference, and certainly more informative):

You get the shape and the first/last rows/cols shown (controlled by POLARS_FMT_MAX_(ROW|COL)S).
image

With nodejs-polars, by contrast, you get only the first rows (controlled by POLARS_FMT_MAX_ROWS), with no indication of shape.
image

> It should be pretty easy to copy over the python html logic.

Perhaps for somebody who knows the inner workings of (1) polars-py, (2) polars-nodejs, and (3) the specifics of the target runtimes (nodejs, deno, bun), which, as the comment above shows, add their own layer before display.

For example, I could not find where the selection of rows and cols (first and last, based on env variables) is performed below polars/polars/dataframe/frame.py | _repr_html_
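
To make the head/tail selection described above concrete, here is a minimal sketch under stated assumptions. The function names headTailIndices and summarize are hypothetical, and the even split between first and last items is an assumption for illustration. This is not the logic used by py-polars or nodejs-polars.

```javascript
// Hypothetical helpers for illustration; not code from py-polars or nodejs-polars.
// Picks the indices to display: everything if it fits, else the first and last items.
function headTailIndices(length, max) {
  const all = Array.from({ length }, (_, i) => i);
  if (length <= max) {
    return { head: all, tail: [], truncated: false };
  }
  const headCount = Math.ceil(max / 2); // assumed split: half from the start
  const tailCount = max - headCount;    // and the remainder from the end
  return {
    head: all.slice(0, headCount),
    tail: all.slice(length - tailCount),
    truncated: true,
  };
}

// Combines the shape line with the row and column selections.
function summarize(height, width, maxRows, maxCols) {
  return {
    shape: `shape: (${height}, ${width})`,
    rows: headTailIndices(height, maxRows),
    cols: headTailIndices(width, maxCols),
  };
}

const view = summarize(100, 4, 6, 8);
console.log(view.shape);                     // prints shape: (100, 4)
console.log(view.rows.head, view.rows.tail); // [ 0, 1, 2 ] [ 97, 98, 99 ]
```

A renderer built on this would print the shape line first, then the head rows, a truncation marker, and the tail rows.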

Bidek56 commented 1 month ago

Can you please use: console.log(df); in the bun repl? It works fine for me. I do recall there being an issue with bun's implementation of [Symbol.for("nodejs.util.inspect.custom")]().

> console.log(df);
shape: (5, 4)
┌─────┬────────┬─────┬────────┐
│ A   ┆ fruits ┆ B   ┆ cars   │
│ --- ┆ ---    ┆ --- ┆ ---    │
│ f64 ┆ str    ┆ f64 ┆ str    │
╞═════╪════════╪═════╪════════╡
│ 1.0 ┆ banana ┆ 5.0 ┆ beetle │
│ 2.0 ┆ banana ┆ 4.0 ┆ audi   │
│ 3.0 ┆ apple  ┆ 3.0 ┆ beetle │
│ 4.0 ┆ apple  ┆ 2.0 ┆ beetle │
│ 5.0 ┆ banana ┆ 1.0 ┆ beetle │
└─────┴────────┴─────┴────────┘

oscar6echo commented 1 month ago

> Can you please use: console.log(df); in the bun repl? It works fine for me. I do recall there being an issue with bun's implementation of [Symbol.for("nodejs.util.inspect.custom")]().

I get the same output!

It would be good for nodejs-polars to be explicit about what runtimes should implement to output the proper display (as in the README). Maybe this is already the case? If so, where?

universalmind303 commented 1 month ago

> > Can you please use: console.log(df); in the bun repl? It works fine for me. I do recall there being an issue with bun's implementation of Symbol.for("nodejs.util.inspect.custom").
>
> I get the same output!
>
> It would be good for nodejs-polars to be explicit about what runtimes should implement to output the proper display (as in the README). Maybe this is already the case? If so, where?

@oscar6echo The formatting discrepancy arises because, unlike python and rust, JavaScript has no native way to overload methods, so we need to use a Proxy object to support some syntaxes such as bracket notation: df['column']

console.log should always print the correct output as most runtimes have standardized on using Symbol.for("nodejs.util.inspect.custom"), but unfortunately, there is no way to forward the inspect symbol to the dataframe class when wrapping it in a proxy. So it's either drop support for the functionality that the proxy provides, or use console.log.

Edit:

df.toString() should also work the same as console.log(df)
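
A minimal sketch of the setup described above, using a toy class rather than the real DataFrame. TinyFrame and withBracketAccess are hypothetical names, not nodejs-polars code. It shows a class exposing both toString() and the Symbol.for("nodejs.util.inspect.custom") hook, wrapped in a Proxy so that bracket notation returns a column.

```javascript
// Hypothetical stand-in for a DataFrame; not the nodejs-polars implementation.
class TinyFrame {
  constructor(columns) {
    this.columns = columns; // e.g. { fruits: ["banana", "apple"] }
  }
  get width() {
    return Object.keys(this.columns).length;
  }
  get height() {
    const first = Object.values(this.columns)[0];
    return first ? first.length : 0;
  }
  toString() {
    return `shape: (${this.height}, ${this.width})`;
  }
  // Hook consulted by console.log in runtimes that honor this symbol.
  [Symbol.for("nodejs.util.inspect.custom")]() {
    return this.toString();
  }
}

// Wraps a frame so that frame["name"] returns the column with that name.
function withBracketAccess(frame) {
  return new Proxy(frame, {
    get(target, prop, receiver) {
      const isColumn =
        typeof prop === "string" &&
        Object.prototype.hasOwnProperty.call(target.columns, prop);
      // Column names win; everything else falls through to the class.
      return isColumn ? target.columns[prop] : Reflect.get(target, prop, receiver);
    },
  });
}

const df = withBracketAccess(
  new TinyFrame({ fruits: ["banana", "apple"], cars: ["beetle", "audi"] })
);
console.log(df["fruits"]);  // [ 'banana', 'apple' ]
console.log(df.toString()); // prints shape: (2, 2)
```

How a given REPL or notebook displays the proxy object itself can differ between runtimes, so calling toString() explicitly is the portable option in this sketch.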

oscar6echo commented 1 month ago

@universalmind303 thx for the insight.

So the working syntax with deno is console.log(df.toString()).
It is a bit verbose, but the output is identical to the python version. This is useful info!

Examples:

1/ small df

image

2/ larger df

image


> so we need to use a Proxy object to support some syntaxes such as bracket notation: df['column']

Ok, this is your decision - who am I to debate it - but the .select() syntax achieves the same, is central in polars-py, and is more IDE friendly, with completion etc. The df['mycol'] syntax seems mostly a contrived way to mimic the legacy pandas API - I was a heavy pandas user and am now an intensive polars-py one. One may argue this legacy API is not worth keeping, in particular if it hinders basic user experience. :thinking:

But this is only a side remark.
The main point is: Congrats and thank you for putting together and maintaining nodejs-polars :+1:

universalmind303 commented 1 month ago

> Ok, this is your decision - who am I to debate it - but the .select() syntax achieves the same, is central in polars-py, and is more IDE friendly, with completion etc. The df['mycol'] syntax seems mostly a contrived way to mimic the legacy pandas API - I was a heavy pandas user and am now an intensive polars-py one. One may argue this legacy API is not worth keeping, in particular if it hinders basic user experience. 🤔

I have thought about deprecating the syntax as I too find the Proxy stuff a bit annoying. I know py-polars discourages the usage of it anyways.