pola-rs / polars

Dataframes powered by a multithreaded, vectorized query engine, written in Rust
https://docs.pola.rs

Constrain the length of string columns printed in jupyter #3543

Closed braaannigan closed 2 years ago

braaannigan commented 2 years ago

What language are you using?

Python

Have you tried latest version of polars?

yes

What version of polars are you using?

0.13.40

What operating system are you using polars on?

MacOS

What language version are you using

python 3.10

Describe your bug.

When printing DataFrames in jupyter, the full string values are printed. With long strings this makes each row very long and hard to read. In ipython and the base python terminal the behaviour is better - in these terminals only a certain number of characters are printed, and the number can be reduced further (though not increased) with pl.Config.set_tbl_width_chars.

What are the steps to reproduce the behavior?

Run the following in a notebook

import polars as pl
df = pl.DataFrame({'cats':["""Miranda Viramontes twirled a complete-game shutout in the opener and the Utes got some timely hitting against the Volunteers for a 3-0 victory. While we have this break between innings - here is @mirandavee20 shutting the door against Tennessee! #GoUtes #UteFamily pic.twitter.com/7Ck1nOhmbn — Utah Softball (@Utah_Softball) February 25, 2017 
The Utes attacked from the onset as Hannah Flippen doubled to start the game. After a pair of hard-hit balls to the outfield, she took third on a wild pitch. Heather Bowen came through with a single to left for a 1-0 lead. 
Delilah Pacheco started the third with a single, moved up on a sacrifice bunt and then over to third on a grounder back to the circle. Once more, the Utes delivered with two outs as Anissa Urtez smoked a single to center. 
Bridget Castro started the fifth with a walk and was lifted for pinch runner Ryley Ball. She moved up on a groundout and Barrera singled with two outs for a 3-0 edge. 
Meanwhile."""]})
df

What is the actual behavior?

Full string is printed in jupyter

What is the expected behavior?

What do you think polars should have done? Allowed us to limit the width of columns

alexander-beedie commented 2 years ago

This would seem to be a feature request, rather than a bug?

thobai commented 2 years ago

While I really do like the truncation for long String / Categorical values, there are many use cases where I would like to actually see values longer than 15 characters. Is there any option to disable the truncation?

kthwaite commented 2 years ago

While I really do like the truncation for long String / Categorical values, there are many use cases where I would like to actually see values longer than 15 characters. Is there any option to disable the truncation?

I had the same issue; posting the answer here in case it's not immediately obvious. It looks like the environment variable POLARS_FMT_STR_LEN is used to control the length of printed strings: https://github.com/pola-rs/polars/blob/master/py-polars/polars/_html.py#L96

Though you can set this environment variable directly, it's probably a better idea to use the provided pl.Config.set_fmt_str_lengths classmethod. So, to 'disable' truncation, set the limit to the maximum string length in some column:

import os
import polars as pl

# length of the longest string in the column
max_len = df['my_string_col'].str.lengths().max()
pl.Config.set_fmt_str_lengths(max_len)
# equivalent to
os.environ['POLARS_FMT_STR_LEN'] = str(max_len)
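
One caveat with the snippet above: for an empty or all-null column the maximum may come back as None, which is not a usable limit. A rough sketch of a guard for that case, using only the environment variable mentioned above. The helper name set_fmt_str_len_env and the cap parameter are made up for illustration and are not part of polars, and this assumes polars reads POLARS_FMT_STR_LEN at print time as described:

```python
import os


def set_fmt_str_len_env(values, cap=1000):
    """Hypothetical helper (not a polars API): set POLARS_FMT_STR_LEN
    to the length of the longest non-null string, capped at `cap`."""
    lengths = [len(v) for v in values if v is not None]
    if not lengths:
        # Empty or all-null input: leave the current setting untouched.
        return None
    limit = min(max(lengths), cap)
    # The variable name comes from the linked _html.py; the value must be a string.
    os.environ["POLARS_FMT_STR_LEN"] = str(limit)
    return limit


print(set_fmt_str_len_env(["short", "a much longer string value", None]))
```

With a DataFrame you would pass something like df['my_string_col'].to_list(). The cap is there so that one very long value (like the article text in the original report) does not blow up the row width again. Note that len() here counts Python characters, which may not match how polars measures string length.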