machow / siuba

Python library for using dplyr like syntax with pandas and SQL
https://siuba.org
MIT License

Support glimpse function #409

Open chriscardillo opened 2 years ago

chriscardillo commented 2 years ago

Would be great to have dplyr's glimpse function implemented in siuba and loaded in with from siuba import *.

glimpse is really useful because pandas dataframes don't always print nicely.

machow commented 2 years ago

Agreed, it would be nice! As I understand it, glimpse (which is implemented in the pillar package) basically does the following:

Examples

Choosing n elements to show per column

library(pillar)  # provides glimpse()

glimpse(
  data.frame(
    x = c(paste0(rep("abc", 10), collapse = ""), rep("zzz", 9)),
    y = 1:10
  )
)
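For discussion, here is a rough Python sketch of the "choose how many elements to show per column" idea. It is not pillar's algorithm and not an existing siuba API: the names `fit_values` and `glimpse_text` are hypothetical, the input is assumed to be a plain dict mapping column names to lists, and the output layout only loosely imitates glimpse (no type abbreviations, reprs instead of pillar's formatting).

```python
import shutil


def fit_values(values, budget):
    """Join reprs of `values` with ", ", keeping only as many as fit in `budget` characters."""
    texts = [repr(value) for value in values]
    full = ", ".join(texts)
    if len(full) <= budget:
        return full
    marker = "…"
    shown, used = [], 0
    for text in texts:
        extra = len(text) + (2 if shown else 0)  # 2 accounts for the ", " separator
        # Always leave room for ", …" after the values that are kept.
        if used + extra + 2 + len(marker) > budget:
            break
        shown.append(text)
        used += extra
    return ", ".join(shown + [marker])


def glimpse_text(columns, width=None):
    """Hypothetical helper: one line per column, each truncated to the display width."""
    if width is None:
        width = shutil.get_terminal_size().columns
    n_rows = max((len(values) for values in columns.values()), default=0)
    name_width = max((len(name) for name in columns), default=0)
    lines = [f"Rows: {n_rows}", f"Columns: {len(columns)}"]
    for name, values in columns.items():
        prefix = f"$ {name:<{name_width}} "
        lines.append(prefix + fit_values(values, width - len(prefix)))
    return "\n".join(lines)


# Same data as the R example: one long string followed by nine short ones.
example = {"x": ["abc" * 10] + ["zzz"] * 9, "y": list(range(1, 11))}
print(glimpse_text(example, width=40))
```

With a pandas DataFrame, something like `glimpse_text(df.to_dict("list"))` should work as input, though a real implementation would probably format values per dtype rather than rely on `repr`.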

Different from straight transposing

Transposing a dataframe doesn't dynamically select how many columns to show, and it jacks up the object representations (e.g. ints go to floats):

from siuba.data import mtcars
mtcars.head().T

(Note how this also jacks up representations)
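The int-to-float change can be checked without the screenshot. This is a small sketch on a made-up two-column frame (not mtcars itself), relying on the fact that transposing a frame with mixed int and float columns makes each resulting column hold both kinds of values, so pandas upcasts them:

```python
import pandas as pd
from pandas.api.types import is_float_dtype, is_integer_dtype

# Toy stand-in for mtcars: one float column and one integer column.
df = pd.DataFrame({"mpg": [21.0, 22.8], "cyl": [6, 4]})
print(df.dtypes)  # cyl is an integer column here

transposed = df.T
print(transposed.dtypes)  # every transposed column mixes mpg and cyl values
print(transposed)         # the cyl row is now displayed as floats
```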

Handles nested representations via a summary


To see how it handles each column:

library(dplyr)  # provides %>%
library(tidyr)  # provides nest()

nested = mtcars %>% nest(data = c(-cyl))
pillar:::format_glimpse_1(nested$data)
# [1] "[<tbl_df[7 x 10]>], [<tbl_df[11 x 10]>], [<tbl_df[14 x 10]>]"
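A pandas analogue of that per-cell summary could look roughly like the sketch below. The helper name `summarize_cell` and the `<DataFrame[rows x cols]>` format are assumptions made for illustration (modelled on the `tbl_df` output above), and the nested data is a toy list of per-group frames rather than a real nested column:

```python
import pandas as pd


def summarize_cell(value):
    """Hypothetical helper: show nested frames as a short shape summary, other values via repr."""
    if isinstance(value, pd.DataFrame):
        n_rows, n_cols = value.shape
        return f"<DataFrame[{n_rows} x {n_cols}]>"
    return repr(value)


# Toy data: nest everything except cyl, one frame per cyl group (groups sorted by key).
df = pd.DataFrame({"cyl": [6, 6, 4], "mpg": [21.0, 21.0, 22.8]})
nested = [group.drop(columns="cyl") for _, group in df.groupby("cyl")]

print(", ".join(summarize_cell(cell) for cell in nested))
```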

pwwang commented 2 years ago

This is cool. Also want to implement it with datar.

Holer90 commented 1 year ago

I have implemented the glimpse function in pandas, opened a feature request, and am in the process of submitting a pull request. Now we just need to hope someone has the time to review it!