ibis-project / ibis

the portable Python dataframe library
https://ibis-project.org
Apache License 2.0
5.08k stars, 586 forks

ux: change non-interactive repr to look more like interactive repr #10095

Open jcrist opened 1 week ago

jcrist commented 1 week ago

Currently when constructing ibis expressions in non-interactive mode (the default), expressions repr as a description of the operations they're composed of:

In [1]: import ibis

In [2]: t = ibis.examples.diamonds.fetch()

In [3]: t.mutate(volume=t.x * t.y * t.z)
Out[3]: 
r0 := DatabaseTable: diamonds
  carat   float64
  cut     string
  color   string
  clarity string
  depth   float64
  table   float64
  price   int64
  x       float64
  y       float64
  z       float64

Project[r0]
  carat:   r0.carat
  cut:     r0.cut
  color:   r0.color
  clarity: r0.clarity
  depth:   r0.depth
  table:   r0.table
  price:   r0.price
  x:       r0.x
  y:       r0.y
  z:       r0.z
  volume:  r0.x * r0.y * r0.z

While this expr repr can be nice for inspection, it's rarely what I want when building up expressions lazily. Since ibis expressions are very composable, rarely do I need to know the steps used to get to a certain expression (e.g. I don't care that a group_by or filter was called earlier). Really all I care about is the schema/type of the object.

I propose we:

A quick mockup:

In [1]: import ibis

In [2]: t = ibis.examples.diamonds.fetch()

In [3]: t.mutate(volume=t.x * t.y * t.z)
Out[3]: 
┏━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━┳━━━━━━━┳━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━━┓
┃ carat   ┃ cut       ┃ color  ┃ clarity ┃ depth   ┃ table   ┃ price ┃ x       ┃ y       ┃ z       ┃ volume    ┃
┡━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━╇━━━━━━━╇━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━━┩
│ float64 │ string    │ string │ string  │ float64 │ float64 │ int64 │ float64 │ float64 │ float64 │ float64   │
├─────────┼───────────┼────────┼─────────┼─────────┼─────────┼───────┼─────────┼─────────┼─────────┼───────────┤
│       … │ …         │ …      │ …       │       … │       … │     … │       … │       … │       … │         … │
└─────────┴───────────┴────────┴─────────┴─────────┴─────────┴───────┴─────────┴─────────┴─────────┴───────────┘

In [5]: t.mutate(volume=t.x * t.y * t.z).select("carat", "volume")
Out[5]: 
┏━━━━━━━━━┳━━━━━━━━━━━┓
┃ carat   ┃ volume    ┃
┡━━━━━━━━━╇━━━━━━━━━━━┩
│ float64 │ float64   │
├─────────┼───────────┤
│       … │         … │
└─────────┴───────────┘
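
To make the mockup concrete, here is a rough stdlib-only sketch of a schema-only renderer. It is not how ibis builds its tables: `render_schema` is a hypothetical helper, and column widths come from the names and types alone, so the output is narrower than the mockup above, whose widths appear to follow the interactive output. Numeric columns right-align the `…` placeholder and everything else left-aligns it, mirroring the mockup.

```python
def render_schema(schema: dict[str, str]) -> str:
    """Render column names and types as a box-drawn table with an elided data row.

    Hypothetical helper for illustration only; not part of ibis.
    """
    # Assumption: a type counts as numeric if its name starts with one of these.
    numeric_prefixes = ("int", "uint", "float", "decimal")
    names = list(schema)
    types = [schema[name] for name in names]
    widths = [max(len(n), len(t)) for n, t in zip(names, types)]

    def rule(left: str, mid: str, right: str, fill: str) -> str:
        # Horizontal border: each cell is its width plus one space of padding per side.
        return left + mid.join(fill * (w + 2) for w in widths) + right

    def row(cells: list[str], sep: str, aligns: list[str]) -> str:
        parts = [f" {c:{a}{w}} " for c, a, w in zip(cells, aligns, widths)]
        return sep + sep.join(parts) + sep

    left = ["<"] * len(names)
    # Right-align the placeholder under numeric columns, left-align otherwise.
    data_aligns = [">" if t.startswith(numeric_prefixes) else "<" for t in types]

    return "\n".join(
        [
            rule("┏", "┳", "┓", "━"),
            row(names, "┃", left),
            rule("┡", "╇", "┩", "━"),
            row(types, "│", left),
            rule("├", "┼", "┤", "─"),
            row(["…"] * len(names), "│", data_aligns),
            rule("└", "┴", "┘", "─"),
        ]
    )


print(render_schema({"carat": "float64", "volume": "float64"}))
```

Since only the schema is needed, a repr along these lines would never have to execute the expression.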

cpcloud commented 1 week ago

image

cpcloud commented 1 week ago

In all seriousness, I really like this idea!

jcrist commented 1 week ago

Sounds good! I think we should aim to get this in for 10.0 then.

One open question is what to do with scalars (since in interactive mode they only show the value, not the type).

A few options:

Option 1:

# Interactive
┌────────────┐
│ float64    │
├────────────┤
│   43040.87 │
└────────────┘

# Non-interactive (could also only add the type to the non-interactive version?)
┌─────────┐
│ float64 │
├─────────┤
│       … │
└─────────┘

Option 2:

# Interactive
┌──────────┐
│ 43040.87 │
└──────────┘

# Non-interactive
Scalar<float64>

Option 3:

# Interactive
┌──────────┐
│ 43040.87 │
└──────────┘

# Non-interactive (this might be easy to mistake for an interactive string scalar with value `"Scalar<float64>"`)
┌─────────────────┐
│ Scalar<float64> │
└─────────────────┘

I have a slight preference for the first option, but :shrug:.
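
For reference, a minimal sketch of what the first option could look like in code. `render_scalar` is a hypothetical helper, not an ibis function, and it assumes that passing `value=None` stands for the non-interactive case. It always right-aligns the value, which suits the numeric example here, and the box is sized to the content, so the interactive box comes out narrower than the one drawn above.

```python
def render_scalar(dtype: str, value=None) -> str:
    """Box-drawn scalar repr: type on top, value (or a placeholder) below.

    Hypothetical helper for illustration only; not part of ibis.
    """
    # Assumption: None means "non-interactive", so show an ellipsis placeholder.
    body = "…" if value is None else str(value)
    width = max(len(dtype), len(body))
    return "\n".join(
        [
            "┌" + "─" * (width + 2) + "┐",
            f"│ {dtype:<{width}} │",  # type is left-aligned
            "├" + "─" * (width + 2) + "┤",
            f"│ {body:>{width}} │",  # value or placeholder is right-aligned
            "└" + "─" * (width + 2) + "┘",
        ]
    )


print(render_scalar("float64"))            # non-interactive: type plus placeholder
print(render_scalar("float64", 43040.87))  # interactive: type plus value
```

Both modes share one layout, so the only thing that changes when switching modes is whether the value row is filled in.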

gforsyth commented 1 week ago

I definitely like the look of this -- it might be nice to keep the old repr around for OUR inspection, but make it private.

I like option 1 above, but I can get on board with any of them.

drin commented 1 week ago

I randomly found this and just wanted to chime in: I think this sounds like a great idea and moving the old repr to an explain function or something similar makes a lot of sense.

> it might be nice to keep the old repr around for OUR inspection, but make it private.

not sure what visibility you mean by private (maybe just surrounded with __?) but it'd be nice for it to be easily accessible for substrait users. I could also imagine wanting to extend it with various verbosity flags (ops only, ops + predicates, etc.) to make validation or general observability easier.

gforsyth commented 1 week ago

> not sure what visibility you mean by private

yeah, just with a leading _ so it doesn't show up in tab-completion, but I'm also not opposed to leaving it more readily available if there's desire for that.
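
A toy sketch of how the two ideas above could fit together: a schema-only `__repr__` plus an underscore-prefixed method that keeps the operation-level view, with a verbosity switch. Everything here is hypothetical (`Expr`, `_explain`, the `verbosity` values, and `public_names` are made up for illustration and are not ibis APIs). The `public_names` helper only approximates what tab-completion would list by default, on the assumption that underscore-prefixed names are hidden until an underscore is typed.

```python
class Expr:
    """Toy stand-in for an expression; not the ibis class."""

    def __init__(self, schema, ops):
        self.schema = dict(schema)  # column name -> type name
        self.ops = list(ops)        # (op_name, detail) pairs, in order

    def __repr__(self):
        # Schema-only repr: no trace of how the expression was built.
        cols = ", ".join(f"{name}: {dtype}" for name, dtype in self.schema.items())
        return f"Expr<{cols}>"

    def _explain(self, verbosity="ops"):
        """Operation-level view; 'ops' lists op names, 'full' adds their details."""
        if verbosity not in ("ops", "full"):
            raise ValueError(f"unknown verbosity: {verbosity!r}")
        lines = []
        for op_name, detail in self.ops:
            lines.append(op_name)
            if verbosity == "full":
                lines.append(f"  {detail}")
        return "\n".join(lines)


def public_names(obj):
    # Rough proxy for default tab-completion: skip underscore-prefixed names.
    return [name for name in dir(obj) if not name.startswith("_")]


e = Expr(
    {"carat": "float64", "volume": "float64"},
    [("Project", "volume: r0.x * r0.y * r0.z")],
)
print(repr(e))
print(e._explain("full"))
print(public_names(e))
```

Dropping the leading underscore would be the only change needed to make the detailed view readily available, which keeps that decision cheap to revisit.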