ibis-project / ibis

the portable Python dataframe library
https://ibis-project.org
Apache License 2.0
5.13k stars 590 forks source link

`table.mutate(c = _.a.desc())` differs across implementations #8080

Closed MarcoGorelli closed 8 months ago

MarcoGorelli commented 8 months ago

What happened?

In [2]: import ibis
   ...: from ibis import _
   ...:
   ...: con = ibis.pandas.connect()
   ...: con.create_table("table", ibis.memtable({"a": [1, 2, 3], "b": [4, 5, 6]}))
   ...: table = con.table("table")
   ...:
   ...: table.mutate(c = _.a.desc()).execute()
Out[2]:
   a  b  c
0  1  4  1
1  2  5  2
2  3  6  3

In [3]: import ibis
   ...: from ibis import _
   ...:
   ...: con = ibis.polars.connect()
   ...: con.create_table("table", ibis.memtable({"a": [1, 2, 3], "b": [4, 5, 6]}))
   ...: table = con.table("table")
   ...:
   ...: table.mutate(c = _.a.desc()).execute()
Out[3]:
   a  b  c
0  1  4  3
1  2  5  2
2  3  6  1

Looks like the results differ? (if I could hazard a guess - is pandas' index autoaligning something under the hood?)

What version of ibis are you using?

7.2.0

What backend(s) are you using, if any?

pandas / polars

Relevant log output

No response

Code of Conduct

cpcloud commented 8 months ago

Thanks for the issue, definitely looks like a bug!

cpcloud commented 8 months ago

Looking at this some more, we generally don't support the use of _.col.desc()/.asc() in the mutate or select APIs.

It "works" with the Polars backend, but I think that's accidental.

Those expressions are only allowed in an order_by, so the bug here is that we don't enforce that constraint.

jcrist commented 8 months ago

Fixed in the-epic-split by #8100. Closing.

lostmygithubaccount commented 2 months ago

@MarcoGorelli I believe the way to solve this in Ibis (or SQL/relational model in general) is with a join -- perhaps there is a more succinct way

[ins] In [1]: import ibis

[ins] In [2]: import ibis.selectors as s

[nav] In [3]: ibis.options.interactive = True

[ins] In [4]: t = ibis.examples.penguins.fetch()

[ins] In [5]: t = t.mutate(row_number=ibis.row_number()).relocate("row_number").order_by("row_number")

[ins] In [6]: t
Out[6]:
┏━━━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ row_number ┃ species ┃ island    ┃ bill_length_mm ┃ bill_depth_mm ┃ flipper_length_mm ┃ body_mass_g ┃ sex    ┃ year  ┃
┡━━━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ int64      │ string  │ string    │ float64        │ float64       │ int64             │ int64       │ string │ int64 │
├────────────┼─────────┼───────────┼────────────────┼───────────────┼───────────────────┼─────────────┼────────┼───────┤
│          0 │ Adelie  │ Torgersen │           39.1 │          18.7 │               181 │        3750 │ male   │  2007 │
│          1 │ Adelie  │ Torgersen │           39.5 │          17.4 │               186 │        3800 │ female │  2007 │
│          2 │ Adelie  │ Torgersen │           40.3 │          18.0 │               195 │        3250 │ female │  2007 │
│          3 │ Adelie  │ Torgersen │           NULL │          NULL │              NULL │        NULL │ NULL   │  2007 │
│          4 │ Adelie  │ Torgersen │           36.7 │          19.3 │               193 │        3450 │ female │  2007 │
│          5 │ Adelie  │ Torgersen │           39.3 │          20.6 │               190 │        3650 │ male   │  2007 │
│          6 │ Adelie  │ Torgersen │           38.9 │          17.8 │               181 │        3625 │ female │  2007 │
│          7 │ Adelie  │ Torgersen │           39.2 │          19.6 │               195 │        4675 │ male   │  2007 │
│          8 │ Adelie  │ Torgersen │           34.1 │          18.1 │               193 │        3475 │ NULL   │  2007 │
│          9 │ Adelie  │ Torgersen │           42.0 │          20.2 │               190 │        4250 │ NULL   │  2007 │
│          … │ …       │ …         │              … │             … │                 … │           … │ …      │     … │
└────────────┴─────────┴───────────┴────────────────┴───────────────┴───────────────────┴─────────────┴────────┴───────┘

[ins] In [7]: a = t.select(~s.c("species"))

[ins] In [8]: b = t.select("row_number", "species")

[ins] In [9]: c = a.join(b.order_by(ibis.desc("species")).mutate(row_number=ibis.row_number()), "row_number").order_by("row_number").relocate("row_number", "species")

[ins] In [10]: c
Out[10]:
┏━━━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ row_number ┃ species ┃ island    ┃ bill_length_mm ┃ bill_depth_mm ┃ flipper_length_mm ┃ body_mass_g ┃ sex    ┃ year  ┃
┡━━━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ int64      │ string  │ string    │ float64        │ float64       │ int64             │ int64       │ string │ int64 │
├────────────┼─────────┼───────────┼────────────────┼───────────────┼───────────────────┼─────────────┼────────┼───────┤
│          0 │ Gentoo  │ Torgersen │           39.1 │          18.7 │               181 │        3750 │ male   │  2007 │
│          1 │ Gentoo  │ Torgersen │           39.5 │          17.4 │               186 │        3800 │ female │  2007 │
│          2 │ Gentoo  │ Torgersen │           40.3 │          18.0 │               195 │        3250 │ female │  2007 │
│          3 │ Gentoo  │ Torgersen │           NULL │          NULL │              NULL │        NULL │ NULL   │  2007 │
│          4 │ Gentoo  │ Torgersen │           36.7 │          19.3 │               193 │        3450 │ female │  2007 │
│          5 │ Gentoo  │ Torgersen │           39.3 │          20.6 │               190 │        3650 │ male   │  2007 │
│          6 │ Gentoo  │ Torgersen │           38.9 │          17.8 │               181 │        3625 │ female │  2007 │
│          7 │ Gentoo  │ Torgersen │           39.2 │          19.6 │               195 │        4675 │ male   │  2007 │
│          8 │ Gentoo  │ Torgersen │           34.1 │          18.1 │               193 │        3475 │ NULL   │  2007 │
│          9 │ Gentoo  │ Torgersen │           42.0 │          20.2 │               190 │        4250 │ NULL   │  2007 │
│          … │ …       │ …         │              … │             … │                 … │           … │ …      │     … │
└────────────┴─────────┴───────────┴────────────────┴───────────────┴───────────────────┴─────────────┴────────┴───────┘