machow / siuba

Python library for using dplyr like syntax with pandas and SQL
https://siuba.org
MIT License
1.14k stars 48 forks source link

feat(sql): add LazyTbl.last_select to simplify queries #449

Closed machow closed 1 year ago

machow commented 1 year ago

Addresses https://github.com/machow/siuba/issues/346 for simplifying sql queries. The main issue was that siuba's LazyTbl was starting each query out using a full select statement, rather than just the sqlalchemy table object.

The original reason for always using select statements is that in sqlalchemy select statements can do more (e.g. have a with_only_columns method) than table objects (which mostly go in from clauses, are used to start select statements, have columns).

This PR adds a new property last_select to complement last_op. If last_op would return a table, then last_select will make it a select statement. This allows verbs that e.g. use with_only_columns to ensure they get a select statement.

from siuba.data import cars_sql
from siuba import _, inner_join, show_query

inner_join(cars_sql, cars_sql, "cyl") >> show_query()
SELECT cars_1.cyl, cars_1.mpg AS mpg_x, cars_1.hp AS hp_x, cars_2.mpg AS mpg_y, cars_2.hp AS hp_y
FROM cars AS cars_1 JOIN cars AS cars_2 ON cars_1.cyl = cars_2.cyl