Expose the logical plan AST or some method that would allow you to programmatically inspect the computation graph

pola-rs / polars

Dataframes powered by a multithreaded, vectorized query engine, written in Rust

https://docs.pola.rs

Expose the logical plan AST or some method that would allow you to programmatically inspect the computation graph #9771

Open kszlim opened 1 year ago

kszlim commented 1 year ago

Problem description

I wish Polars would let you write:

graph = ldf.get_logical_plan()

I'm not sure what the nodes themselves should look like, but I'd like the ability to trace the lineage of columns back to the base columns of the file I'm reading from.
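To make the request concrete, here is a minimal sketch of what an object-based plan could enable for column lineage. The node classes (`Scan`, `WithColumns`, `Select`) and the `base_columns` helper are hypothetical and are not part of Polars; each `WithColumns` entry is assumed to record which input columns its expression reads.

```python
from dataclasses import dataclass


# Hypothetical plan-node types; Polars does not expose classes like these.
@dataclass
class Scan:
    path: str
    columns: list


@dataclass
class WithColumns:
    input: object
    # Maps each new column name to the input columns its expression reads.
    exprs: dict


@dataclass
class Select:
    input: object
    columns: list


def base_columns(node, column):
    """Return the set of (path, column) pairs that `column` is derived from."""
    if isinstance(node, Scan):
        if column not in node.columns:
            raise KeyError(f"{column!r} not found in scan of {node.path}")
        return {(node.path, column)}
    if isinstance(node, WithColumns) and column in node.exprs:
        sources = set()
        for dep in node.exprs[column]:
            sources |= base_columns(node.input, dep)
        return sources
    if isinstance(node, Select) and column not in node.columns:
        raise KeyError(f"{column!r} is dropped by a select")
    # The column passes through this node unchanged; keep walking upstream.
    return base_columns(node.input, column)


plan = Select(
    WithColumns(Scan("data.parquet", ["a", "b", "c"]), {"total": ["a", "b"]}),
    ["total", "c"],
)
print(sorted(base_columns(plan, "total")))  # [('data.parquet', 'a'), ('data.parquet', 'b')]
```

Because the plan is a plain object tree, the lineage question becomes a recursive walk instead of parsing the text that `explain` produces.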

stinodego commented 1 year ago

You can use LazyFrame.explain or LazyFrame.show_graph for this.

See docs: https://pola-rs.github.io/polars/py-polars/dev/reference/lazyframe/descriptive.html

If this is not what you're looking for, please elaborate.

kszlim commented 1 year ago

I want something that returns an AST object instead of a string, something that doesn't have to be parsed. There must be an internal representation that's not exposed, right?

cjackal commented 1 year ago

What about LazyFrame.write_json? (With the caveat that floating-point values may be approximated.)

ritchie46 commented 1 year ago

There is, on the Rust side. I am planning to expose a visitor that gives you this access.
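A visitor over such a plan might look roughly like the following. This is a sketch of the general visitor pattern in Python, not the API Polars plans to ship: `PlanNode`, the node classes, `PlanVisitor` and `ScanPathCollector` are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


class PlanNode:
    """Hypothetical base class for logical-plan nodes."""


@dataclass
class ParquetScan(PlanNode):
    path: str


@dataclass
class Filter(PlanNode):
    input: PlanNode
    predicate: str


@dataclass
class Join(PlanNode):
    left: PlanNode
    right: PlanNode
    on: str


class PlanVisitor:
    """Dispatch to visit_<ClassName> if defined, otherwise visit the children."""

    def visit(self, node):
        method = getattr(self, f"visit_{type(node).__name__}", self.generic_visit)
        return method(node)

    def generic_visit(self, node):
        # Any attribute that holds another PlanNode is treated as a child.
        for value in vars(node).values():
            if isinstance(value, PlanNode):
                self.visit(value)


class ScanPathCollector(PlanVisitor):
    """Collect the file path of every Parquet scan, in left-to-right order."""

    def __init__(self):
        self.paths = []

    def visit_ParquetScan(self, node):
        self.paths.append(node.path)


plan = Join(Filter(ParquetScan("a.parquet"), "x > 1"), ParquetScan("b.parquet"), on="id")
collector = ScanPathCollector()
collector.visit(plan)
print(collector.paths)  # ['a.parquet', 'b.parquet']
```

The shape mirrors Python's own `ast.NodeVisitor`: a subclass overrides only the node types it cares about and inherits the traversal for everything else.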

abhiaagarwal commented 1 year ago

I believe you can use the serialize method on a LazyFrame, then load the resulting JSON into a dict and walk it. It ain't pretty, but it should work!
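A minimal sketch of that approach, assuming you already have the plan as a JSON string (from `LazyFrame.write_json` or `LazyFrame.serialize`, depending on your Polars version). The sample JSON below is made up: the real layout is version-specific, so the key names (`Scan`, `paths`) are assumptions you would need to check against your own output.

```python
import json


def find_key(obj, key):
    """Yield every value stored under `key` anywhere in a nested JSON structure."""
    if isinstance(obj, dict):
        for k, v in obj.items():
            if k == key:
                yield v
            yield from find_key(v, key)
    elif isinstance(obj, list):
        for item in obj:
            yield from find_key(item, key)


# Made-up sample; a real serialized plan has a different, version-specific layout.
plan_json = """
{"Filter": {"input": {"Scan": {"paths": ["data.parquet"], "scan_type": "Parquet"}},
            "predicate": {"Column": "x"}}}
"""
plan = json.loads(plan_json)
scans = list(find_key(plan, "Scan"))
print([s["paths"] for s in scans])  # [['data.parquet']]
```

The walker is deliberately schema-agnostic, which keeps it working across layout changes but also means it cannot tell a plan node from an unrelated key that happens to share the same name.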

kszlim commented 1 year ago

I don't think this is available in the Python bindings?

douglas-raillard-arm commented 8 months ago

One use case that calling serialize() does not reasonably cover is checking whether a LazyFrame is backed by a DataFrame or by a Parquet file. Serializing would mean rendering an entire DataFrame to JSON, which would be a performance disaster (or even a crash) if the data is large enough.
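With an object plan, this check would only need to look at node types, never at the rows. The sketch below assumes hypothetical node classes (`InMemory`, `ParquetScan`, `Project`, `Concat`) and a hypothetical helper `source_kinds`; none of these exist in Polars today. The `InMemory` node only holds a reference to its data, so inspecting it costs the same regardless of how large that data is.

```python
from dataclasses import dataclass


# Hypothetical node types; Polars does not currently expose these to Python.
@dataclass
class InMemory:
    frame: object  # reference to the backing data; never copied or serialized


@dataclass
class ParquetScan:
    path: str


@dataclass
class Project:
    input: object
    columns: list


@dataclass
class Concat:
    inputs: list


def source_kinds(node):
    """Return the kinds of leaf sources under `node` by looking only at node types."""
    if isinstance(node, InMemory):
        return {"dataframe"}
    if isinstance(node, ParquetScan):
        return {"parquet"}
    if isinstance(node, Project):
        return source_kinds(node.input)
    if isinstance(node, Concat):
        return set().union(*(source_kinds(n) for n in node.inputs))
    raise TypeError(f"unknown node type: {type(node).__name__}")


data = [[1, 2], [3, 4]]  # stands in for a (possibly huge) in-memory DataFrame
plan = Concat([Project(InMemory(data), ["a"]), ParquetScan("extra.parquet")])
print(sorted(source_kinds(plan)))  # ['dataframe', 'parquet']
```

A "backed only by Parquet" check is then just `source_kinds(plan) == {"parquet"}`.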