Open datapythonista opened 4 months ago
It seems to work. You must make sure to implement the trait and use that trait object as the anonymous scan.
#[test]
fn scan_anonymous_fn_with_options() -> PolarsResult<()> {
    struct MyScan {}

    impl AnonymousScan for MyScan {
        fn as_any(&self) -> &dyn Any {
            self
        }

        fn allows_projection_pushdown(&self) -> bool {
            true
        }

        fn scan(&self, scan_opts: AnonymousScanArgs) -> PolarsResult<DataFrame> {
            assert_ne!(scan_opts.with_columns, None);
            assert_ne!(scan_opts.n_rows, None);
            let out = fruits_cars().select(scan_opts.with_columns.unwrap().as_ref())?;
            Ok(out.slice(0, scan_opts.n_rows.unwrap()))
        }
    }

    let function = Arc::new(MyScan {});
    let args = ScanArgsAnonymous {
        schema: Some(Arc::new(fruits_cars().schema())),
        ..ScanArgsAnonymous::default()
    };
    let q = LazyFrame::anonymous_scan(function, args)?
        .select([col("A"), col("fruits")])
        .limit(3);
    let df = q.collect()?;
    assert_eq!(df.shape(), (3, 2));
    Ok(())
}
Note that fetch doesn't do a slice pushdown. Fetch does not lead to correct queries; it just limits a scan to produce only n rows (though anonymous scans don't have to respect it). Fetch is only intended for debugging purposes.
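To illustrate why a row hint alone does not guarantee the row count, here is a toy model in plain Rust. It does not use Polars and does not claim to mirror how Polars implements fetch or limit; `ToyScan`, `RespectsHint`, `IgnoresHint`, `run_with_row_hint` and `run_with_slice` are hypothetical names made up for this sketch.

```rust
// Toy model only: none of these types or functions come from Polars.

/// A source that may or may not honour a row-count hint.
trait ToyScan {
    fn scan(&self, n_rows_hint: Option<usize>) -> Vec<i32>;
}

/// Honours the hint by truncating its own output.
struct RespectsHint;

/// Ignores the hint and always returns every row.
struct IgnoresHint;

fn all_rows() -> Vec<i32> {
    vec![1, 2, 3, 4, 5]
}

impl ToyScan for RespectsHint {
    fn scan(&self, n_rows_hint: Option<usize>) -> Vec<i32> {
        let rows = all_rows();
        match n_rows_hint {
            Some(n) => rows.into_iter().take(n).collect(),
            None => rows,
        }
    }
}

impl ToyScan for IgnoresHint {
    fn scan(&self, _n_rows_hint: Option<usize>) -> Vec<i32> {
        all_rows()
    }
}

/// Hint only: pass the hint to the source and trust whatever comes back.
fn run_with_row_hint(source: &dyn ToyScan, n: usize) -> Vec<i32> {
    source.scan(Some(n))
}

/// Hint plus slice: pass the hint, then truncate the result regardless.
fn run_with_slice(source: &dyn ToyScan, n: usize) -> Vec<i32> {
    let mut rows = source.scan(Some(n));
    rows.truncate(n);
    rows
}

fn main() {
    // A source that ignores the hint returns more rows than requested...
    println!("{:?}", run_with_row_hint(&IgnoresHint, 3));
    // ...unless the caller slices the output itself.
    println!("{:?}", run_with_slice(&IgnoresHint, 3));
}
```

In this model only the sliced variant is guaranteed to return at most n rows, which is why a scan that wants a limit to hold should apply it itself, as the test above does with out.slice.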
Interesting. Seems like it's a bit more complex than I thought. This seems to be failing only when using with_columns, and I can reproduce it in my project with 0.40, but not in main. I assumed too quickly that projection pushdown was always failing for AnonymousScan, sorry about that. I'll check again in my project when 0.41 is working, as I think it's fixed now even when using with_columns.
I'll open a PR with your test, I think it should be useful to have it in the test suite. Thanks for the help with this!
0.41.2 is released. Can we close this one?
I had another look, seems like the problem is filtering by a calculated column. The test we have now in the test suite fails with this pipeline:
let q = LazyFrame::anonymous_scan(function, args)?
    .with_column((col("A") * lit(2)).alias("A2"))
    .filter(col("A2").lt(lit(6))) // <- ADDED THIS LINE
    .select([col("A2"), col("fruits")])
    .limit(3);
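For reference, the source columns this pipeline needs can be worked out by hand: A2 is derived from A, so the scan should only have to produce A and fruits. The sketch below does that bookkeeping in plain Rust. It is a toy model, not Polars' optimizer; the `Step` enum, `issue_pipeline` and `required_source_columns` are hypothetical names invented for this illustration.

```rust
use std::collections::BTreeSet;

// Toy model only: these types are invented here and are not Polars plan nodes.
enum Step {
    /// Adds column `output`, computed from the `inputs` columns.
    WithColumn { output: &'static str, inputs: Vec<&'static str> },
    /// Keeps rows based on a predicate over the listed columns.
    Filter { columns: Vec<&'static str> },
    /// Keeps only the listed columns.
    Select { columns: Vec<&'static str> },
}

/// The pipeline from the comment above, expressed in the toy model.
fn issue_pipeline() -> Vec<Step> {
    vec![
        Step::WithColumn { output: "A2", inputs: vec!["A"] },
        Step::Filter { columns: vec!["A2"] },
        Step::Select { columns: vec!["A2", "fruits"] },
    ]
}

/// Walk the pipeline from the last step to the first and collect the source
/// columns the scan has to produce. `None` means "all columns".
fn required_source_columns(steps: &[Step]) -> Option<BTreeSet<&'static str>> {
    let mut needed: Option<BTreeSet<&'static str>> = None;
    for step in steps.iter().rev() {
        match step {
            Step::Select { columns } => {
                // The last select narrows "all columns" to an explicit set;
                // earlier selects cannot widen what later steps need.
                if needed.is_none() {
                    needed = Some(columns.iter().copied().collect());
                }
            }
            Step::Filter { columns } => {
                // The predicate's columns must exist before the filter runs.
                if let Some(set) = needed.as_mut() {
                    set.extend(columns.iter().copied());
                }
            }
            Step::WithColumn { output, inputs } => {
                // A computed column is replaced by the columns it derives from.
                if let Some(set) = needed.as_mut() {
                    if set.remove(output) {
                        set.extend(inputs.iter().copied());
                    }
                }
            }
        }
    }
    needed
}

fn main() {
    println!("{:?}", required_source_columns(&issue_pipeline()));
}
```

Under these assumptions the scan needs only A and fruits, so receiving None in with_columns for this query means the projection was not pushed down.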
Reproducible example
Issue description
Not sure if I'm missing something, but it looks like projection pushdown is not working for AnonymousScan.
Besides the provided failing test in #17130, I tried implementing a struct with the AnonymousScan trait, and specifying:
But when the scan function is called by Polars, I'd expect AnonymousScanArgs.with_columns to contain the needed columns from select, but I receive None instead. AnonymousScanArgs.n_rows seems to be correctly receiving the value from .fetch(), so the problem seems specific to with_columns.
Expected behavior
I'd expect the AnonymousScanArgs passed to my scan function in my AnonymousScan trait implementation to contain the projection with the required columns only, not None.
Installed versions