mekevans / forestTIME


ORDER BY is ignored in subqueries without LIMIT #40

Closed: diazrenata closed this issue 5 months ago

diazrenata commented 5 months ago

This warning shows up if you have an arrange() early in a duckdb pipeline.

Despite the warning, the arrange() is NOT actually being ignored, and you do need it to correctly calculate the sapling transitions.
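Why the ordering matters for the sapling transitions can be illustrated outside the database. The sketch below is in Python rather than the project's R, and every name in it (`plot_id`, `year`, `live_saplings`, `prev_live_saplings`, `add_prev_live_saplings`, `add_prev_unordered`) is hypothetical. It computes a "previous visit" value with a lag twice: once after sorting by plot and year (roughly what an arrange() ahead of a windowed lag() is there to guarantee), and once trusting the order the rows happened to arrive in, which is all you can rely on if an ORDER BY really were dropped.

```python
from itertools import groupby
from operator import itemgetter


def add_prev_live_saplings(rows):
    """Add a hypothetical prev_live_saplings column via a per-plot lag.

    Rows are sorted by (plot_id, year) first; the first visit of each
    plot gets None (the analogue of NA).
    """
    ordered = sorted(rows, key=itemgetter("plot_id", "year"))
    out = []
    for _, plot_rows in groupby(ordered, key=itemgetter("plot_id")):
        prev = None
        for row in plot_rows:
            out.append({**row, "prev_live_saplings": prev})
            prev = row["live_saplings"]
    return out


def add_prev_unordered(rows):
    """Same lag, but using whatever order the rows arrived in."""
    out = []
    last_seen = {}
    for row in rows:
        out.append({**row, "prev_live_saplings": last_seen.get(row["plot_id"])})
        last_seen[row["plot_id"]] = row["live_saplings"]
    return out


# Hypothetical visits, deliberately out of chronological order.
visits = [
    {"plot_id": "A", "year": 2015, "live_saplings": 7},
    {"plot_id": "A", "year": 2005, "live_saplings": 3},
    {"plot_id": "A", "year": 2010, "live_saplings": 5},
]

ordered = add_prev_live_saplings(visits)
unordered = add_prev_unordered(visits)
print([(r["year"], r["prev_live_saplings"]) for r in ordered])    # [(2005, None), (2010, 3), (2015, 5)]
print([(r["year"], r["prev_live_saplings"]) for r in unordered])  # [(2015, None), (2005, 7), (2010, 3)]
```

Only the sorted version pairs each visit with the chronologically preceding one. In SQL terms, what has to survive is the ordering seen by the window function itself (an ORDER BY inside OVER), not the ordering of an intermediate subquery.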

So I think we should keep all the arrange() and order_by() calls as written.

We should also encode a check that compares the results of the two window-heavy operations (sapling transitions and annualization) done with dbplyr against the same operations done in memory with dplyr.
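A check of that kind might look like the following Python sketch. It is a stand-in under stated assumptions, not the project's code: in the R pipeline the natural equivalent would be to collect() the dbplyr result and compare it with the dplyr result. The function `compare_results`, its parameters, and the example rows are all hypothetical. Rows are matched on key columns, numeric values are compared with a tolerance, and a missing value only matches another missing value, so an NA-versus-0 disagreement is reported rather than hidden.

```python
import math


def compare_results(db_rows, mem_rows, key_cols, value_cols, tol=1e-9):
    """Compare two result tables, matching rows on key_cols.

    value_cols are assumed numeric. Returns a list of
    (key, column, db_value, mem_value) tuples, one per disagreement.
    A row present on only one side is reported with column "<row>"
    and booleans saying which side has it.
    """

    def index(rows):
        return {tuple(r[k] for k in key_cols): r for r in rows}

    db_idx, mem_idx = index(db_rows), index(mem_rows)
    mismatches = []
    # Sort by repr so the report order is stable even with mixed key types.
    for key in sorted(db_idx.keys() | mem_idx.keys(), key=repr):
        db_row, mem_row = db_idx.get(key), mem_idx.get(key)
        if db_row is None or mem_row is None:
            mismatches.append((key, "<row>", db_row is not None, mem_row is not None))
            continue
        for col in value_cols:
            a, b = db_row[col], mem_row[col]
            if a is None or b is None:
                # NA only matches NA; NA vs 0 counts as a real difference.
                same = a is None and b is None
            else:
                same = math.isclose(a, b, rel_tol=tol, abs_tol=tol)
            if not same:
                mismatches.append((key, col, a, b))
    return mismatches


# Made-up example rows, for illustration only.
db_rows = [
    {"plot_id": "A", "year": 2010, "prev_live_saplings": 3},
    {"plot_id": "B", "year": 2010, "prev_live_saplings": None},
]
mem_rows = [
    {"plot_id": "A", "year": 2010, "prev_live_saplings": 3},
    {"plot_id": "B", "year": 2010, "prev_live_saplings": 0},
]
print(compare_results(db_rows, mem_rows, ["plot_id", "year"], ["prev_live_saplings"]))
# [(('B', 2010), 'prev_live_saplings', None, 0)]
```

An empty list means the two engines agree on every matched row; anything else points at the exact key and column to investigate.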

diazrenata commented 5 months ago

Currently, duckdb with arrange() and order_by() gives results identical to in-memory dplyr EXCEPT in one edge case, where missing_data_prop == 1. In that case, duckdb reports NA previous live saplings, while dplyr reports 0.
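One possible source of this kind of NA-versus-0 disagreement, offered only as a hypothesis since the thread does not identify the cause: in SQL, SUM over values that are all NULL returns NULL, whereas dplyr's sum(x, na.rm = TRUE) returns 0 for the same input. If missing_data_prop == 1 means every relevant count for a plot is missing, the two engines could diverge at exactly that point. The sketch below uses Python's built-in sqlite3 as a stand-in for duckdb (an assumption; both follow standard SQL semantics for SUM here), with a hypothetical `saplings` table and made-up data.

```python
import sqlite3

# Stand-in for duckdb: an in-memory SQLite database.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE saplings (plot_id TEXT, live_saplings INTEGER)")
# Plot B has only missing counts, loosely mimicking missing_data_prop == 1.
con.executemany(
    "INSERT INTO saplings VALUES (?, ?)",
    [("A", 3), ("A", 2), ("B", None), ("B", None)],
)

# SQL side: SUM of all-NULL input is NULL (None in Python).
sql_totals = dict(
    con.execute(
        "SELECT plot_id, SUM(live_saplings) FROM saplings GROUP BY plot_id ORDER BY plot_id"
    ).fetchall()
)

# In-memory analogue of sum(live_saplings, na.rm = TRUE): drop missing values, sum the rest.
mem_totals = {}
for plot_id, n in con.execute(
    "SELECT plot_id, live_saplings FROM saplings ORDER BY plot_id"
).fetchall():
    mem_totals.setdefault(plot_id, 0)
    if n is not None:
        mem_totals[plot_id] += n

# If 0 is the intended answer, COALESCE on the SQL side makes the two agree.
coalesced = dict(
    con.execute(
        "SELECT plot_id, COALESCE(SUM(live_saplings), 0) FROM saplings GROUP BY plot_id ORDER BY plot_id"
    ).fetchall()
)

print(sql_totals)  # {'A': 5, 'B': None}
print(mem_totals)  # {'A': 5, 'B': 0}
print(coalesced)   # {'A': 5, 'B': 0}
```

Whether NA or 0 is the right answer for a plot with no usable data is a domain decision; the sketch only shows how the same aggregation can legitimately produce both.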