tidyverse / dbplyr

Database (DBI) backend for dplyr
https://dbplyr.tidyverse.org

`distinct()` in Databricks/SparkSQL causes "arrange()... __row_num_*" error #1481

Open fabkury opened 6 months ago

fabkury commented 6 months ago

From my perspective, this error started happening at some point in the past few weeks.

Merely calling dplyr::distinct() on a lazy (remote) tibble gives:

Error in arrange(., !!sym(row_num)):
ℹ In argument: `__row_num_a46479cf_8586_4003_b032_d43e0bc6c4d1`
Caused by error:
! Object `__row_num_a46479cf_8586_4003_b032_d43e0bc6c4d1` not found.
Error in arrange():

That `arrange(., !!sym(row_num))` call does not appear anywhere in my script.

I can work around the problem by doing a trivial `group_by()` and then keeping only the group keys.
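For anyone hitting the same error, here is a minimal sketch of that workaround. It is only an illustration, not the original script: the helper name `distinct_via_group_by()`, the throwaway column `n_rows`, and the columns `x` and `y` are all hypothetical, and a simulated `lazy_frame()` stands in for the real Databricks table so the generated SQL can be inspected without a connection.

```r
library(dplyr)
library(dbplyr)

# Hypothetical helper: emulate distinct() by grouping on the chosen
# columns, computing a throwaway count, and keeping only the group keys.
distinct_via_group_by <- function(tbl, ...) {
  tbl %>%
    group_by(...) %>%
    summarise(n_rows = n(), .groups = "drop") %>%  # one row per key combination
    select(-n_rows)                                # drop the throwaway count
}

# Stand-in for the remote table (assumption: the real one comes from a
# Databricks/Spark connection, which is not reproduced here).
lazy_tbl <- lazy_frame(x = c(1, 1, 2), y = c("a", "a", "b"))

deduped <- distinct_via_group_by(lazy_tbl, x, y)

# Inspect the SQL that would be sent to the database.
generated_sql <- as.character(sql_render(deduped))
cat(generated_sql, "\n")
```

The query relies on `GROUP BY` rather than on whatever code path `distinct()` takes, which is presumably why it sidesteps the `__row_num_*` column.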

Thanks for the awesome software.