Open Chuck321123 opened 4 weeks ago
hey, thanks for the request
this was discussed previously and rejected, could you search the issue tracker please?
@MarcoGorelli Hmm I see. Although I am open for getting a warning message, or if we explicitly have to pass a keyword argument to make it work
The extreme asymmetry in the pros and cons is what makes this undesirable (imho):
Pro
Slightly more convenient in a few cases when experimenting between eager/lazy.
Con
You thought you were operating in lazy mode and taking advantage of full query plan optimisation because you can see the final collect()
, but actually you forgot to switch read_parquet
back to scan_parquet
and now your production pipeline is operating in eager mode, is an order of magnitude slower, and your AWS bill just went up 10x for the week until the cause of the slowdown was root-caused ;)
Description
So collecting a df is used for lazyframes. However, I sometimes run my code in eager mode, and sometimes in lazymode. However, the amount of if-else and try-except functions i have to make in my code makes it exhausting to switch between eager and lazy mode. I would have prefered not to get an
AttributeError
when runningcollect()
on a eager frame. I can't be the only one wanting this function I believe, or have I missed something?Example: