Open MrPowers opened 1 year ago
@MrPowers: I would like to contribute to this. Could you please assign this issue to me?
@MrPowers @SemyonSinchenko Did we ever brainstorm on this? I have lost count of the number of times I would have loved functionality like this. I would love to take this up.
@kunaljubce Because of spark-connect, we cannot use _jvm here. So the only option known to me was to parse the plan. But @MrPowers does not like this idea (see the arguments here: https://github.com/MrPowers/quinn/pull/159).
JFYI: This function does exactly this job: it estimates the size of a DF in bytes (megabytes) without computing it.
So, in my opinion, there is no way to do it (other than collecting to the driver, which is a terrible option). @kunaljubce If you have a different vision of how it could be implemented, or new arguments for my discussion with @MrPowers (his argument was that the plan representation is very unstable), we can raise this topic again!
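For reference, the plan-parsing idea mentioned above could look roughly like the sketch below. This is not the code from the linked PR. It assumes that the text printed by `df.explain(mode="cost")` contains `Statistics(sizeInBytes=8.0 MiB, ...)` entries with binary unit suffixes, and that the first such entry belongs to the root of the optimized plan. The helper names `plan_text_from` and `estimate_size_in_bytes` are hypothetical. The textual plan format is not a stable API, which is exactly the objection raised against this approach.

```python
import contextlib
import io
import re
from typing import Optional

# Binary unit suffixes assumed to appear in the cost-mode plan output.
_UNITS = {
    "B": 1,
    "KiB": 1024,
    "MiB": 1024**2,
    "GiB": 1024**3,
    "TiB": 1024**4,
    "PiB": 1024**5,
    "EiB": 1024**6,
}

# Matches e.g. "sizeInBytes=8.0 MiB" or "sizeInBytes=1.0E+20 B".
_SIZE_RE = re.compile(
    r"sizeInBytes=(\d+(?:\.\d+)?(?:E[+-]?\d+)?)\s*((?:[KMGTPE]i)?B)"
)


def plan_text_from(df) -> str:
    """Hypothetical helper: capture what df.explain(mode="cost") prints.

    Uses only the public DataFrame API, so it does not touch _jvm.
    """
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        df.explain(mode="cost")
    return buffer.getvalue()


def estimate_size_in_bytes(plan_text: str) -> Optional[int]:
    """Return the first sizeInBytes statistic found in the plan text.

    Returns None when no statistic is present, e.g. because the plan
    format differs from the one assumed here.
    """
    match = _SIZE_RE.search(plan_text)
    if match is None:
        return None
    value, unit = match.groups()
    return int(float(value) * _UNITS[unit])


# Example with a hand-written plan string in the assumed format.
sample_plan = """== Optimized Logical Plan ==
Project [id#0L], Statistics(sizeInBytes=8.0 MiB)
+- Range (0, 1048576, step=1), Statistics(sizeInBytes=8.0 MiB, rowCount=1.05E+6)
"""
estimated = estimate_size_in_bytes(sample_plan)
```

With a real DataFrame this would be called as `estimate_size_in_bytes(plan_text_from(df))`. The result is the optimizer's estimate, not a measured size, and it can be far off when table statistics are missing.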
From a Redditor on this thread: