vpipkt closed 4 years ago
See this for a couple of alternate approaches that don't depend on DataFrame extension methods.
@metasim tests are failing on the SQL expression. I'm not sure how to work in all the different parameters. We could easily register some functions that compute default percentiles (median, quartiles, quantiles, deciles, etc.) at a default relative error.
@vpipkt I think enabling SQL support is going to take some more work, given the SQL function parameter requirements. Perhaps we need to look for something in the official API that passes non-columnar parameters to a function (does it exist?). For some reason I'm loath to make all of those parameters columnar, but perhaps that's the proper way to do it.
I think the current question is: do we merge this without SQL support, or do we work that out first?
I am fine with a little divergence in the API between SQL and python/scala over this.
Possible paths:
rf_agg_approx_median
rf_agg_approx_quartiles
rf_agg_approx_quantiles
rf_agg_approx_deciles
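To make the "register default percentiles" idea concrete, here is a minimal sketch in plain Python, not the RasterFrames or Spark API. It assumes the four names above map onto one general quantile routine, with the median, quartile, and decile variants fixing their probabilities so that a SQL caller would only need to pass a column. The `quantiles` helper, `DEFAULT_PROBABILITIES`, and `build_registry` are hypothetical, and the helper computes exact nearest-rank quantiles, so the relative error parameter is not modeled.

```python
import math
from functools import partial
from typing import Callable, Dict, List, Sequence


def quantiles(values: Sequence[float], probabilities: Sequence[float]) -> List[float]:
    """Exact nearest-rank quantiles; a stand-in for an approximate aggregate."""
    if not values:
        raise ValueError("values must be non-empty")
    ordered = sorted(values)
    n = len(ordered)
    result = []
    for p in probabilities:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"probability out of range: {p}")
        # Rounding before ceil guards against floating-point noise in p * n.
        rank = max(1, math.ceil(round(p * n, 9)))
        result.append(ordered[rank - 1])
    return result


# Hypothetical defaults: each fixed-parameter name pins its probabilities.
DEFAULT_PROBABILITIES: Dict[str, List[float]] = {
    "rf_agg_approx_median": [0.5],
    "rf_agg_approx_quartiles": [0.25, 0.5, 0.75],
    "rf_agg_approx_deciles": [i / 10 for i in range(1, 10)],
}


def build_registry() -> Dict[str, Callable[..., List[float]]]:
    """Map each function name to a callable; only the general form takes probabilities."""
    registry: Dict[str, Callable[..., List[float]]] = {
        "rf_agg_approx_quantiles": quantiles,
    }
    for name, probs in DEFAULT_PROBABILITIES.items():
        registry[name] = partial(quantiles, probabilities=probs)
    return registry
```

With this shape, only `rf_agg_approx_quantiles` still needs a non-columnar argument; the other three take the data alone, which is the part that would sidestep the SQL parameter question.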
Initial implementation with quantiles
To Do