locationtech / rasterframes

Geospatial Raster support for Spark DataFrames
http://rasterframes.io
Apache License 2.0

Add rf_agg_approx_quantiles function #429

Closed vpipkt closed 4 years ago

vpipkt commented 4 years ago

Initial implementation with quantiles

To Do

metasim commented 4 years ago

See this for a couple of alternate approaches that don't depend on DataFrame extension methods.

https://github.com/s22s/rasterframes/pull/57

vpipkt commented 4 years ago

@metasim tests are failing on the SQL expression. I'm not sure how to work in all the different parameters. We could easily register some functions to compute default percentiles like median, quartiles, quantiles, deciles, etc., at a default relative error.
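For readers unfamiliar with the "relative error" parameter mentioned here, the following is a minimal sketch of what it bounds. It assumes the rank guarantee documented for Spark's `DataFrame.approxQuantile`, namely `floor((p - err) * N) <= rank(x) <= ceil((p + err) * N)`. The helper name `within_relative_error` and the choice to count rank as "values less than or equal to the candidate" are assumptions made for illustration; they are not part of this PR.

```python
import math


def within_relative_error(values, candidate, probability, relative_error):
    """Return True if `candidate` is an acceptable approximate quantile.

    Hypothetical helper: checks the rank bound
    floor((p - err) * N) <= rank(candidate) <= ceil((p + err) * N),
    where rank is taken (by assumption) as the count of values <= candidate.
    """
    n = len(values)
    rank = sum(1 for v in values if v <= candidate)
    lower = math.floor((probability - relative_error) * n)
    upper = math.ceil((probability + relative_error) * n)
    return lower <= rank <= upper


# Stand-in for cell values gathered from a tile column.
cells = list(range(1, 101))

# With a relative error of 0.05, any value ranked roughly 45..55 of 100
# is an acceptable answer for the median.
print(within_relative_error(cells, 53, 0.5, 0.05))
print(within_relative_error(cells, 70, 0.5, 0.05))
```

A smaller relative error narrows the accepted rank window, which is why a fixed default would be needed for any parameterless "canned" function.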

metasim commented 4 years ago

@vpipkt I think enabling SQL support is going to take some more work, given the SQL function parameter requirements... Perhaps we need to look for something in the official API that passes non-columnar parameters to a function (does one exist?). For some reason I'm loath to make all of those parameters columnar, but perhaps that's the proper way to do it.

I think the current question is whether we merge this without SQL support or work this out first.

vpipkt commented 4 years ago

I am fine with a little divergence in the API between SQL and Python/Scala over this.

Possible paths:

  1. No SQL function. In docs state there is no SQL support.
  2. Expose SQL functions for "canned" quantiles; these can be added to Python and Scala as well:
    1. rf_agg_approx_median
    2. rf_agg_approx_quartiles
    3. rf_agg_approx_quantiles
    4. rf_agg_approx_deciles
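As a rough illustration of option 2, each canned function would amount to fixing the probability list ahead of time so that the SQL function needs no extra parameters. The sketch below only shows how those lists could be derived. The names `evenly_spaced_probabilities`, `CANNED_PARTS`, and `canned_probabilities` are hypothetical, and the thread does not say which probabilities the canned `rf_agg_approx_quantiles` entry would use, so it is left out.

```python
def evenly_spaced_probabilities(parts):
    """Return the interior cut points that split [0, 1] into `parts` equal bins."""
    if parts < 2:
        raise ValueError("parts must be at least 2")
    return [i / parts for i in range(1, parts)]


# Hypothetical mapping from a canned name to its number of equal-sized bins.
CANNED_PARTS = {
    "median": 2,
    "quartiles": 4,
    "deciles": 10,
}


def canned_probabilities(name):
    """Look up the fixed probability list for a canned quantile name."""
    return evenly_spaced_probabilities(CANNED_PARTS[name])


print(canned_probabilities("median"))
print(canned_probabilities("quartiles"))
print(len(canned_probabilities("deciles")))
```

Each list could then be passed, together with a default relative error, to whatever general quantile function sits underneath, keeping the SQL surface free of non-columnar arguments.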

vpipkt commented 4 years ago

The latest commit should pass tests and is basically option 1 above.

I recommend we open an issue for option 2 at lower priority.