rapidsai / cudf

cuDF - GPU DataFrame Library
https://docs.rapids.ai/api/cudf/stable/
Apache License 2.0

[FEA] dataframe.corr() missing "kendall" method #11924

Open dawilliams-nvidia opened 2 years ago

dawilliams-nvidia commented 2 years ago

Is your feature request related to a problem? Please describe.

Pandas accepts four kinds of value for the method parameter of corr(): "pearson", "spearman", "kendall", or a callable object that is used in place of a predetermined algorithm: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.corr.html

cuDF currently supports only "pearson" and "spearman": https://docs.rapids.ai/api/cudf/stable/api_docs/api/cudf.DataFrame.corr.html

Describe the solution you'd like

Can we evaluate implementing the "kendall" correlation method in cudf and dask+cudf?
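For reference, here is a minimal pure-Python sketch of the statistic being requested. It assumes the tie-adjusted tau-b variant is the target, which is an assumption about what a pandas-compatible result would need rather than something stated in this issue. The function name kendall_tau_b is hypothetical and is not part of cudf or pandas. The O(n^2) pair loop is only meant to pin down the definition; a GPU implementation would presumably use a sort-based approach instead.

```python
from itertools import combinations
from math import sqrt


def kendall_tau_b(x, y):
    """Kendall's tau-b for two equal-length numeric sequences.

    Hypothetical reference helper (not a cudf or pandas API).
    Returns NaN when the statistic is undefined, e.g. a constant input.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")

    concordant = discordant = 0
    tied_x_only = tied_y_only = 0

    # Compare every unordered pair of observations.
    for (xi, yi), (xj, yj) in combinations(zip(x, y), 2):
        dx = xi - xj
        dy = yi - yj
        if dx == 0 and dy == 0:
            continue  # tied in both variables: contributes to neither term
        if dx == 0:
            tied_x_only += 1
        elif dy == 0:
            tied_y_only += 1
        elif (dx > 0) == (dy > 0):
            concordant += 1
        else:
            discordant += 1

    untied = concordant + discordant
    denom = sqrt((untied + tied_x_only) * (untied + tied_y_only))
    if denom == 0:
        return float("nan")
    return (concordant - discordant) / denom


if __name__ == "__main__":
    print(kendall_tau_b([1, 2, 3, 4], [1, 3, 2, 4]))
    print(kendall_tau_b([1, 2, 2, 3], [1, 2, 3, 3]))
```

A DataFrame-level corr(method="kendall") would apply this pairwise across columns to build the correlation matrix, so the per-pair cost matters for wide frames.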

Context

NVIDIA Solutions Architect, filing on behalf of a customer.

Related request for "callable" method: https://github.com/rapidsai/cudf/issues/11926

beckernick commented 2 years ago

@dawilliams-nvidia, can we create separate feature requests for kendall and callable? Supporting kendall rank correlation is conceptually separate from supporting arbitrary (or even constrained) callables as correlation functions.

dawilliams-nvidia commented 2 years ago

Ah, will do! I have edited the request to exclusively discuss the "kendall" method. "callable" can be found here: https://github.com/rapidsai/cudf/issues/11926

dawilliams-nvidia commented 2 years ago

Also adding a comment to clarify that, ideally, this functionality will extend to dask+cudf as well! Happy to open a separate issue for the dask discussion if needed.