oneapi-src / oneDAL

oneAPI Data Analytics Library (oneDAL)
https://software.intel.com/en-us/oneapi/onedal
Apache License 2.0

Add other stats for low-order moments #2006

Open xwu99 opened 2 years ago

xwu99 commented 2 years ago

We are using oneDAL distributed algorithms to optimize Spark ML, and some metrics are missing. Could you check whether the following stats can be added to distributed low-order moments (basic stats)?

Check for details: https://spark.apache.org/docs/latest/api/scala/org/apache/spark/mllib/stat/MultivariateStatisticalSummary.html
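To illustrate why these stats fit a distributed setting, here is a minimal sketch in plain Python, not oneDAL or Spark code. It assumes each partition keeps a small mergeable summary for one feature column and that the final values are derived after merging. `PartialStats` and `summarize` are hypothetical names, and the variance is assumed to be the sample variance (n - 1 denominator).

```python
import math
from dataclasses import dataclass


@dataclass
class PartialStats:
    """Hypothetical mergeable summary of one feature column on one partition."""
    count: int = 0
    total: float = 0.0      # sum of x
    total_sq: float = 0.0   # sum of x^2
    abs_total: float = 0.0  # sum of |x|
    nonzeros: int = 0
    minimum: float = math.inf
    maximum: float = -math.inf

    def add(self, x: float) -> None:
        """Fold a single value into the local summary."""
        self.count += 1
        self.total += x
        self.total_sq += x * x
        self.abs_total += abs(x)
        self.nonzeros += 1 if x != 0.0 else 0
        self.minimum = min(self.minimum, x)
        self.maximum = max(self.maximum, x)

    def merge(self, other: "PartialStats") -> "PartialStats":
        """Combine two summaries; the operation is associative and commutative."""
        return PartialStats(
            count=self.count + other.count,
            total=self.total + other.total,
            total_sq=self.total_sq + other.total_sq,
            abs_total=self.abs_total + other.abs_total,
            nonzeros=self.nonzeros + other.nonzeros,
            minimum=min(self.minimum, other.minimum),
            maximum=max(self.maximum, other.maximum),
        )

    @property
    def mean(self) -> float:
        return self.total / self.count

    @property
    def variance(self) -> float:
        # Sample variance (assumption). The sum-of-squares form is simple
        # but can lose precision when the mean is large relative to the spread.
        if self.count < 2:
            return 0.0
        return (self.total_sq - self.count * self.mean ** 2) / (self.count - 1)

    @property
    def norm_l1(self) -> float:
        return self.abs_total

    @property
    def norm_l2(self) -> float:
        return math.sqrt(self.total_sq)


def summarize(partitions):
    """Summarize each partition locally, then merge the partial results."""
    result = PartialStats()
    for part in partitions:
        local = PartialStats()
        for x in part:
            local.add(x)
        result = result.merge(local)
    return result


stats = summarize([[1.0, 0.0, -2.0], [3.0, 4.0]])
print(stats.count, stats.mean, stats.norm_l1, stats.nonzeros)
```

Because every field merges with a sum, min, or max, the extra metrics add only a few accumulators per column to the partial result.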

makart19 commented 2 years ago

Clarification details per our discussion with Xiaochang:

We also need to check how much adding all of these metrics will affect the performance of the default case (when all metrics are calculated).

xwu99 commented 2 years ago

Thanks @makart19. For the weight column, you could also consider general support for weighted points as a feature across all algorithms, such as weighted points for k-means. See Spark's KMeans, which has an optional weightCol parameter: https://spark.apache.org/docs/latest/api/python/reference/api/pyspark.ml.clustering.KMeans.html
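As a rough sketch of what weighted points would mean for k-means, the centroid update step becomes a weighted average of the assigned points. This is plain Python for illustration only, not oneDAL or Spark code, and `weighted_centroids` is a hypothetical helper name.

```python
from typing import List, Optional, Sequence


def weighted_centroids(
    points: Sequence[Sequence[float]],
    weights: Sequence[float],
    assignments: Sequence[int],
    k: int,
) -> List[Optional[List[float]]]:
    """One centroid update step where each point contributes in proportion to its weight.

    Returns None for a cluster whose total weight is zero.
    """
    dim = len(points[0])
    sums = [[0.0] * dim for _ in range(k)]
    weight_sums = [0.0] * k

    for point, weight, cluster in zip(points, weights, assignments):
        weight_sums[cluster] += weight
        for j, value in enumerate(point):
            sums[cluster][j] += weight * value

    centroids: List[Optional[List[float]]] = []
    for cluster in range(k):
        if weight_sums[cluster] > 0.0:
            centroids.append([s / weight_sums[cluster] for s in sums[cluster]])
        else:
            centroids.append(None)
    return centroids


points = [[0.0, 0.0], [2.0, 0.0], [10.0, 10.0]]
weights = [1.0, 3.0, 2.0]
assignments = [0, 0, 1]
print(weighted_centroids(points, weights, assignments, k=2))
```

With all weights equal to 1 this reduces to the usual unweighted mean, so the unweighted case can stay the default.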

makart19 commented 2 years ago

OK, we will consider weight support for other algorithms.