rapidsai / cuml

cuML - RAPIDS Machine Learning Library
https://docs.rapids.ai/api/cuml/stable/
Apache License 2.0
4.25k stars 534 forks source link

Optimize MG variance calculation for dataset standardization for logistic regression #6138

Open lijinf2 opened 2 days ago

lijinf2 commented 2 days ago

MG variance calculation currently involks raft SG vars API. However, the abs() step of raft SG vars API introduces errors in skewed data distribution (e.g., one GPU gets small values 1 and 2, and the other GPU gets large values 98 and 99).

The PR avoids the effect of abs() when involking SG vars for calculating MG vars. The key idea is to pass a vector of zeroes when calling SG vars.

copy-pr-bot[bot] commented 2 days ago

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.