rapidsai / cuml

cuML - RAPIDS Machine Learning Library
https://docs.rapids.ai/api/cuml/stable/
Apache License 2.0
[Tech Debt] Improve build time #5060

Open wphicks opened 1 year ago

wphicks commented 1 year ago

Summary

It would be useful to improve build time as much as possible for faster developer iteration and reduced CI resource consumption.

Current bottleneck files

The following are listed roughly in order of priority. This order does not correspond strictly to the longest build times (although it mostly does). Instead, it also takes into account what can and can't be parallelized, so that we tackle each dependent branch of the build.
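
To illustrate why priority need not match raw build time, here is a minimal sketch of a critical-path calculation over a build dependency graph. It assumes unlimited parallelism, so wall-clock time is set by the slowest dependency chain. All file names, timings, and dependencies below are hypothetical and are not measurements from cuML.

```python
# Hypothetical compile times in seconds (illustrative only, not cuML data).
compile_seconds = {
    "slow_a.cu": 300,
    "fast_b.cu": 120,
    "slow_c.cu": 200,
    "libfoo.so": 30,
    "libbar.so": 20,
}

# Hypothetical graph: target -> inputs that must finish building first.
depends_on = {
    "libfoo.so": ["slow_a.cu", "fast_b.cu"],
    "libbar.so": ["slow_c.cu", "libfoo.so"],
}


def critical_path(target, compile_seconds, depends_on):
    """Return (total_seconds, path) for the slowest chain ending at target.

    Inputs of a target are assumed to build fully in parallel, so only the
    slowest input chain contributes to the wall-clock time.
    """
    best_time, best_path = 0, []
    for dep in depends_on.get(target, []):
        dep_time, dep_path = critical_path(dep, compile_seconds, depends_on)
        if dep_time > best_time:
            best_time, best_path = dep_time, dep_path
    return best_time + compile_seconds[target], best_path + [target]


if __name__ == "__main__":
    print(critical_path("libbar.so", compile_seconds, depends_on))

    # Speeding up slow_a.cu moves the bottleneck to a different branch.
    faster = dict(compile_seconds, **{"slow_a.cu": 100})
    print(critical_path("libbar.so", faster, depends_on))
```

In this toy graph, speeding up `fast_b.cu` would not shorten the build at all, and once `slow_a.cu` is fixed the bottleneck moves to `slow_c.cu` on a different branch. That is one reason to work on each dependent branch instead of only the single slowest file.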

wphicks commented 1 year ago

Note: this issue replaces the original #3501, since there has been a fair amount of movement on this problem since then.

cjnolet commented 1 year ago

Some of these, especially those that require pairwise distances, will likely benefit from using the pre-compiled specializations that are already being built in RAFT. We've gone through most of them and updated them, but unfortunately a few are still lingering and might need to be updated (thinking single linkage, trustworthiness, silhouette_score, hdbscan).

wphicks commented 1 year ago

Just submitted #5061 to get the low-hanging fruit, but silhouette score remains our most significant bottleneck.