rapidsai / cuml

cuML - RAPIDS Machine Learning Library
https://docs.rapids.ai/api/cuml/stable/
Apache License 2.0

[FEA] Support for other ways to do MNMG-RF #3539

Open teju85 opened 3 years ago

teju85 commented 3 years ago

Is your feature request related to a problem? Please describe.

The current MNMG RF is closer to a model-parallel approach. We distribute the data among the workers and also distribute the work of building separate trees across them. Each worker then builds a tree based only on the data that is available to it.

Although this is an embarrassingly parallel way to build trees in RF, it has some limitations:

  1. It does not work well if the dataset is wide (i.e., has many features).
  2. A tree built on a particular worker may not see samples from other workers, which could introduce bias.
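
To make the second limitation concrete, here is a minimal sketch in plain Python (not cuML code). The toy dataset, the unshuffled two-worker partition, and the `best_split` helper are all assumptions made up for illustration. It shows a worker whose shard contains only one class finding no useful split, while the same search on the full data recovers the true boundary.

```python
def gini(pos, n):
    """Gini impurity of a node holding `pos` positives out of `n` samples."""
    if n == 0:
        return 0.0
    p = pos / n
    return 2.0 * p * (1.0 - p)


def best_split(rows):
    """Exhaustive search for the split `x <= t` on a single feature.

    rows: list of (feature_value, label) pairs with label in {0, 1}.
    Returns (threshold, gain); threshold is None when no split improves
    on the parent impurity.
    """
    n = len(rows)
    parent = gini(sum(y for _, y in rows), n)
    best_t, best_gain = None, 0.0
    for t in sorted({x for x, _ in rows})[:-1]:
        left = [y for x, y in rows if x <= t]
        right = [y for x, y in rows if x > t]
        child = (len(left) * gini(sum(left), len(left))
                 + len(right) * gini(sum(right), len(right))) / n
        if parent - child > best_gain:
            best_t, best_gain = t, parent - child
    return best_t, best_gain


# Hypothetical toy data: the label is 1 exactly when x >= 5.
full = [(x, int(x >= 5)) for x in range(10)]

# Assumed unshuffled partition: worker 1 only ever sees x = 6..9 (all label 1).
shard_worker1 = full[6:]

print(best_split(full))           # split found on the complete dataset
print(best_split(shard_worker1))  # worker 1 has nothing to split on
```

In this toy setup a tree grown on worker 1 would predict class 1 everywhere. Real partitions are rarely this extreme, but the same effect appears in weaker form whenever shards are not identically distributed.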

Describe the solution you'd like

Alongside the current approach, we should give users the option to choose another approach, which works as follows:

  1. If the rows of the dataset are distributed across the workers, then we need to perform an allReduce of the intermediate histograms among those workers, before computing the best split.
  2. If the columns of the dataset are distributed across the workers, then we need to perform a max-allReduce of the individual best-splits among those workers to get the “global” best split.
  3. If both rows and columns are distributed (aka 2D-partitioning of the dataset), then we need to do both 1 and 2.
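
The three cases above can be sketched as a single-process simulation in plain Python. This is only an illustration under stated assumptions: `allreduce_sum` and `allreduce_max` are hypothetical stand-ins for the real collectives (they are not cuML or NCCL calls), features are assumed to be pre-binned into `N_BINS` bins, and the split criterion is assumed to be Gini gain on binary labels.

```python
N_BINS = 4  # hypothetical: every feature is pre-binned into 4 bins


def gini(pos, n):
    """Gini impurity of a node holding `pos` positives out of `n` samples."""
    if n == 0:
        return 0.0
    p = pos / n
    return 2.0 * p * (1.0 - p)


def local_histogram(rows, feature):
    """Per-bin [count, positives] for one feature over one worker's rows."""
    hist = [[0, 0] for _ in range(N_BINS)]
    for feats, label in rows:
        hist[feats[feature]][0] += 1
        hist[feats[feature]][1] += label
    return hist


def allreduce_sum(hists):
    """Stand-in for a sum-allReduce: element-wise sum of all histograms."""
    return [[sum(h[b][k] for h in hists) for k in range(2)]
            for b in range(N_BINS)]


def best_split(hist, feature):
    """Best (gain, feature, bin) for the split `bin <= b`; bin is -1 if none helps."""
    n = sum(count for count, _ in hist)
    pos = sum(p for _, p in hist)
    parent = gini(pos, n)
    best = (0.0, feature, -1)
    left_n = left_pos = 0
    for b in range(N_BINS - 1):
        left_n += hist[b][0]
        left_pos += hist[b][1]
        right_n, right_pos = n - left_n, pos - left_pos
        child = (left_n * gini(left_pos, left_n)
                 + right_n * gini(right_pos, right_n)) / n
        if parent - child > best[0]:
            best = (parent - child, feature, b)
    return best


def allreduce_max(candidates):
    """Stand-in for a max-allReduce over (gain, feature, bin) candidates."""
    return max(candidates, key=lambda c: c[0])


# Hypothetical toy data: ((bin of feature 0, bin of feature 1), label).
data = [((0, 0), 0), ((1, 3), 1), ((2, 1), 0), ((3, 2), 1),
        ((0, 2), 1), ((1, 0), 0), ((2, 3), 1), ((3, 1), 0)]
row_shards = [data[:4], data[4:]]  # two workers, each holding half the rows
features = [0, 1]

# Case 1: rows distributed. Sum the histograms across workers, then every
# worker can pick the best split locally from the identical global histograms.
by_rows = max(
    (best_split(allreduce_sum([local_histogram(s, f) for s in row_shards]), f)
     for f in features),
    key=lambda c: c[0])

# Case 2: columns distributed. Each worker owns one feature (all rows),
# finds its own best split, and a max-allReduce selects the global one.
by_cols = allreduce_max(
    [best_split(local_histogram(data, f), f) for f in features])

# Case 3: 2D partitioning. Sum-allReduce within each column group first,
# then max-allReduce across the column groups.
by_grid = allreduce_max(
    [best_split(allreduce_sum([local_histogram(s, f) for s in row_shards]), f)
     for f in features])

print(by_rows, by_cols, by_grid)
```

In this simulation all three layouts agree with what a single node holding the full dataset would compute, which is the property the proposal is after. A real implementation would also need to communicate which feature and threshold won, not just the maximum gain, so the max-allReduce in practice has to carry that payload with it.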