yukinko-iwasaki / CS744_CourseProject

Prioritize the tuning parameters #1

Open yukinko-iwasaki opened 3 years ago

yukinko-iwasaki commented 3 years ago

The feedback on our proposal pointed out that some of the parameters we mentioned are not directly related to parallelism. Let's decide which parameters we will tune in our experiments, and in what order.

  1. The relationship between the dimensionality of the dataset and the number of machines (row count fixed)
    • What is the optimal number of machines for each dimensionality? Here, we measure execution time.
  2. The relationship between the bin size and both accuracy and execution time (row and column counts fixed)
    • What bin size gives acceptable accuracy with the lowest execution time? => let's define what counts as acceptable accuracy

In our proposal, we also mentioned the depth of the tree and the max number of features as tuning parameters. However, these are not directly related to parallelism, so let's skip them for now.
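
To make the ordering concrete, here is a minimal sketch of how the two experiments could be enumerated and summarized. Everything in it is an assumption for illustration: the value lists, the fixed row and column counts, the function names, and the result fields (`seconds`, `accuracy`) are placeholders, not settings we have agreed on, and the accuracy threshold is left as a parameter because we have not defined it yet.

```python
from itertools import product

# Placeholder values only; none of these were decided in the proposal.
DIMENSIONS = [10, 100, 1000]    # number of feature columns (hypothetical)
MACHINES = [1, 2, 4, 8]         # cluster sizes to try (hypothetical)
BIN_SIZES = [16, 32, 64, 128]   # bin sizes to try (hypothetical)
FIXED_ROWS = 1_000_000          # row count held fixed in both experiments
FIXED_COLS = 100                # column count held fixed in experiment 2


def experiment_1_configs():
    """Experiment 1: vary dimensionality and machine count, rows fixed."""
    return [
        {"rows": FIXED_ROWS, "cols": cols, "machines": machines}
        for cols, machines in product(DIMENSIONS, MACHINES)
    ]


def experiment_2_configs():
    """Experiment 2: vary bin size, rows and columns fixed."""
    return [
        {"rows": FIXED_ROWS, "cols": FIXED_COLS, "bin_size": bin_size}
        for bin_size in BIN_SIZES
    ]


def best_machines_per_dimension(results):
    """Map each dimensionality to the machine count with the lowest time.

    `results` is a list of dicts with keys "cols", "machines", "seconds".
    """
    best = {}
    for run in results:
        current = best.get(run["cols"])
        if current is None or run["seconds"] < current["seconds"]:
            best[run["cols"]] = run
    return {cols: run["machines"] for cols, run in best.items()}


def best_bin_size(results, min_accuracy):
    """Return the fastest bin size whose accuracy meets the threshold.

    `results` is a list of dicts with keys "bin_size", "accuracy", "seconds".
    Returns None if no run reaches `min_accuracy`.
    """
    acceptable = [run for run in results if run["accuracy"] >= min_accuracy]
    if not acceptable:
        return None
    return min(acceptable, key=lambda run: run["seconds"])["bin_size"]
```

Running every configuration from `experiment_1_configs()` first and then those from `experiment_2_configs()` follows the priority order above. The two `best_*` helpers only summarize measurements we would collect ourselves; they do not launch any jobs.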