tkestack / vcuda-controller

[Question] Why is `blocks` not actually used in `rate_limiter`? #14

Closed lining2020x closed 3 years ago

lining2020x commented 3 years ago

Hi, I have been digging into the vcuda library code recently, and some details are a bit difficult to understand. https://github.com/tkestack/vcuda-controller/blob/3bf3a8b983a7a4720ef8c5f5e98fdc6ec2b52649/src/hijack_call.c#L162

Why is `blocks` not actually used in `rate_limiter`? It seems to be used only for logging.

static void rate_limiter(int grids, int blocks) {
  int before_cuda_cores = 0;
  int after_cuda_cores = 0;
  int kernel_size = grids;

  LOGGER(5, "grid: %d, blocks: %d", grids, blocks);
  LOGGER(5, "launch kernel %d, curr core: %d", kernel_size, g_cur_cuda_cores);
  if (g_vcuda_config.enable) {
    do {
CHECK:
      before_cuda_cores = g_cur_cuda_cores;
      LOGGER(8, "current core: %d", g_cur_cuda_cores);
      if (before_cuda_cores < 0) {
        nanosleep(&g_cycle, NULL);
        goto CHECK;
      }
      after_cuda_cores = before_cuda_cores - kernel_size;
    } while (!CAS(&g_cur_cuda_cores, before_cuda_cores, after_cuda_cores));
  }
}

mYmNeo commented 3 years ago

Grid and block are both factors that can affect utilization. However, the block size affects utilization much less than the grid size, so we only use the grid size. We leave `blocks` in the function prototype for others who want to implement a custom control algorithm.
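As an illustration of what such a custom control algorithm might look like, here is a minimal sketch that charges a kernel by grid count weighted by warps per block. The names `kernel_cost` and `try_consume`, the pool parameter, and the cost model itself are hypothetical and are not part of vcuda-controller; C11 atomics stand in for the project's `CAS` macro.

```c
#include <stdatomic.h>

/* Hypothetical cost model (an assumption, not the project's algorithm):
 * weight the grid count by the number of 32-thread warps per block,
 * rounded up. */
int kernel_cost(int grids, int blocks) {
  int warps_per_block = (blocks + 31) / 32;
  return grids * warps_per_block;
}

/* Try to charge one kernel launch against a shared token pool.
 * Returns 1 if the cost was deducted, or 0 if the pool is already
 * negative, in which case the caller would sleep and retry, as
 * rate_limiter does with nanosleep. */
int try_consume(atomic_int *pool, int grids, int blocks) {
  int cost = kernel_cost(grids, blocks);
  int before = atomic_load(pool);
  while (before >= 0) {
    /* On failure, `before` is refreshed with the current pool value. */
    if (atomic_compare_exchange_weak(pool, &before, before - cost)) {
      return 1;
    }
  }
  return 0;
}
```

As in the original `rate_limiter`, a launch is admitted whenever the pool is non-negative, so the pool can go below zero after a large kernel; later launches then wait until it is replenished.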

lining2020x commented 3 years ago

OK, I see. Many thanks.

lining2020x commented 3 years ago

Hi @mYmNeo, I still have some other questions. Why does it need to be multiplied by FACTOR here? https://github.com/tkestack/vcuda-controller/blob/3bf3a8b983a7a4720ef8c5f5e98fdc6ec2b52649/src/hijack_call.c#L486 And why is FACTOR 32? https://github.com/tkestack/vcuda-controller/blob/3bf3a8b983a7a4720ef8c5f5e98fdc6ec2b52649/include/hijack.h#L100

lining2020x commented 3 years ago

Why is the increment calculated like this? Where does the value 2560 come from? I don't quite understand the algorithm here. https://github.com/tkestack/vcuda-controller/blob/3bf3a8b983a7a4720ef8c5f5e98fdc6ec2b52649/src/hijack_call.c#L186-L187

int delta(int up_limit, int user_current, int share) {
  int utilization_diff =
      abs(up_limit - user_current) < 5 ? 5 : abs(up_limit - user_current);
  int increment =
      g_sm_num * g_sm_num * g_max_thread_per_sm * utilization_diff / 2560;
  /* Accelerate cuda cores allocation when utilization vary widely */
  if (utilization_diff > up_limit / 2) {
    increment = increment * utilization_diff * 2 / (up_limit + 1);
  }

  if (user_current <= up_limit) {
    share = share + increment > g_total_cuda_cores ? g_total_cuda_cores
                                                   : share + increment;
  } else {
    share = share - increment < 0 ? 0 : share - increment;
  }

  return share;
}

mYmNeo commented 3 years ago

These are magic numbers chosen from experience.
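To see what the constants do in practice, the arithmetic of `delta` can be traced with concrete numbers. The sketch below restates the function with its globals passed in through a hypothetical `struct device`. The values used in the note after it (80 SMs, 2048 threads per SM, and a total of `sm_num * max_thread_per_sm * 32`) are assumptions for illustration only, not values taken from the project.

```c
#include <stdlib.h>

/* Hypothetical container for the values that are globals in the
 * original (g_sm_num, g_max_thread_per_sm, g_total_cuda_cores). */
struct device {
  int sm_num;
  int max_thread_per_sm;
  int total_cuda_cores;
};

/* Same arithmetic as delta() quoted above, with explicit inputs. */
int delta_sketch(const struct device *d, int up_limit, int user_current,
                 int share) {
  /* The utilization gap is clamped to at least 5. */
  int diff = abs(up_limit - user_current);
  if (diff < 5) {
    diff = 5;
  }
  int increment = d->sm_num * d->sm_num * d->max_thread_per_sm * diff / 2560;
  /* Larger step when the gap exceeds half of the limit. */
  if (diff > up_limit / 2) {
    increment = increment * diff * 2 / (up_limit + 1);
  }
  if (user_current <= up_limit) {
    /* Under the limit: grow the share, capped at the total. */
    share = share + increment > d->total_cuda_cores ? d->total_cuda_cores
                                                    : share + increment;
  } else {
    /* Over the limit: shrink the share, floored at zero. */
    share = share - increment < 0 ? 0 : share - increment;
  }
  return share;
}
```

With the assumed device `{80, 2048, 80 * 2048 * 32}`, a limit of 50 and a current utilization of 30 give a gap of 20 and an increment of 80 * 80 * 2048 * 20 / 2560 = 102400. With a current utilization of 10 the gap is 40, which exceeds 50 / 2, so the increment of 204800 is scaled by 40 * 2 / 51 to 321254. The divisor 2560 only sets the step size per adjustment; the sketch does not explain why that particular value was picked.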

luckyJ-nj commented 3 years ago

Hi @mYmNeo, I still have some other questions. Why does it need to be multiplied by FACTOR here? https://github.com/tkestack/vcuda-controller/blob/3bf3a8b983a7a4720ef8c5f5e98fdc6ec2b52649/src/hijack_call.c#L486

Why is FACTOR 32? https://github.com/tkestack/vcuda-controller/blob/3bf3a8b983a7a4720ef8c5f5e98fdc6ec2b52649/include/hijack.h#L100 I guess it is because the warp size of a typical GPU is 32.