google / ml-compiler-opt

Infrastructure for Machine Learning Guided Optimization (MLGO) in LLVM.
Apache License 2.0

[Question] Why use llvm-size to calculate rewards when LLVM also calculates size rewards? #319

Closed. 18liumin closed this issue 9 months ago.

18liumin commented 9 months ago

void recordInliningImpl() override {
  MLInlineAdvice::recordInliningImpl();
  getAdvisor()->resetNativeSize(Caller);
  int Reward = std::numeric_limits<int>::max();
  if (InlineSizeEstimatorAnalysis::isEvaluatorRequested() &&
      !getAdvisor()->isForcedToStop()) {
    int NativeSizeAfter = getAdvisor()->getNativeSizeEstimate(Caller) +
                          CalleeSizeEstimateBefore;
    Reward = NativeSizeAfter -
             (CallerSizeEstimateBefore + CalleeSizeEstimateBefore);
    getAdvisor()->updateNativeSizeEstimate(Reward);
  }
  log(Reward, /*Success=*/true);
}

////////////////////////////////////////////////////////////////////////

# Run llvm-size on the compiled native object to measure its actual size.
cmdline = [self._llvm_size_path, output_native_path]
output_bytes = compilation_runner.start_cancellable_process(
    cmdline,
    timeout=self._compilation_timeout,
    cancellation_manager=self._cancellation_manager,
    want_output=True)
if not output_bytes:
  raise RuntimeError(f'Empty llvm-size output: {" ".join(cmdline)}')
output = output_bytes.decode('utf-8')
# Default llvm-size output is a header line, one data row, and a trailing
# newline; the first tab-separated field of the data row is the .text size.
tmp = output.split('\n')
if len(tmp) != 3:
  raise RuntimeError(f'Wrong llvm-size output {output}')
tmp = tmp[1].split('\t')
native_size = int(tmp[0])

if native_size == 0:
  return {}

if reward_only:
  return {_DEFAULT_IDENTIFIER: (None, native_size)}

# Read the training log emitted during compilation into SequenceExamples.
result = log_reader.read_log_as_sequence_examples(log_path)
if len(result) != 1:
  return {}

sequence_example = next(iter(result.values()))

if not sequence_example.HasField('feature_lists'):
  return {}

return {_DEFAULT_IDENTIFIER: (sequence_example, native_size)}
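
(For reference: the parsing above matches llvm-size's default Berkeley-style output, i.e. a header line, one tab-separated data row, and a trailing newline. Below is a minimal, self-contained sketch of the same parsing; the sample output, the sizes, and the helper name are illustrative, not taken from the repo.)

# Illustrative llvm-size output: header line, one tab-separated data row,
# and a trailing newline (all sizes are made up).
_SAMPLE_LLVM_SIZE_OUTPUT = (
    '   text\t   data\t    bss\t    dec\t    hex\tfilename\n'
    '  81244\t   2048\t    512\t  83804\t   1475c\tmodule.o\n')


def parse_text_size(output: str) -> int:
  """Mirrors the snippet above: extracts the .text section size."""
  lines = output.split('\n')
  # Expect exactly: header, data row, and the empty string left by the
  # trailing newline.
  if len(lines) != 3:
    raise RuntimeError(f'Wrong llvm-size output {output}')
  # The first tab-separated field of the data row is the text segment size.
  return int(lines[1].split('\t')[0])


assert parse_text_size(_SAMPLE_LLVM_SIZE_OUTPUT) == 81244
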
boomanaiden154 commented 9 months ago

The native size reward estimate is produced by an ML model (IR2Native) and thus isn't exactly ground-truth data. Mircea can probably speak more to the exact reasons for switching over, but ground-truth data (or something quite close to it) is almost always going to be better, assuming the training algorithm is able to work with it; whether it could was the original concern that led to the IR2Native model.

Eventually the IR2Native model in upstream LLVM is going to be removed, as it's not used anywhere, but we're currently holding off because we're looking to use it for a comparison study against some new techniques.

mtrofin commented 9 months ago

Basically what @boomanaiden154 said. In addition, we initially used the IR2Size model to train with algorithms like DQN, which want dense rewards, i.e. a reward after each action (each inlined callsite). It turned out that didn't work as well as the final-reward training methods (at least the way we tried it).
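
To make the dense vs. final reward distinction concrete, here is a schematic sketch (not the project's actual training code; function names and all numbers are illustrative): dense-reward training receives one reward per inlining decision, derived from the native-size estimate before and after that decision, while final-reward training receives a single reward per module, e.g. the measured .text size improvement over the default inlining policy.

from typing import List


def dense_rewards(size_estimates: List[int]) -> List[int]:
  """One reward per inlining decision: the estimated size saved by each step.

  `size_estimates` holds the model's native-size estimate before the first
  decision and after each subsequent decision (illustrative values only).
  """
  return [before - after
          for before, after in zip(size_estimates, size_estimates[1:])]


def final_reward(default_native_size: int, policy_native_size: int) -> float:
  """A single reward per module: relative size improvement over the default
  inlining policy, measured from the actual binaries (e.g. via llvm-size)."""
  return 1.0 - policy_native_size / default_native_size


# Three inlining decisions; the estimates and measured sizes are made up.
print(dense_rewards([1000, 980, 985, 950]))  # [20, -5, 35]
print(final_reward(default_native_size=1000, policy_native_size=940))  # ~0.06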