google / ml-compiler-opt

Infrastructure for Machine Learning Guided Optimization (MLGO) in LLVM.
Apache License 2.0

ppo_nn_agent.gin hyperparam tuning #17

Open amirjamez opened 2 years ago

amirjamez commented 2 years ago

Hi @yundiqian. I was skimming through the hyperparameters in https://github.com/google/ml-compiler-opt/blob/main/compiler_opt/rl/inlining/gin_configs/ppo_nn_agent.gin and it seems counterintuitive to me that both PPOAgent.normalize_rewards and PPOAgent.normalize_observations are set to False. Could you provide some background on this? The TF-Agents codebase (https://github.com/tensorflow/agents/blob/master/tf_agents/agents/ppo/ppo_agent.py#L206) recommends normalizing rewards and observations, so I was wondering whether you had tried enabling these before?
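
For reference, here is a minimal sketch (not the project's actual training setup) of where these two gin-configured flags end up in the TF-Agents PPOAgent constructor. The observation/action specs and network sizes below are made up purely for illustration:

```python
import tensorflow as tf
from tf_agents.agents.ppo import ppo_agent
from tf_agents.networks import actor_distribution_network, value_network
from tf_agents.specs import tensor_spec
from tf_agents.trajectories import time_step as ts

# Hypothetical 10-feature observation and a binary inline / don't-inline action.
obs_spec = tensor_spec.TensorSpec([10], tf.float32, name='observation')
action_spec = tensor_spec.BoundedTensorSpec([], tf.int64, minimum=0, maximum=1)
time_step_spec = ts.time_step_spec(obs_spec)

actor_net = actor_distribution_network.ActorDistributionNetwork(
    obs_spec, action_spec, fc_layer_params=(64, 64))
value_net = value_network.ValueNetwork(obs_spec, fc_layer_params=(64, 64))

agent = ppo_agent.PPOAgent(
    time_step_spec,
    action_spec,
    optimizer=tf.keras.optimizers.Adam(learning_rate=3e-4),
    actor_net=actor_net,
    value_net=value_net,
    # ppo_nn_agent.gin sets both of these to False; flipping them to True
    # makes TF-Agents maintain its own running normalizers for observations
    # and rewards.
    normalize_observations=False,
    normalize_rewards=False)
agent.initialize()
```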

Thanks! -Amir

yundiqian commented 2 years ago

This is a great question! Yes, normalization helps, but we turned it off because we do the normalization ourselves rather than relying on TF-Agents for it; that is, the input to the agent is already 'normalized' to a reasonable value range.

However, I tuned the parameters a long time ago, so I don't remember the details of how this one was chosen. Feel free to tune it yourself, and let us know if you find it helpful!

amirjamez commented 2 years ago

I see. So that's basically the job of bucketization. Sure, I can give it a try and update this thread.
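
As a rough illustration of the kind of bucketization being discussed, the sketch below maps a raw, heavy-tailed feature through precomputed quantile boundaries so the agent sees values in roughly [0, 1]. The boundaries and the helper function here are invented for the example; the repository derives its own quantiles from collected traces:

```python
import numpy as np

def bucketize(value, boundaries):
    """Return the fraction of quantile boundaries that `value` exceeds."""
    boundaries = np.asarray(boundaries, dtype=np.float64)
    return float(np.searchsorted(boundaries, value)) / len(boundaries)

# Hypothetical quantile boundaries for a feature such as callee basic-block count.
callee_bb_count_quantiles = [1, 2, 3, 5, 8, 13, 21, 55, 144, 1000]

print(bucketize(4, callee_bb_count_quantiles))     # 0.3
print(bucketize(5000, callee_bb_count_quantiles))  # 1.0
```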