Closed. TIHan closed this issue 1 month ago.
`InteractiveClang` is just a wrapper around the interactive MLGO tooling that has been upstreamed into LLVM, allowing unix pipes to be used to get data into and out of the compiler. @jacob-hegna would have more context on that, but isn't actively working on the project anymore. I don't believe those environments are currently used by the training scripts within this repository; it was designed to support some other use cases.
from what I remember) and then the batch is used to perform an iteration of training. The details of the training are in https://arxiv.org/abs/2101.04808; some of the specific hyperparameters have changed since then, but those are all available in the gin config files. That's at least how I recall things. Hopefully I'm not misremembering anything, as there can be some subtle but impactful differences here. @mtrofin will see this when he's back from vacation and can correct me if I'm incorrect anywhere.
@boomanaiden154 Thank you for your informative answers! This is very helpful.
@boomanaiden154 just a follow-up question: So my understanding is that, during training, Clang loads a policy from disk and uses it to make an inline decision. Do the results from that get immediately fed back into the model and then the policy gets updated on disk?
During training, the decisions that clang makes using the policy loaded from disk are logged as a trajectory. The logs are then ingested by the tooling in this repository and used to train the model. We do a data-collection phase where we compile a bunch of modules (512 is the default for regalloc), log all the trajectories, ingest them into the tooling here, do some training, and then iterate again.
That makes sense to me. While the model is being trained, does it get written back to disk to be used for subsequent decisions made in clang?
The training process is basically two-phased in the non-`InteractiveClang` case: (1) the current model is given to clang; clang uses it without mutating it, and we make whatever observations; this happens for N modules in parallel; (2) the resulting N traces are fed to e.g. PPO or REINFORCE (or, for Evolution Strategies, just the reward is fed to the ES algorithm) and we get the new model, and repeat.
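The two-phased collect/train loop described above can be sketched roughly as follows. Every function name here is a hypothetical placeholder for illustration, not an actual API from this repository; the 512-module batch size is the regalloc default mentioned earlier in the thread.

```python
# Illustrative sketch of the iterative collect/train loop, NOT actual
# ml-compiler-opt APIs. All helpers below are hypothetical placeholders.

NUM_MODULES = 512  # default data-collection batch size for regalloc


def compile_module(module, policy):
    # Phase 1 (per module): clang uses the frozen policy without mutating it;
    # the decisions it makes are logged as a trajectory.
    return {"module": module, "decisions": [policy(module)]}


def train_policy(policy, trajectories):
    # Phase 2: stand-in for a PPO/REINFORCE (or ES) update that would
    # produce a new policy from the collected traces. Here it is a no-op.
    return policy


def training_loop(modules, policy, iterations=3):
    for _ in range(iterations):
        # Phase 1: compile a batch of modules in parallel (serially here),
        # logging one trajectory per module.
        trajectories = [compile_module(m, policy) for m in modules[:NUM_MODULES]]
        # Phase 2: ingest the logs and perform one training iteration,
        # yielding the policy used in the next round.
        policy = train_policy(policy, trajectories)
    return policy
```

In the real tooling phase 2 only happens after a whole batch of trajectories has been ingested; the policy on disk is not updated per-decision.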
If you use `InteractiveClang`, you can do whatever you want every time clang asks for advice.
Thank you @mtrofin and @boomanaiden154 for your helpful answers. A lot of this is starting to make sense.
I have another question. I notice the output (or "inlining_decision") is a single scalar value (shape (1,)) for the saved_policy. I'm not sure how it got transformed into that, since `tf.keras.layers.Concatenate()` was used for the `preprocessing_combiner`. The only way I know of to combine inputs into a single scalar output is `tf.keras.layers.Add()`, which I don't think you are using. Do you know what makes the policy have a single scalar output even when using `tf.keras.layers.Concatenate()`?
`tf.keras.layers.Add` does point-wise tensor addition and doesn't accumulate anything, so it doesn't actually combine anything into a single scalar. For the inlining policy specifically, the output spec is set here: https://github.com/google/ml-compiler-opt/blob/6ff14fd3092eeaec6d614169bc96a7b55c9d383a/compiler_opt/rl/inlining/config.py#L73
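To make the point concrete, here is a small numpy sketch (numpy standing in for the Keras layers): element-wise addition preserves the input shape, while a dot product with a weight vector, as in a final `Dense(1)`-style layer, is what actually reduces a vector to a scalar.

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([10.0, 20.0, 30.0])

# tf.keras.layers.Add() behaves like element-wise addition:
added = a + b  # shape (3,): still a vector, nothing is accumulated

# Reducing a vector to a single scalar takes something like a dot
# product with a weight vector (a Dense(1)-style layer), not Add:
w = np.array([0.5, 0.25, 0.25])
scalar = a @ w  # shape (): a single value
```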
The network is just a `tf_agents.networks.actor_distribution_network.ActorDistributionNetwork` with some feed-forward layers. If everything is working how I assume, the output should just be a vector-vector multiplication of the last layer's outputs and a weight vector at the end (and maybe some bias addition). I'd have to look into the details of how exactly the post-processing gets done to confirm everything, however.
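If that assumption holds, the final reduction would look like a single output layer multiplying the last hidden activations by a weight vector plus a bias. A numpy sketch (all shapes and values below are illustrative, not taken from the actual network):

```python
import numpy as np

batch, hidden = 4, 8
h = np.ones((batch, hidden))      # outputs of the last feed-forward layer
W = np.full((hidden, 1), 0.1)     # final weight vector, shape (hidden, 1)
b = np.array([0.5])               # optional bias term

# Matrix-vector product per example collapses the hidden vector to one
# scalar per batch element, giving the shape-(1,) per-example output:
logits = h @ W + b                # shape (batch, 1)
```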
Does expanding the dims in a Lambda layer have any impact, such as https://github.com/google/ml-compiler-opt/blob/main/compiler_opt/rl/feature_ops.py#L70 ? It's also done for identity and discard.
It shouldn't impact the output shape. The line you linked also calls `tf.concat` afterwards, so the input should end up being the same shape, just normalized.
You should be able to hook up a debugger (or even just do printf-style debugging) to figure out how the tensors flow through the tooling. You'll probably hit the tracing stage rather than something at runtime, due to how TensorFlow constructs a graph and then executes that rather than the Python code, but that will give you a decent idea of the overall shape.
The `tf.expand_dims(obs, -1)` in preprocessing is what allows the NN to use `Concatenate` as the preprocessor combiner; I verified this. I'm actually not sure why, though :).
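A numpy sketch of why that expand-then-concatenate pattern works (feature names below are made up for illustration): each preprocessed observation comes out as a shape-`(batch,)` vector of scalars, and `expand_dims(obs, -1)` turns it into a `(batch, 1)` column so that concatenating along the last axis yields a proper `(batch, n_features)` matrix instead of mixing examples together.

```python
import numpy as np

batch = 3
# Two hypothetical scalar features, one value per example, shape (batch,):
feat_a = np.array([10.0, 20.0, 30.0])
feat_b = np.array([1.0, 2.0, 3.0])

# Without expand_dims, concatenating along the last axis just produces one
# long vector of shape (2 * batch,), mixing values across examples:
flat = np.concatenate([feat_a, feat_b], axis=-1)

# With expand_dims(obs, -1), each feature becomes a (batch, 1) column, and
# concatenation along the last axis gives a (batch, n_features) matrix:
cols = [np.expand_dims(f, -1) for f in (feat_a, feat_b)]
features = np.concatenate(cols, axis=-1)
```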
This isn't really an issue; I wasn't sure where to ask these questions, so I posted them here, if that's OK.
I apologize in advance if these seem very obvious (I'm new to ML). I have read through the MLGO paper as well.
MLGO Questions: