Closed dcaballe closed 5 years ago
Thanks! I had a similar patch locally that I never completed.
Ultimately my take was that most of the code here could be moved directly into LLVM, as there isn't much specific to MLIR in the JIT abstraction we have in MLIR. In the meantime, this PR seems like a good thing to do.
Ultimately my take was that most of the code here could be moved directly into LLVM, as there isn't much specific to MLIR in the JIT abstraction we have in MLIR.
Yeah, that would be great.
I also see value in having a utility in MLIR repo that creates a TargetMachine with the default host cpu parameters (cpu name, sub-target features, etc.)
Do you think adding a utility for this ^ in a separate commit is also a good idea?
This is a suggestion for the squashed commit message. Sorry, I didn't add it to the first commit:
Add a TargetMachine parameter to the makeOptimizingTransformer and populatePassManagers APIs, initialize TargetTransformInfoWrapperPass with the right target information, and add it to the pass managers.
Thanks, Mehdi! Diego
I also see value in having a utility in MLIR repo that creates a TargetMachine with the default host cpu parameters (cpu name, sub-target features, etc.)
Do you think adding a utility for this ^ in a separate commit is also a good idea?
https://github.com/dcaballe/mlir/pull/1 is a stepping stone toward ^. Please let me know what you think.
Thanks! Diego
You have conflicts now apparently, can you rebase?
dcaballe#1 is a stepping stone toward ^. Please let me know what you think.
Thanks! It seems a bit strange to me to manually unwrap a TM to build a TM-builder. I'm not sure why ORCJIT does not just take a std::function<std::unique_ptr<TargetMachine>()> callback instead of this class?
Most of the code you're adding puts this whole utility even more in LLVM land; at this point I'd rather invest time into moving this into LLVM itself than add so much "low-level" glue code in MLIR.
You have conflicts now apparently, can you rebase?
Done
I'm not sure why ORCJIT does not just take a std::function<std::unique_ptr<TargetMachine>()> callback instead of this class?
Yeah, that would make more sense, since the TM seems to be needed before ORCJIT is created. There is sort of a chicken-and-egg problem: we need the TM to properly initialize the TTI pass in the transformer function that we pass in to create ORCJIT.
at this point I'd rather invest time into moving this into LLVM itself than add so much "low-level" glue code in MLIR.
Sure. Please keep me posted and let me know if I can help.
Diego
Hi Mehdi,
Do you think we could proceed with this PR in the meantime until things are moved to LLVM? That would be very helpful.
Thanks, Diego
Sorry, I didn't mean to stall the current PR, of course. If you don't mind updating it with the two nits I just commented on, this seems OK to merge.
Hi!
This PR is mostly to ask for feedback :). We are using mlir::ExecutionEngine to JIT-compile and execute nGraph ops on CPUs and noticed that the LLVM optimizer wasn't getting the proper target information because the TTI implementation was initialized with NoTTIImpl. This happens because TargetTransformInfoWrapperPass is not added to the pass managers.

I think there are a couple of ways to fix this:
1) MLIR repo: something like the code attached, which adds TargetTransformInfoWrapperPass, initialized with the proper TargetMachine, to the pass managers. I tested it by passing a TargetMachine with all the default host sub-target features and the specific cpu name (skylake-avx512). However, I must be missing something, since I'm getting skylake-avx512 assembly with this approach even though the target machine used by OrcJit doesn't have any sub-target features/cpu name set: https://github.com/tensorflow/mlir/blob/master/lib/ExecutionEngine/ExecutionEngine.cpp#L150
2) nGraph repo: using mlir::makeLLVMPassesTransformer (https://github.com/tensorflow/mlir/blob/master/lib/ExecutionEngine/OptUtils.cpp#L103) in nGraph to add TargetTransformInfoWrapperPass to the pass managers.
3) Any other idea?

I see value in adding the TargetTransformInfoWrapperPass pass in the MLIR repo so that other users don't hit the same issue or have to replicate the same code. I also see value in having a utility in the MLIR repo that creates a TargetMachine with the default host cpu parameters (cpu name, sub-target features, etc.) and reusing it every time it is needed (for example, in the OrcJIT code I pointed to before, and also here: https://github.com/tensorflow/mlir/blob/master/lib/ExecutionEngine/ExecutionEngine.cpp#L150).

Please let me know what you think. I should be able to help with this.
Thanks! Diego