Closed dcaballe closed 5 years ago
Thanks! I had a similar patch locally that I never completed.
Ultimately my take was that most of the code here could be moved directly into LLVM, as there isn't much specific to MLIR in the JIT abstraction we have in MLIR. In the meantime, this PR seems like a good thing to do.
Ultimately my take was that most of the code here could be moved directly into LLVM, as there isn't much specific to MLIR in the JIT abstraction we have in MLIR.
Yeah, that would be great.
I also see value in having a utility in MLIR repo that creates a TargetMachine with the default host cpu parameters (cpu name, sub-target features, etc.)
Do you think adding a utility for this ^ in a separate commit is also a good idea?
This is a suggestion for the squashed commit message. Sorry, I didn't add it to the first commit:
Add a TargetMachine parameter to the makeOptimizingTransformer and populatePassManagers APIs, initialize TargetTransformInfoWrapperPass with the right target information, and add it to the pass managers.
Thanks, Mehdi! Diego
I also see value in having a utility in MLIR repo that creates a TargetMachine with the default host cpu parameters (cpu name, sub-target features, etc.)
Do you think adding a utility for this ^ in a separate commit is also a good idea?
https://github.com/dcaballe/mlir/pull/1 is a stepping stone toward ^. Please let me know what you think.
Thanks! Diego
You have conflicts now apparently, can you rebase?
dcaballe#1 is a stepping stone toward ^. Please let me know what you think.
Thanks! It seems a bit strange to me to manually unwrap a TM to build a TM-builder. I'm not sure why ORCJIT does not just take a std::function<std::unique_ptr<TargetMachine>()> callback instead of this class?
Most of the code you're adding puts this whole utility even more in LLVM land; at this point I'd rather invest time into moving this into LLVM itself than add so much "low-level" glue code in MLIR.
You have conflicts now apparently, can you rebase?
Done
I'm not sure why ORCJIT does not just take a std::function<std::unique_ptr<TargetMachine>()> callback instead of this class?
Yeah, that would make more sense, since the TM seems to be needed before ORCJIT is created. There is sort of a chicken-and-egg problem: we need the TM to properly initialize the TTI pass in the transformer function that we pass in to create ORCJIT.
at this point I'd rather invest time into moving this into LLVM itself than add so much "low-level" glue code in MLIR.
Sure. Please keep me posted and let me know if I can help.
Diego
Hi Mehdi,
Do you think we could proceed with this PR in the meantime until things are moved to LLVM? That would be very helpful.
Thanks, Diego
Sorry, I didn't mean to stall the current PR, of course. If you don't mind updating it with the two nits I just commented on, this seems OK to merge.
Hi!
This PR is mostly to ask for feedback :). We are using mlir::ExecutionEngine to JIT-compile and execute nGraph ops on CPUs and noticed that the LLVM optimizer wasn't getting the proper target information because the TTI implementation was initialized with NoTTIImpl. This happens because TargetTransformInfoWrapperPass is not added to the pass managers.

I think there are a couple of ways to fix this:
1) MLIR repo: something like the code attached, which adds TargetTransformInfoWrapperPass, initialized with the proper TargetMachine, to the pass managers. I tested it by passing a TargetMachine with all the default host sub-target features and the specific cpu name (skylake-avx512). However, I must be missing something, since I'm getting skylake-avx512 assembly with this approach even though the target machine used by OrcJit doesn't have any sub-target features/cpu name set: https://github.com/tensorflow/mlir/blob/master/lib/ExecutionEngine/ExecutionEngine.cpp#L150
2) nGraph repo: using mlir::makeLLVMPassesTransformer (https://github.com/tensorflow/mlir/blob/master/lib/ExecutionEngine/OptUtils.cpp#L103) in nGraph to add TargetTransformInfoWrapperPass to the pass managers.
3) Any other idea?

I see value in adding the TargetTransformInfoWrapperPass pass in the MLIR repo so that other users don't hit the same issue or have to replicate the same code. I also see value in having a utility in the MLIR repo that creates a TargetMachine with the default host cpu parameters (cpu name, sub-target features, etc.) and reusing it every time it is needed (for example, in the OrcJIT code I pointed to before, and also here: https://github.com/tensorflow/mlir/blob/master/lib/ExecutionEngine/ExecutionEngine.cpp#L150).

Please let me know what you think. I should be able to help with this.
Thanks! Diego