tensorflow / mlir

"Multi-Level Intermediate Representation" Compiler Infrastructure

Custom lowering of MemRefs to LLVM #337

Open dcaballe opened 4 years ago

dcaballe commented 4 years ago

This PR introduces the infrastructure needed to provide a custom Std-to-LLVM lowering for the MemRef type.

Related discussion: https://github.com/tensorflow/mlir/issues/309

ftynse commented 4 years ago

Wouldn't it be simpler to just provide a set of patterns that lower memref-related ops differently? Here's a list: https://github.com/llvm/llvm-project/blob/master/mlir/lib/Conversion/StandardToLLVM/ConvertStandardToLLVM.cpp#L2126

dcaballe commented 4 years ago

> Wouldn't it be simpler to just provide a set of patterns that lower memref-related ops differently? Here's a list: https://github.com/llvm/llvm-project/blob/master/mlir/lib/Conversion/StandardToLLVM/ConvertStandardToLLVM.cpp#L2126

The problem with that approach is that it would require re-implementing half of the LLVM lowering for every custom memref lowering. I see that being a problem in terms of code duplication and maintainability: every custom LLVM lowering will break whenever a memref-related op changes or a new op is added. Essentially, every custom lowering in an external project would have to catch up with the default lowering every time the MLIR repo moves forward. That would be really concerning.

Also, based on the current implementation, I see that memref lowering is just a step in the type conversion mechanism. It's decoupled enough from the rest of the op lowering that we can provide an independent abstraction for it. We shouldn't need to replicate the rest of the op lowering.

Regarding the current implementation, we could also pass MemRefDescriptor as a template parameter instead of using an abstract base class. I think that would be cleaner, and it wouldn't require dynamic allocation for memref descriptors.

ftynse commented 4 years ago

It's a typical trade-off. If a new operation is added to dialect X, somebody will have to implement conversions from it to other dialects. Whether that should be the owner of dialect X or the owners of the other dialects is up for negotiation between those owners. If the other dialects are not upstream, it's unlikely that the owner of dialect X will be able to do anything about it. (That said, the conversion will not necessarily break; it may just not support the newly added operation.)

So it feels like your main concern is about the lowering being out-of-tree. If we did have an API for the descriptor, nothing would prevent the owner of the in-tree lowering from changing that API and still breaking external projects that rely on it. And it would be a compilation error, not just a properly reported "could not lower" runtime error.

bondhugula commented 4 years ago

@dcaballe Normally, it'd be great to just have a single good lowering from memrefs to the LLVM dialect. I assume what you need out of tree is different enough that the existing lowering can't easily be extended/augmented to handle it? (Looks like you require a custom memref descriptor that carries more information.) It would be good to find a way to augment the existing lowering so that the benefits are available to all paths.

OTOH, if there are use cases for custom / sufficiently different lowerings to exist separately (all still targeting the same LLVM dialect in MLIR), it'd be good to understand at least one (or better, two) such use cases, to be able to do the right refactoring or to decide whether it'd be better to just go with different pattern rewrites as @ftynse suggests (keeping the code isolated, albeit with duplication). Also, as @ftynse points out, even with a common API, breakage for external lowerings would still happen, but the changes needed to make them work again would be reduced.

dcaballe commented 4 years ago

Thanks for the replies and sorry for the delay (coming back from break). I'll also reply here to comments in https://github.com/tensorflow/mlir/issues/309 to unify the discussion.

> I assume what you need out of tree is something different enough that the existing lowering can't be easily extended/augmented to handle? (Looks like you require a custom memref descriptor that carries more information.)

Actually less information. I need to lower memrefs to bare pointers to be able to properly pass the noalias attribute to LLVM for function arguments until we have a model for aliasing in MLIR. It's similar to what we had before introducing support for strided memrefs and alignment. It's sad that we didn't have a test for that since this is impacting performance for everybody going to LLVM. Any plans on introducing performance tests using LLVM?

Also, more people in the mailing list mentioned that they would like to have this lowering for other purposes.

> This would shift the burden on whoever updates the core, not necessarily reduce it. But I understand the concern of getting out of sync.

IMO, it's usually easier to incrementally keep a feature working than to break it and try to fix it later. Actually, the issue with the noalias attribute is a good example of how complicated the latter can be :).

> So I'm fine with upstreaming an additional lowering for memrefs to go to bare pointers when possible and failing otherwise. IMO, this can still be implemented as separate patterns producing simple code, sharing implementation helpers if reasonable. The choice between lowerings can be controlled by an option in the pass, similarly to what we do for malloc/alloca.

This sounds fair enough and will allow others to use it, so I'll proceed with it if there are no objections. I'll move the discussion to Phabricator then.

Thanks, Diego