tensorflow / mlir

"Multi-Level Intermediate Representation" Compiler Infrastructure

Introduce prefetch op: affine -> std -> llvm intrinsic #225

Closed · bondhugula closed this 4 years ago

bondhugula commented 5 years ago

Introduce affine.prefetch: an op to prefetch from a memref using a multi-dimensional affine subscript. It is similar to affine.load, but it has no effect on semantics, only on performance.

Provide lowering through std.prefetch and llvm.prefetch, mapping to LLVM's prefetch intrinsic. All attributes - the locality hint, read/write, and instruction/data cache - are reflected through the lowering.
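A rough sketch of the three levels (the op syntax here is illustrative, based on the forms proposed in this PR; the exact assembly format may differ from what eventually landed):

```mlir
// Affine level: the multi-dimensional affine subscript is kept on the op.
affine.prefetch %A[%i + 5, %j], read, locality<3>, data : memref<100x100xf32>

// Standard level: subscripts are plain index values.
prefetch %A[%i5, %j], read, locality<3>, data : memref<100x100xf32>

// LLVM dialect level: the element address is computed explicitly, and the
// rw / locality / cache-type attributes become i32 operands of the intrinsic.
"llvm.intr.prefetch"(%ptr, %rw, %hint, %cache) : (!llvm<"float*">, i32, i32, i32) -> ()
```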

Signed-off-by: Uday Bondhugula uday@polymagelabs.com

nicolasvasilache commented 5 years ago

Nice progressive lowering!

bondhugula commented 5 years ago

Any comments here?

bondhugula commented 4 years ago

> Thanks and sorry for the wait!
>
> What is the rationale for having an affine version of prefetch? It does not produce any value, so there is no need to propagate "affine-ness" constraints through it. If you have an affine.prefetch followed by an affine.load, each of those will have their affine maps expanded separately and potentially CSE'd later, so there are no complexity savings here. IMO, you could just have an affine.apply followed by a std.prefetch and avoid introducing a new op with almost duplicate functionality. Is there a further use case that we are not aware of?

The point of keeping a prefetch op's affine map on the op is no different from the motivation for keeping an affine load or store's map on the op itself: whenever the op moves, its affine function should move along with it - the subscript math is, in a sense, atomic with the op rather than a separate real operation. It is an unnecessary pain to trace the chain of affine.applys and move those along whenever the prefetch op moves (e.g., when performing fusion or code motion that moves prefetch ops, their affine.applys would have to move too). We are in the process of eliminating affine.applys that feed results into affine.load/store everywhere in Transforms/ (it's not completely done yet; @andydavis1 and @nicolasvasilache have had this discussion) - affine.applys will only exist in some other cases.

As for affine.prefetch ops, they could be inserted by passes as well as by domain/op-specific optimization recipes (typically just in front of loads, but with negatively shifted subscripts). So their movement is better treated in a way consistent with affine.load/store for things like loop fusion/distribution, peeling, relative shifting, and pipelining.
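For illustration, the two alternatives being compared might look like this (the syntax is a sketch; the `affine_map` form in particular follows later MLIR conventions and is not taken from this PR):

```mlir
// Map carried by the op: moving the prefetch moves its subscript math with it.
affine.for %i = 0 to 400 {
  affine.prefetch %A[%i + 5], read, locality<3>, data : memref<405xf32>
  %v = affine.load %A[%i] : memref<405xf32>
}

// affine.apply + std.prefetch: the subscript math lives in a separate op that
// a transformation must trace and move together with the prefetch.
affine.for %i = 0 to 400 {
  %idx = affine.apply affine_map<(d0) -> (d0 + 5)>(%i)
  prefetch %A[%idx], read, locality<3>, data : memref<405xf32>
  %v = affine.load %A[%i] : memref<405xf32>
}
```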

bondhugula commented 4 years ago

Thanks for the review; fixed most.

ftynse commented 4 years ago

> The point of keeping a prefetch op's affine map on the op is no different from the motivation for keeping an affine load or store's map on the op itself. […]

My main concern here is duplication of ops and relevant code. I know the rationale for affine.load/store, I pushed for it. The question is how many other operations would need a similar affine/std duplication? If there are some more coming, we should consider some consolidation scheme that would be able to "lift" standard ops into affine counterparts by adding more features/constraints in the code. Out of scope for this PR, obviously.

bondhugula commented 4 years ago

> My main concern here is duplication of ops and relevant code. […] If there are some more coming, we should consider some consolidation scheme that would be able to "lift" standard ops into affine counterparts. […]

I don't have anything else similar in the pipeline, nor any future ops that'd follow this path. The other pending affine op (affine.grayboxop) is completely different.

ftynse commented 4 years ago

> I don't have anything else similar in the pipeline, nor any future ops that'd follow this path. […]

Works for me, please fix the remaining small comments so we can land.

bondhugula commented 4 years ago

> Works for me, please fix the remaining small comments so we can land.

Done, thanks.

ftynse commented 4 years ago

Sorry for the further delay - we are experiencing integration issues with LLVM that are causing test breakage. This will land as soon as the tests pass again.

ftynse commented 4 years ago

@bondhugula please fix this and let's land

```
/usr/local/google/home/zinenko/llvm-project/llvm/projects/mlir/lib/Conversion/StandardToLLVM/ConvertStandardToLLVM.cpp: In member function ‘virtual mlir::PatternMatchResult {anonymous}::PrefetchOpLowering::matchAndRewrite(mlir::Operation*, llvm::ArrayRef<mlir::Value*>, mlir::ConversionPatternRewriter&) const’:
/usr/local/google/home/zinenko/llvm-project/llvm/projects/mlir/lib/Conversion/StandardToLLVM/ConvertStandardToLLVM.cpp:1470:39: error: ‘prefetch’ is not a member of ‘mlir::LLVM’
     rewriter.replaceOpWithNewOp<LLVM::prefetch>(op, dataPtr, isWrite,
                                       ^~~~~~~~
/usr/local/google/home/zinenko/llvm-project/llvm/projects/mlir/lib/Conversion/StandardToLLVM/ConvertStandardToLLVM.cpp:1470:39: note: suggested alternative: ‘Prefetch’
     rewriter.replaceOpWithNewOp<LLVM::prefetch>(op, dataPtr, isWrite,
                                       ^~~~~~~~
                                       Prefetch
/usr/local/google/home/zinenko/llvm-project/llvm/projects/mlir/lib/Conversion/StandardToLLVM/ConvertStandardToLLVM.cpp:1471:69: error: no matching function for call to ‘mlir::ConversionPatternRewriter::replaceOpWithNewOp<<expression error> >(mlir::Operation*&, mlir::Value*&, mlir::LLVM::ConstantOp&, mlir::LLVM::ConstantOp&, mlir::LLVM::ConstantOp&)’
                                                 localityHint, isData);
```
bondhugula commented 4 years ago

> @bondhugula please fix this and let's land

Done. (Also resolved other conflicts.) The trunk does have an unrelated build issue.

bondhugula commented 4 years ago

> Missing coverage in a few places, please add tests in a followup.

Thanks for pointing this out. I missed test cases for these since this was the exact same pattern copied from load/store - sort of trivial. I can add them in a follow-up.