[mlir] Do not generate sizeof expressions using GEP

nikic commented 4 months ago

Lowering MemRef to LLVM dialect currently generates ptrtoint (getelementptr null, 1) style sizeof expressions using this helper: https://github.com/llvm/llvm-project/blob/cfb4dce90ce66acbd094e443f422c4e5c413a9e8/mlir/lib/Conversion/LLVMCommon/Pattern.cpp#L171-L184

These need to be replaced with an emission of a constant based on the DataLayout.

The ability to generate the expressions in this form will go away entirely with https://discourse.llvm.org/t/rfc-replacing-getelementptr-with-ptradd/68699 and causes problems for https://discourse.llvm.org/t/rfc-remove-most-constant-expressions/63179 as well.

llvmbot commented 4 months ago

@llvm/issue-subscribers-mlir-llvm

Author: Nikita Popov (nikic)

Lowering MemRef to LLVM dialect currently generates `ptrtoint (getelementptr null, 1)` style sizeof expressions using this helper: https://github.com/llvm/llvm-project/blob/cfb4dce90ce66acbd094e443f422c4e5c413a9e8/mlir/lib/Conversion/LLVMCommon/Pattern.cpp#L171-L184 These need to be replaced with an emission of a constant based on the DataLayout. The ability to generate the expressions in this form will go away entirely with https://discourse.llvm.org/t/rfc-replacing-getelementptr-with-ptradd/68699 and causes problems for https://discourse.llvm.org/t/rfc-remove-most-constant-expressions/63179 as well.

krzysz00 commented 4 months ago

A potential problem is that aot of uses of ConvertToLLVM won't have the data layout available at the time those conversion patterns are running

That's probably something that needs to be fixed long-term

Perhaps a llvm.mlir.sizeof that gets translated at module translation time would work as a quick fix

krzysz00 commented 4 months ago

@rengolin and @fabianmcg and maybe some other people since this is related to targets/codegen/... and is a useful concrete thing for a new system to do

fabianmcg commented 4 months ago

@nikic do you have a timeline as to when are they are going away and when ptradd is landing? I'm adding a solution to that issue on MLIR, however, I don't have an ETA as to when it will land -It's in review, but it's a +6k loc code change.

So, depending on your ETA a quicker solution might be required.

If I recall correctly it's always possible to query the data layout, but it might not correspond to actual expected size if the target has not been determined.

rengolin commented 4 months ago

That's an interesting take. The main issue is that, unlike LLVM IR, MLIR does not mandate data layout to exist in IR. But if it exists, I think it would be a nice way to fixing the problem.

Other issues:

DLTI allows listing more than one target (in the TI section) but so far we only have examples of one data layout (in the DL section). We may have to move/add data layout to the target description or at least require some hierarchy that can be queried in a generic way.
Data layout only talks about scalars (including pointers), not aggregates and their ABI requirements. C++ has a set of structure layout ABIs (Itanium, Arm) that are not trivial to encode and would need to be fully encoded in the GEP lowering.

Having said that, I believe Clang will have to deal with that problem in CIL, so perhaps there's an interest in involving them in this discussion?

@AaronBallman

fabianmcg commented 4 months ago

DLTI allows listing more than one target (in the TI section) but so far we only have examples of one data layout (in the DL section). We may have to move/add data layout to the target description or at least require some hierarchy that can be queried in a generic way.

If we are at a stage with LLVM in the IR, then I think we can assume/force only having a single layout and a single target. For example, it would be odd to have rocdl and nvvm ops at that stage.

Data layout only talks about scalars (including pointers), not aggregates and their ABI requirements. C++ has a set of structure layout ABIs (Itanium, Arm) that are not trivial to encode and would need to be fully encoded in the GEP lowering.

If llvm is thinking to derive those constants from the DL, then we should be able to do it too, we only need to require a compilation target before lowering to LLVM.

I believe Clang will have to deal with that problem in CIL, so perhaps there's an interest in involving them in this discussion?

That code would likely live in clang. However, if we shoot for the moon we could finally be able to have C++ calling conventions on MLIR (and for that matter for all programming languages using MLIR). But in the best-case scenario, that's going to take years.

nikic commented 4 months ago

A potential problem is that aot of uses of ConvertToLLVM won't have the data layout available at the time those conversion patterns are running

That's probably something that needs to be fixed long-term

I'm not at all familiar with MLIR -- my expectation would have been that as LLVM IR always requires a DataLayout (though it may just be the implicit default data layout that is used for tests) that the MLIR LLVM IR dialect would mirror that. Is that not architecturally possible with MLIR?

@nikic do you have a timeline as to when are they are going away and when ptradd is landing? I'm adding a solution to that issue on MLIR, however, I don't have an ETA as to when it will land -It's in review, but it's a +6k loc code change.

This is not urgent. This is more of a "this will break in a few LLVM releases" thing than "this will break in a few days".

AaronBallman commented 4 months ago

That's an interesting take. The main issue is that, unlike LLVM IR, MLIR does not mandate data layout to exist in IR. But if it exists, I think it would be a nice way to fixing the problem.

Other issues:
* DLTI allows listing more than one target (in the TI section) but so far we only have examples of one data layout (in the DL section). We may have to move/add data layout to the target description or at least require some hierarchy that can be queried in a generic way.

* Data layout only talks about scalars (including pointers), not aggregates and their ABI requirements. C++ has a set of structure layout ABIs (Itanium, Arm) that are not trivial to encode and would need to be fully encoded in the GEP lowering.
Having said that, I believe Clang will have to deal with that problem in CIL, so perhaps there's an interest in involving them in this discussion?

@AaronBallman

CC @bcardosolopes @lanza for CIR awareness, and @rjmccall @efriedma-quic for Clang codegen awareness.

nikic commented 4 months ago

Data layout only talks about scalars (including pointers), not aggregates and their ABI requirements. C++ has a set of structure layout ABIs (Itanium, Arm) that are not trivial to encode and would need to be fully encoded in the GEP lowering.

The way this works currently is that there is a complex contract between the frontend and LLVM, where the frontend performs certain contortions to produce types that LLVM will then hopefully interpret in the way the frontend intended and produce correct sizes and offsets from it. The model we're slowly moving towards is that all the layout logic is handled exclusively by the frontend and LLVM is only told the computed quantities, so that both sides can not go out of sync. (Where "frontend" in this context may be MLIR/CIL rather than Clang proper.)

fabianmcg commented 4 months ago

Is that not architecturally possible with MLIR?

It's possible to require it. It's just that it's not a requirement and in consequence many users set that information until translation to llvm IR. But with this proposed change we would need to always require it (we already should be doing it).

zero9178 commented 4 months ago

Data layout only talks about scalars (including pointers), not aggregates and their ABI requirements. C++ has a set of structure layout ABIs (Itanium, Arm) that are not trivial to encode and would need to be fully encoded in the GEP lowering.

FWIW we do have some logic and precedence for this already as we can calculate the sizes of LLVM dialect structs: https://github.com/llvm/llvm-project/blob/main/mlir%2Flib%2FDialect%2FLLVMIR%2FIR%2FLLVMTypes.cpp#L511

The issues that you mentioned are complexity in the lowering to LLVM IR, not within LLVM IR, and whose status quo of support (every lowering needing to do this themselves) do not change with the introduction of ptradd.

The way I see it the difficulty here is really transitioning to requiring a data layout during conversion to LLVM

fabianmcg commented 4 months ago

The way I see it the difficulty here is really transitioning to requiring a data layout during conversion to LLVM

Isn't that as simple as failing convert-to-llvm if there's no target or data layout present in the module?

Edit: I think I misunderstood your comment, you are referring to the amount of work required to be in compliance with this upstream, right @zero9178?

zero9178 commented 4 months ago

The way I see it the difficulty here is really transitioning to requiring a data layout during conversion to LLVM

Isn't that as simple as failing convert-to-llvm if there's no target or data layout present in the module?

Edit: I think I misunderstood your comment, you are referring to the amount of work required to be in compliance with this upstream, right @zero9178?

Roughly yes :) As in, the code changes are easy obviously, but getting people on board and changing downstream requires more coordination and guidance which is the complex part

fabianmcg commented 4 months ago

but getting people on board and changing downstream requires more coordination and guidance which is the complex part

If llvm changes then we and downstream will have to adapt.

FWIW, if/when the ptr dialect lands. I have a type_offset op https://github.com/llvm/llvm-project/blob/ed3de95c2fc823f703f8392e4839e9276bd47d4f/mlir/include/mlir/Dialect/Ptr/IR/PtrOps.td#L386-L409 that would simplify this issue.

rjmccall commented 4 months ago

Clang does not have to deal with this issue because C and C++ define it away. The result of sizeof is an integer constant that does not maintain any structural relationship to the type. Portability in C and C++ relies on processing the translation unit multiple times.

I don't know this language you're starting from, so it's quite possible you have different constraints.

krzysz00 commented 4 months ago

Re https://github.com/llvm/llvm-project/blob/main/mlir%2Flib%2FDialect%2FLLVMIR%2FIR%2FLLVMTypes.cpp#L511, the problem with that is that that dataLayout is very often constructed from DataLayout() (the default one) which, for example, won't give you the sizes of pointers

rengolin commented 4 months ago

Is that not architecturally possible with MLIR?

It's possible to require it. It's just that it's not a requirement and in consequence many users set that information until translation to llvm IR. But with this proposed change we would need to always require it (we already should be doing it).

This is not always the case.

From front-ends like Clang and Flang, providing target descriptions and data layouts is not just trivial, but required. I agree with you. But most other things do not have that level of understanding of their targets, or don't want to fix up a target straight away or even will end up with more than one target in the same IR (which will then later be split, etc).

I think it's ok to require Clang/Flang to emit DLTI info and have some passes to require those to exist. It's not ok to make it a requirement to MLIR in general, at least not on its current state.

zero9178 commented 4 months ago

What might be easier to require is to make the LLVM data layout passed to the "ConvertToLLVM" pass mandatory iff no DTI layout is present. People should be aware of their target/data layout at that point in time in their pipeline. Then we convert it to DLTI to satisfy the pass precondition.

The LLVM data layout to DLTI conversion could additionally be an upstream pass for any users that'd require the information earlier. I am pretty sure the logic for this already exists in either flang or the LLVM IR importer.

fabianmcg commented 4 months ago

This is not always the case.

From front-ends like Clang and Flang, providing target descriptions and data layouts is not just trivial, but required. I agree with you. But most other things do not have that level of understanding of their targets, or don't want to fix up a target straight away or even will end up with more than one target in the same IR (which will then later be split, etc).

I think it's ok to require Clang/Flang to emit DLTI info and have some passes to require those to exist. It's not ok to make it a requirement to MLIR in general, at least not on its current state.

But that's not an architectural constraint of MLIR, but the status quo of not requiring it and everyone abusing it.

Also, I'm not saying we should always require a target at all abstraction levels, that would be wrong. What I'm saying is that if we are at the LLVM level, then we should know the target. And if there are things that are not expressible, then it means there's a missing abstraction or that something else is not properly setup. For example, for multiple targets we should either have special ops that place concrete semantics on multi-target IR or we should be separating each target specific code into modules.

krzysz00 commented 4 months ago

I'm going to argue the target-specific code into modules thing as being a minor failure of how GPU compilation is handled in the new style - the targets for a gpu.module are always a list, so you can't ask "what's the target" when lowering a kernel. It'd be better to have multiple IR copies, one per target, in cases where that matters

rengolin commented 4 months ago

What might be easier to require is to make the LLVM data layout passed to the "ConvertToLLVM" pass mandatory iff no DTI layout is present. People should be aware of their target/data layout at that point in time in their pipeline. Then we convert it to DLTI to satisfy the pass precondition.

This could create a dual world where DL is passed through different means. If we want to make it a requirement, we need to make it easy to create DL before lowering to LLVM, even if it's just to take in the host's own DL in case everything else fails.

Also, I'm not saying we should always require a target at all abstraction levels, that would be wrong. What I'm saying is that if we are at the LLVM level, then we should know the target. And if there are things that are not expressible, then it means there's a missing abstraction or that something else is not properly setup.

This will depend on how decisions are made. If the high-level stuff makes decision based on data layout, then having one for MLIR and one for LLVM lowering would be wrong. I'd encourage any MLIR pass to use the DLTI one and make it a requirement (for those passes), including LLVM lowering. Same argument as above: we don't want two worlds.

I'm going to argue the target-specific code into modules thing as being a minor failure of how GPU compilation is handled in the new style - the targets for a gpu.module are always a list, so you can't ask "what's the target" when lowering a kernel. It'd be better to have multiple IR copies, one per target, in cases where that matters

The new DLTI design we're building has support for multiple targets, and how you split that is up to implementation. If you want to duplicate IR, it should be fine. You can also split the IR based on which target they apply to (if you can separate at a function level, for example, or after outlining), then you also split the DLTI info to have one per module and one module per target.

zero9178 commented 4 months ago

What might be easier to require is to make the LLVM data layout passed to the "ConvertToLLVM" pass mandatory iff no DTI layout is present. People should be aware of their target/data layout at that point in time in their pipeline. Then we convert it to DLTI to satisfy the pass precondition.

This could create a dual world where DL is passed through different means. If we want to make it a requirement, we need to make it easy to create DL before lowering to LLVM, even if it's just to take in the host's own DL in case everything else fails.

That's what to me would be the pass that converts an LLVM DL (by taking it as a pass option) and coverts it to a MLIR DL on the module. Adding this functionality also as a convenience to the ConvertToLLVM pass isn't strictly necessary, but I think the majority of users are likely only need the DLTI information then so it'd be an added convenience.

To me this wouldn't be a dual world given that the MLIR DL would always be the source of truth and it'd be easier to require it by only barely needing any code changes downstream.

krzysz00 commented 4 months ago

The specific thing I want is that #rocdl.target<...>, set as some form of "the target" in the right place, should implicitly implement DLTI enough to provide the AMDGPU backend data layout

nikic commented 1 month ago

I wanted to check back here and ask what the current state here is. I saw that some DTLI related changes have landed, but being unfamiliar with MLIR, I'm not sure how close that gets us to having DataLayout available in the places that need it.

krzysz00 commented 1 month ago

No substantial progress that I'm aware of

llvm / llvm-project

[mlir] Do not generate sizeof expressions using GEP #96047