llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org

[mlir][EmitC] Model lvalues as a type in EmitC #91475

Closed simon-camp closed 2 months ago

simon-camp commented 5 months ago

This adds an emitc.lvalue type which models assignable lvalues in the type system. Operations modifying memory are restricted to this type accordingly.

See also the discussion on discourse. The most notable changes are as follows.

mgehre-amd commented 5 months ago

Thank you for this PR, it will put emitc on good foundations!

simon-camp commented 5 months ago

I'm trying to summarize the discussion so far to highlight questions I have right now (much of this was also already written down in the RFC):

Somewhat related, we have a few ops (subscript, get_global) that are specifically handled in the emitter to not immediately create variables. Can we categorize these as exactly those ops returning mlvalues except for the variable op?

[^1] I don't know how to lower the memref-to-emitc if we don't distinguish modifiable and array variables.

mgehre-amd commented 5 months ago

I'm trying to summarize the discussion so far to highlight questions I have right now (much of this was also already written down in the RFC):

Thanks for the summary, that looks good!

For arrays, are you intending for emitc.variable/emitc.get_global to have type !emitc.array for arrays and !emitc.lvalue<type> for other types? I would like that.

simon-camp commented 5 months ago

For arrays, are you intending for emitc.variable/emitc.get_global to have type !emitc.array for arrays and !emitc.lvalue<type> for other types? I would like that.

Yeah, that's what I would try, to see how it works out. We can catch !emitc.lvalue<!emitc.array<>> in the lvalue type verifier then. And you are right, I forgot that global arrays need to be handled similarly to variables.
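A toy model in C may help make the verifier rules discussed here concrete. It is not MLIR's C++ API: `TypeKind`, `ToyType`, and `verify_type` are hypothetical names. The sketch assumes the rules as they appear in the later diff. An lvalue may not wrap an array or another lvalue, and an lvalue may not appear as an array element type or as a pointer's pointee type.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy type representation; not MLIR's API. */
typedef enum { TY_INT, TY_ARRAY, TY_POINTER, TY_LVALUE } TypeKind;

typedef struct ToyType {
  TypeKind kind;
  const struct ToyType *inner; /* element, pointee, or wrapped type; NULL for TY_INT */
} ToyType;

/* Returns true if the type satisfies the (assumed) lvalue rules. */
static bool verify_type(const ToyType *t) {
  switch (t->kind) {
  case TY_INT:
    return true;
  case TY_ARRAY:
  case TY_POINTER:
    /* lvalue types are not accepted as element or pointee types */
    return t->inner->kind != TY_LVALUE && verify_type(t->inner);
  case TY_LVALUE:
    /* no nested lvalues and no !emitc.lvalue<!emitc.array<...>> */
    return t->inner->kind != TY_LVALUE && t->inner->kind != TY_ARRAY &&
           verify_type(t->inner);
  }
  return false;
}
```

Under these rules, an array variable keeps its plain `!emitc.array` type. Its individual elements only become lvalues through `emitc.subscript`.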

github-actions[bot] commented 5 months ago

✅ With the latest revision this PR passed the C/C++ code formatter.

simon-camp commented 4 months ago

I'm trying to summarize the discussion so far to highlight questions I have right now (much of this was also already written down in the RFC):

  • add new type that designates modifiable values (currently lvalue possibly renamed to mlvalue to distinguish this from array variables)
  • EmitC operations work on non-lvalues unless explicitly defined otherwise
  • Conversion to non-lvalue types is done explicitly through a new op (currently lvalue_to_rvalue), which is emitted directly as an assignment to an anonymous variable.

    • (Maybe we should rename this to lvalue_load or something similar, as the name "lvalue conversion" connotes implicit behaviour to me.)
  • ops working with the new type

    • lhs operand of the assign op

    • operand of the apply op if it's an &

    • the result of the subscript op

    • result of the apply op if it's an *

    • result of the variable op

    • unless it's an array, as arrays are not assignable. Additionally, there would be no way to work with such a value, as the conversion op wouldn't work either [^1]

    • result of the get_global op

  • adding traits/effects to ops

    • lvalue_to_rvalue has a read effect

    • assign has a write effect on lhs

    • variable has alloc effect

    • we should add AutomaticAllocationScope to all ops having regions (func, for, if branches)
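The following hand-written sketch gives a rough guide to the C that the ops in this list correspond to. The comments name the EmitC op that would be expected to produce each statement under the proposed semantics. This is not actual translator output, and `mapping_demo` is a hypothetical name.

```c
#include <assert.h>

static int mapping_demo(int init) {
  int v1;         /* emitc.variable: result is !emitc.lvalue<i32> */
  v1 = init;      /* emitc.assign: lhs must be an lvalue */
  int *v2 = &v1;  /* emitc.apply "&": operand must be an lvalue */
  *v2 = init + 1; /* emitc.apply "*": result is an lvalue, so it can be assigned */
  int v3 = v1;    /* emitc.load (lvalue_to_rvalue above): copies the current value */
  v1 = 0;         /* a later write to v1 does not change v3 */
  return v3;
}
```

The last two statements show why the conversion is modeled as a read. `v3` captures the value `v1` held at that point, so the load cannot be freely moved past later writes.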

Somewhat related, we have a few ops (subscript, get_global) that are specifically handled in the emitter to not immediately create variables. Can we categorize these as exactly those ops returning mlvalues except for the variable op?

[^1] I don't know how to lower the memref-to-emitc if we don't distinguish modifiable and array variables.
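A possible reason subscript and get_global can skip creating a variable is that their C spelling is already an lvalue expression. Both `a[i]` and a global's name can appear directly on the left of an assignment or under `&`. The small C illustration below uses hypothetical names `g`, `buf`, and `deferred_demo`. The comments reflect the proposed mapping, not verified emitter output.

```c
#include <assert.h>

static int g;      /* emitc.global; emitc.get_global would just refer to "g" */
static int buf[4]; /* array-typed global */

static int deferred_demo(int i, int x) {
  g = x;            /* get_global result used directly as the assign lhs */
  buf[i] = g + 1;   /* subscript result used directly as the assign lhs */
  int *p = &buf[i]; /* address of a subscript result, no temporary needed */
  int loaded = *p;  /* an explicit load materializes the value */
  return loaded;
}
```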

Most things should be implemented now. The most notable missing pieces are the memory effects on the ops and the checks in one test file.

llvmbot commented 4 months ago

@llvm/pr-subscribers-mlir

@llvm/pr-subscribers-mlir-emitc

Author: Simon Camphausen (simon-camp)

Changes

This is an early, unpolished version of what has been previously discussed on [discourse](https://discourse.llvm.org/t/rfc-separate-variables-from-ssa-values-in-emitc/75224/9). See this as a starting point for further discussions.

---

Patch is 94.19 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/91475.diff

25 Files Affected:

- (modified) mlir/include/mlir/Dialect/EmitC/IR/EmitC.td (+38-9)
- (modified) mlir/include/mlir/Dialect/EmitC/IR/EmitCTypes.td (+18)
- (modified) mlir/lib/Conversion/MemRefToEmitC/MemRefToEmitC.cpp (+1-6)
- (modified) mlir/lib/Conversion/SCFToEmitC/SCFToEmitC.cpp (+27-4)
- (modified) mlir/lib/Dialect/EmitC/IR/EmitC.cpp (+83-24)
- (modified) mlir/lib/Dialect/EmitC/Transforms/FormExpressions.cpp (+2-1)
- (modified) mlir/lib/Target/Cpp/TranslateToCpp.cpp (+82-43)
- (modified) mlir/test/Conversion/MemRefToEmitC/memref-to-emitc.mlir (+12-12)
- (modified) mlir/test/Conversion/SCFToEmitC/for.mlir (+27-19)
- (modified) mlir/test/Conversion/SCFToEmitC/if.mlir (+12-10)
- (modified) mlir/test/Dialect/EmitC/invalid_ops.mlir (+31-30)
- (modified) mlir/test/Dialect/EmitC/invalid_types.mlir (+53-13)
- (modified) mlir/test/Dialect/EmitC/ops.mlir (+11-10)
- (modified) mlir/test/Dialect/EmitC/transforms.mlir (+11-37)
- (modified) mlir/test/Dialect/EmitC/types.mlir (+17-2)
- (modified) mlir/test/Target/Cpp/common-cpp.mlir (+13-5)
- (modified) mlir/test/Target/Cpp/expressions.mlir (+52-32)
- (modified) mlir/test/Target/Cpp/for.mlir (+60-22)
- (modified) mlir/test/Target/Cpp/global.mlir (+79-17)
- (modified) mlir/test/Target/Cpp/if.mlir (+6-6)
- (modified) mlir/test/Target/Cpp/invalid.mlir (+1-1)
- (modified) mlir/test/Target/Cpp/invalid_declare_variables_at_top.mlir (+13-2)
- (added) mlir/test/Target/Cpp/lvalue.mlir (+37)
- (modified) mlir/test/Target/Cpp/subscript.mlir (+76-30)
- (modified) mlir/test/Target/Cpp/variable.mlir (+9-7)

``````````diff diff --git a/mlir/include/mlir/Dialect/EmitC/IR/EmitC.td
b/mlir/include/mlir/Dialect/EmitC/IR/EmitC.td index 5da8593f59563..f5d82bc41642e 100644 --- a/mlir/include/mlir/Dialect/EmitC/IR/EmitC.td +++ b/mlir/include/mlir/Dialect/EmitC/IR/EmitC.td @@ -97,9 +97,9 @@ def EmitC_ApplyOp : EmitC_Op<"apply", [CExpression]> { }]; let arguments = (ins Arg:$applicableOperator, - EmitCType:$operand + AnyTypeOf<[EmitCType, EmitC_LValueType]>:$operand ); - let results = (outs EmitCType:$result); + let results = (outs AnyTypeOf<[EmitCType, EmitC_LValueType]>:$result); let assemblyFormat = [{ $applicableOperator `(` $operand `)` attr-dict `:` functional-type($operand, results) }]; @@ -835,6 +835,21 @@ def EmitC_LogicalOrOp : EmitC_BinaryOp<"logical_or", [CExpression]> { let assemblyFormat = "operands attr-dict `:` type(operands)"; } +def EmitC_LValueLoadOp : EmitC_Op<"lvalue_load", [ + TypesMatchWith<"result type matches value type of 'operand'", + "operand", "result", + "::llvm::cast($_self).getValue()"> +]> { + let summary = "load an lvalue by assigning it to a local variable"; + let description = [{}]; + + let arguments = (ins + Res]>:$operand); + let results = (outs AnyType:$result); + + let assemblyFormat = "$operand attr-dict `:` type($operand)"; +} + def EmitC_MulOp : EmitC_BinaryOp<"mul", [CExpression]> { let summary = "Multiplication operation"; let description = [{ @@ -1009,7 +1024,8 @@ def EmitC_VariableOp : EmitC_Op<"variable", []> { }]; let arguments = (ins EmitC_OpaqueOrTypedAttr:$value); - let results = (outs EmitCType); + let results = (outs Res, "", + [MemAlloc]>:$memref); let hasVerifier = 1; } @@ -1079,7 +1095,7 @@ def EmitC_GetGlobalOp : EmitC_Op<"get_global", }]; let arguments = (ins FlatSymbolRefAttr:$name); - let results = (outs EmitCType:$result); + let results = (outs AnyTypeOf<[EmitC_ArrayType, EmitC_LValueType]>:$result); let assemblyFormat = "$name `:` type($result) attr-dict"; } @@ -1137,7 +1153,9 @@ def EmitC_AssignOp : EmitC_Op<"assign", []> { ``` }]; - let arguments = (ins EmitCType:$var, 
EmitCType:$value); + let arguments = (ins + Res]>:$var, + Res]>:$value); let results = (outs); let hasVerifier = 1; @@ -1243,15 +1261,26 @@ def EmitC_SubscriptOp : EmitC_Op<"subscript", []> { EmitC_PointerType]>, "the value to subscript">:$value, Variadic:$indices); - let results = (outs EmitCType:$result); + let results = (outs EmitC_LValueType:$result); let builders = [ OpBuilder<(ins "TypedValue":$array, "ValueRange":$indices), [{ - build($_builder, $_state, array.getType().getElementType(), array, indices); + build( + $_builder, + $_state, + emitc::LValueType::get(array.getType().getElementType()), + array, + indices + ); }]>, OpBuilder<(ins "TypedValue":$pointer, "Value":$index), [{ - build($_builder, $_state, pointer.getType().getPointee(), pointer, - ValueRange{index}); + build( + $_builder, + $_state, + emitc::LValueType::get(pointer.getType().getPointee()), + pointer, + ValueRange{index} + ); }]> ]; diff --git a/mlir/include/mlir/Dialect/EmitC/IR/EmitCTypes.td b/mlir/include/mlir/Dialect/EmitC/IR/EmitCTypes.td index 444395b915e25..fc795962a3e5b 100644 --- a/mlir/include/mlir/Dialect/EmitC/IR/EmitCTypes.td +++ b/mlir/include/mlir/Dialect/EmitC/IR/EmitCTypes.td @@ -83,6 +83,23 @@ def EmitC_ArrayType : EmitC_Type<"Array", "array", [ShapedTypeInterface]> { let hasCustomAssemblyFormat = 1; } +def EmitC_LValueType : EmitC_Type<"LValue", "lvalue"> { + let summary = "EmitC lvalue type"; + + let description = [{ + Values of this type can be assigned to and their address can be taken. 
+ }]; + + let parameters = (ins "Type":$value); + let builders = [ + TypeBuilderWithInferredContext<(ins "Type":$value), [{ + return $_get(value.getContext(), value); + }]> + ]; + let assemblyFormat = "`<` qualified($value) `>`"; + let genVerifyDecl = 1; +} + def EmitC_OpaqueType : EmitC_Type<"Opaque", "opaque"> { let summary = "EmitC opaque type"; @@ -128,6 +145,7 @@ def EmitC_PointerType : EmitC_Type<"Pointer", "ptr"> { }]> ]; let assemblyFormat = "`<` qualified($pointee) `>`"; + let genVerifyDecl = 1; } #endif // MLIR_DIALECT_EMITC_IR_EMITCTYPES diff --git a/mlir/lib/Conversion/MemRefToEmitC/MemRefToEmitC.cpp b/mlir/lib/Conversion/MemRefToEmitC/MemRefToEmitC.cpp index e0c421741b305..2e8fbbad14d40 100644 --- a/mlir/lib/Conversion/MemRefToEmitC/MemRefToEmitC.cpp +++ b/mlir/lib/Conversion/MemRefToEmitC/MemRefToEmitC.cpp @@ -137,12 +137,7 @@ struct ConvertLoad final : public OpConversionPattern { auto subscript = rewriter.create( op.getLoc(), arrayValue, operands.getIndices()); - auto noInit = emitc::OpaqueAttr::get(getContext(), ""); - auto var = - rewriter.create(op.getLoc(), resultTy, noInit); - - rewriter.create(op.getLoc(), var, subscript); - rewriter.replaceOp(op, var); + rewriter.replaceOpWithNewOp(op, resultTy, subscript); return success(); } }; diff --git a/mlir/lib/Conversion/SCFToEmitC/SCFToEmitC.cpp b/mlir/lib/Conversion/SCFToEmitC/SCFToEmitC.cpp index 0a89242225255..59a090a5fc65f 100644 --- a/mlir/lib/Conversion/SCFToEmitC/SCFToEmitC.cpp +++ b/mlir/lib/Conversion/SCFToEmitC/SCFToEmitC.cpp @@ -63,9 +63,10 @@ static SmallVector createVariablesForResults(T op, for (OpResult result : op.getResults()) { Type resultType = result.getType(); + Type varType = emitc::LValueType::get(resultType); emitc::OpaqueAttr noInit = emitc::OpaqueAttr::get(context, ""); emitc::VariableOp var = - rewriter.create(loc, resultType, noInit); + rewriter.create(loc, varType, noInit); resultVariables.push_back(var); } @@ -80,6 +81,14 @@ static void assignValues(ValueRange values, 
SmallVector &variables, rewriter.create(loc, var, value); } +SmallVector loadValues(const SmallVector &variables, + PatternRewriter &rewriter, Location loc) { + return llvm::map_to_vector<>(variables, [&](Value var) { + Type type = cast(var.getType()).getValue(); + return rewriter.create(loc, type, var).getResult(); + }); +} + static void lowerYield(SmallVector &resultVariables, PatternRewriter &rewriter, scf::YieldOp yield) { Location loc = yield.getLoc(); @@ -113,15 +122,26 @@ LogicalResult ForLowering::matchAndRewrite(ForOp forOp, // Erase the auto-generated terminator for the lowered for op. rewriter.eraseOp(loweredBody->getTerminator()); + IRRewriter::InsertPoint ip = rewriter.saveInsertionPoint(); + rewriter.setInsertionPointToEnd(loweredBody); + + SmallVector iterArgsValues = + loadValues(resultVariables, rewriter, loc); + + rewriter.restoreInsertionPoint(ip); + SmallVector replacingValues; replacingValues.push_back(loweredFor.getInductionVar()); - replacingValues.append(resultVariables.begin(), resultVariables.end()); + replacingValues.append(iterArgsValues.begin(), iterArgsValues.end()); rewriter.mergeBlocks(forOp.getBody(), loweredBody, replacingValues); lowerYield(resultVariables, rewriter, cast(loweredBody->getTerminator())); - rewriter.replaceOp(forOp, resultVariables); + // Copy iterArgs into results after the for loop. 
+ SmallVector resultValues = loadValues(resultVariables, rewriter, loc); + + rewriter.replaceOp(forOp, resultValues); return success(); } @@ -173,7 +193,10 @@ LogicalResult IfLowering::matchAndRewrite(IfOp ifOp, lowerRegion(elseRegion, loweredElseRegion); } - rewriter.replaceOp(ifOp, resultVariables); + rewriter.setInsertionPointAfter(ifOp); + SmallVector results = loadValues(resultVariables, rewriter, loc); + + rewriter.replaceOp(ifOp, results); return success(); } diff --git a/mlir/lib/Dialect/EmitC/IR/EmitC.cpp b/mlir/lib/Dialect/EmitC/IR/EmitC.cpp index 20f47574b25ad..43cf410e8889c 100644 --- a/mlir/lib/Dialect/EmitC/IR/EmitC.cpp +++ b/mlir/lib/Dialect/EmitC/IR/EmitC.cpp @@ -61,6 +61,9 @@ void mlir::emitc::buildTerminatedBody(OpBuilder &builder, Location loc) { bool mlir::emitc::isSupportedEmitCType(Type type) { if (llvm::isa(type)) return true; + if (auto lType = llvm::dyn_cast(type)) + // lvalue types are only allowed in a few places. + return false; if (auto ptrType = llvm::dyn_cast(type)) return isSupportedEmitCType(ptrType.getPointee()); if (auto arrayType = llvm::dyn_cast(type)) { @@ -140,6 +143,8 @@ static LogicalResult verifyInitializationAttribute(Operation *op, << "string attributes are not supported, use #emitc.opaque instead"; Type resultType = op->getResult(0).getType(); + if (auto lType = dyn_cast(resultType)) + resultType = lType.getValue(); Type attrType = cast(value).getType(); if (resultType != attrType) @@ -188,9 +193,19 @@ LogicalResult ApplyOp::verify() { if (applicableOperatorStr != "&" && applicableOperatorStr != "*") return emitOpError("applicable operator is illegal"); - Operation *op = getOperand().getDefiningOp(); - if (op && dyn_cast(op)) - return emitOpError("cannot apply to constant"); + Type operandType = getOperand().getType(); + Type resultType = getResult().getType(); + if (applicableOperatorStr == "&") { + if (!llvm::isa(operandType)) + return emitOpError("operand type must be an lvalue when applying `&`"); + if 
(!llvm::isa(resultType)) + return emitOpError("result type must be a pointer when applying `&`"); + } else { + if (!llvm::isa(operandType)) + return emitOpError("operand type must be a pointer when applying `*`"); + if (!llvm::isa(resultType)) + return emitOpError("result type must be an lvalue when applying `*`"); + } return success(); } @@ -202,20 +217,18 @@ LogicalResult ApplyOp::verify() { /// The assign op requires that the assigned value's type matches the /// assigned-to variable type. LogicalResult emitc::AssignOp::verify() { - Value variable = getVar(); - Operation *variableDef = variable.getDefiningOp(); - if (!variableDef || - !llvm::isa(variableDef)) - return emitOpError() << "requires first operand (" << variable - << ") to be a Variable or subscript"; - - Value value = getValue(); - if (variable.getType() != value.getType()) - return emitOpError() << "requires value's type (" << value.getType() - << ") to match variable's type (" << variable.getType() - << ")"; - if (isa(variable.getType())) - return emitOpError() << "cannot assign to array type"; + TypedValue variable = getVar(); + + if (!variable.getDefiningOp()) + return emitOpError() << "cannot assign to block argument"; + + Type valueType = getValue().getType(); + Type variableType = variable.getType().getValue(); + if (variableType != valueType) + return emitOpError() << "requires value's type (" << valueType + << ") to match variable's type (" << variableType + << ")\n variable: " << variable + << "\n value: " << getValue() << "\n"; return success(); } @@ -842,9 +855,10 @@ LogicalResult emitc::SubscriptOp::verify() { } // Check element type. 
Type elementType = arrayType.getElementType(); - if (elementType != getType()) { + Type resultType = getType().getValue(); + if (elementType != resultType) { return emitOpError() << "on array operand requires element type (" - << elementType << ") and result type (" << getType() + << elementType << ") and result type (" << resultType << ") to match"; } return success(); @@ -868,9 +882,10 @@ LogicalResult emitc::SubscriptOp::verify() { } // Check pointee type. Type pointeeType = pointerType.getPointee(); - if (pointeeType != getType()) { + Type resultType = getType().getValue(); + if (pointeeType != resultType) { return emitOpError() << "on pointer operand requires pointee type (" - << pointeeType << ") and result type (" << getType() + << pointeeType << ") and result type (" << resultType << ") to match"; } return success(); @@ -964,6 +979,25 @@ emitc::ArrayType::cloneWith(std::optional> shape, return emitc::ArrayType::get(*shape, elementType); } +//===----------------------------------------------------------------------===// +// LValueType +//===----------------------------------------------------------------------===// + +LogicalResult mlir::emitc::LValueType::verify( + llvm::function_ref emitError, + mlir::Type value) { + // Check that the wrapped type is valid. This especially forbids nested lvalue + // types. 
+ if (!isSupportedEmitCType(value)) + return emitError() + << "!emitc.lvalue must wrap supported emitc type, but got " << value; + + if (llvm::isa(value)) + return emitError() << "!emitc.lvalue cannot wrap !emitc.array type"; + + return success(); +} + //===----------------------------------------------------------------------===// // OpaqueType //===----------------------------------------------------------------------===// @@ -981,6 +1015,18 @@ LogicalResult mlir::emitc::OpaqueType::verify( return success(); } +//===----------------------------------------------------------------------===// +// PointerType +//===----------------------------------------------------------------------===// + +LogicalResult mlir::emitc::PointerType::verify( + llvm::function_ref emitError, Type value) { + if (llvm::isa(value)) + return emitError() << "pointers to lvalues are not allowed"; + + return success(); +} + //===----------------------------------------------------------------------===// // GlobalOp //===----------------------------------------------------------------------===// @@ -1078,9 +1124,22 @@ GetGlobalOp::verifySymbolUses(SymbolTableCollection &symbolTable) { << getName() << "' does not reference a valid emitc.global"; Type resultType = getResult().getType(); - if (global.getType() != resultType) - return emitOpError("result type ") - << resultType << " does not match type " << global.getType() + Type globalType = global.getType(); + + // global has array type + if (llvm::isa(globalType)) { + if (globalType != resultType) + return emitOpError("on array type expects result type ") + << resultType << " to match type " << globalType + << " of the global @" << getName(); + return success(); + } + + // global has non-array type + auto lvalueType = dyn_cast(resultType); + if (!lvalueType || lvalueType.getValue() != globalType) + return emitOpError("on non-array type expects result inner type ") + << lvalueType.getValue() << " to match type " << globalType << " of the global 
@" << getName(); return success(); } diff --git a/mlir/lib/Dialect/EmitC/Transforms/FormExpressions.cpp b/mlir/lib/Dialect/EmitC/Transforms/FormExpressions.cpp index 82bd031430d36..758b8527c2fa5 100644 --- a/mlir/lib/Dialect/EmitC/Transforms/FormExpressions.cpp +++ b/mlir/lib/Dialect/EmitC/Transforms/FormExpressions.cpp @@ -38,7 +38,8 @@ struct FormExpressionsPass auto matchFun = [&](Operation *op) { if (op->hasTrait() && !op->getParentOfType() && - op->getNumResults() == 1) + op->getNumResults() == 1 && + isSupportedEmitCType(op->getResult(0).getType())) createExpression(op, builder); }; rootOp->walk(matchFun); diff --git a/mlir/lib/Target/Cpp/TranslateToCpp.cpp b/mlir/lib/Target/Cpp/TranslateToCpp.cpp index 202df89025f26..79e433cbe4612 100644 --- a/mlir/lib/Target/Cpp/TranslateToCpp.cpp +++ b/mlir/lib/Target/Cpp/TranslateToCpp.cpp @@ -174,6 +174,9 @@ struct CppEmitter { /// Emit an expression as a C expression. LogicalResult emitExpression(ExpressionOp expressionOp); + /// Insert the expression representing the operation into the value cache. + LogicalResult cacheDeferredOpResult(Operation *op); + /// Return the existing or a new name for a Value. StringRef getOrCreateName(Value val); @@ -273,6 +276,18 @@ struct CppEmitter { }; } // namespace +/// Determine whether expression \p op should be emitted in a deferred way. +static bool hasDeferredEmission(Operation *op) { + if (isa_and_nonnull( + op)) + return true; + + if (auto applyOp = dyn_cast_or_null(op)) + return applyOp.getApplicableOperator() == "*"; + + return false; +} + /// Determine whether expression \p expressionOp should be emitted inline, i.e. /// as part of its user. 
This function recommends inlining of any expressions /// that can be inlined unless it is used by another expression, under the @@ -295,10 +310,10 @@ static bool shouldBeInlined(ExpressionOp expressionOp) { Operation *user = *result.getUsers().begin(); - // Do not inline expressions used by subscript operations, since the - // way the subscript operation translation is implemented requires that - // variables be materialized. - if (isa(user)) + // Do not inline expressions used by operations with deferred emission, since + // the way their translation is implemented requires that variables be + // materialized. + if (hasDeferredEmission(user)) return false; // Do not inline expressions used by ops with the CExpression trait. If this @@ -371,17 +386,11 @@ static LogicalResult printOperation(CppEmitter &emitter, } static LogicalResult printOperation(CppEmitter &emitter, - emitc::GetGlobalOp op) { - // Add name to cache so that `hasValueInScope` works. - emitter.getOrCreateName(op.getResult()); - return success(); -} + emitc::LValueLoadOp lValueLoadOp) { + if (failed(emitter.emitAssignPrefix(*lValueLoadOp))) + return failure(); -static LogicalResult printOperation(CppEmitter &emitter, - emitc::SubscriptOp subscriptOp) { - // Add name to cache so that `hasValueInScope` works. 
- emitter.getOrCreateName(subscriptOp.getResult()); - return success(); + return emitter.emitOperand(lValueLoadOp.getOperand()); } static LogicalResult printBinaryOperation(CppEmitter &emitter, @@ -621,9 +630,7 @@ static LogicalResult printOperation(CppEmitter &emitter, if (t.getType().isIndex()) { int64_t idx = t.getInt(); Value operand = op.getOperand(idx); - auto literalDef = - dyn_cast_if_present(operand.getDefiningOp()); - if (!literalDef && !emitter.hasValueInScope(operand)) + if (!emitter.hasValueInScope(operand)) return op.emitOpError("operand ") << idx << "'s value not defined in scope"; os << emitter.getOrCreateName(operand); @@ -660,6 +667,10 @@ static LogicalResult printOperation(CppEmitter &emitter, emitc::ApplyOp applyOp) { raw_ostream &os = emitter.ostre... [truncated] ``````````
simon-camp commented 4 months ago

As this is a major breaking change, which possibly impacts many users of the EmitC dialect, I would suggest posting a PSA on discourse and waiting a while before finally merging this PR. Suggestions welcome.

PSA: Modelling memory of EmitC variables

With PR #91475, explicit variables will be modeled as lvalues in the type system. This introduces some breaking changes to multiple operations.

  • emitc.variable and emitc.global ops are restricted to return emitc.array or emitc.lvalue types
    • the result of the emitc.variable op can be materialized as SSA values with the emitc.load op
  • Taking the address of a value is restricted to operands with lvalue type
  • Conversion from lvalues into SSA values is done with the new emitc.load op
  • The var operand of the emitc.assign op is restricted to lvalue type
  • The result of the emitc.subscript and emitc.get_global ops is an lvalue type
    • results can be materialized as SSA values with the emitc.load op

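In C terms, the split between `emitc.array` and `emitc.lvalue` results mirrors the fact that a whole array is not assignable while each element is. The minimal sketch below uses the hypothetical name `fill_and_sum`. Writes go through subscripted elements, and reads are explicit copies, corresponding to `emitc.load`. The op names in the comments describe the intended mapping, not verified output.

```c
#include <assert.h>

static int fill_and_sum(void) {
  int arr[3];          /* emitc.variable with an !emitc.array<3xi32> result */
  /* arr = other;  would not compile: arrays are not assignable in C */
  for (int i = 0; i < 3; i++)
    arr[i] = i * 10;   /* emitc.subscript yields an lvalue, then emitc.assign */
  int sum = 0;
  for (int i = 0; i < 3; i++) {
    int elem = arr[i]; /* emitc.load of the subscript result */
    sum += elem;
  }
  return sum;
}
```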
mgehre-amd commented 4 months ago

Thanks @simon-camp for getting the work done, thanks @aniragil and @mgehre-amd for reviewing and for the valuable discussions. This really pushes the dialect forward!

There is one thing we should agree on before this is merged. As the patch introduces breaking changes, @simon-camp and I discussed offline that this requires a PSA. A non-breaking intermediate solution would be to relax the restriction on emitc.variable and to allow EmitCType as a result. This would mean replacing

let results = (outs Res<AnyTypeOf<[EmitC_ArrayType, EmitC_LValueType]>, "",

with

let results = (outs Res<AnyTypeOf<[EmitCType, EmitC_ArrayType, EmitC_LValueType]>, "",

while we ignore the changes regarding the memory effects. Furthermore, the modifications to emitc.apply would need to be rolled back, and instead an emitc.dereference (en.cppreference.com/w/c/language/operator_member_access) / emitc.indirection (en.cppreference.com/w/cpp/language/operator_member_access) and emitc.address_of should be introduced (which we probably want to do in a follow-up anyway). With this, the emitc.variable as well as emitc.apply op could still be used as before, but users can adapt to the new behavior before the result type of the emitc.variable op gets restricted and the emitc.apply op is removed. However, adapting to the new behavior shouldn't be too hard, so we tend to keep the effort low and just post a PSA and wait x? weeks before merging this. WDYT?

Fine for me.

aniragil commented 4 months ago

There is one thing we should agree on before this is merged. As the patch introduces breaking changes, @simon-camp and I discussed offline that this requires a PSA. A non-breaking intermediate solution would be to relax the restriction on emitc.variable and to allow EmitCType as a result. This would mean replacing

let results = (outs Res<AnyTypeOf<[EmitC_ArrayType, EmitC_LValueType]>, "",

with

let results = (outs Res<AnyTypeOf<[EmitCType, EmitC_ArrayType, EmitC_LValueType]>, "",

while we ignore the changes regarding the memory effects. Furthermore, the modifications to emitc.apply would need to be rolled-back and instead an emitc.dereference (https://en.cppreference.com/w/c/language/operator_member_access) / emitc.indirection (https://en.cppreference.com/w/cpp/language/operator_member_access) and emitc.address_of should be introduced (which we probably want to do in a follow up anyway). With this, the emitc.variable as well as emitc.apply op could still be used as before but users can adapt to the new behavior before the result type of the emitc.variable op gets restricted and the emitc.apply op is removed.

Sounds good (what about the changes to emitc.subscript?) We can take it further by keeping emitc.variable as-is along with emitc.apply and introducing a new emitc.define op along with emitc.address_of and emitc.dereference to work on lvalues. Then, some time after the PSA, erase both emitc.variable and emitc.apply from the dialect, which might be cleaner/safer than altering their semantics. (BTW, we could perhaps extend the existing emitc.subscript op to do dereferencing. It already accepts !emitc.ptr, but still requires an index. If we allow zero indices for pointers it could translate to *p).
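The suggestion to let `emitc.subscript` take zero indices on a pointer relies on a C identity. By definition, `E1[E2]` is `*((E1)+(E2))`, so `p[0]` and `*p` designate the same lvalue. A quick check, with `deref_via_subscript` as a hypothetical helper:

```c
#include <assert.h>

/* p[0] is defined as *(p + 0), i.e. *p, so both spellings name the same object. */
static int deref_via_subscript(int *p, int value) {
  p[0] = value; /* write through the subscript spelling */
  return *p;    /* read back through the dereference spelling */
}
```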

However, adapting to the new behavior shouldn't be too hard, so we tend to keep the effort low and just post a PSA and wait x? weeks before merging this. WDYT?

I'm also fine with just giving the community a heads-up and pushing the patch as-is in x weeks. As this patch is quite large, x should probably be relatively small to prevent it from bit-rotting and to allow further development of emitc (we can perhaps wait a longer x before erasing these ops if we make emitc.variable obsolete, while still pushing it immediately).

simon-camp commented 4 months ago

I'm currently preparing a patch to integrate the changes into IREE and I'm having issues updating the code in a few places. We fall back to invoking macros through OpaqueCall ops for things that are not representable in the EmitC dialect at the moment, and parameters might be used as lvalues in the macro expansion.

So one way forward may be to allow lvalues in the operands of the OpaqueCall and update the op description. What do you think?
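The following example illustrates why such macros need lvalue operands. A function-like macro can expand its argument on the left of an assignment, which only compiles if the argument is an lvalue (a variable, `a[i]`, `*p`, ...). `CLAMP_TO_ZERO` is a made-up macro for illustration, not one taken from IREE.

```c
#include <assert.h>

/* Hypothetical macro: writes to its argument, so x must be an lvalue. */
#define CLAMP_TO_ZERO(x) do { if ((x) < 0) (x) = 0; } while (0)

static int clamp_demo(int v) {
  int var = v;
  CLAMP_TO_ZERO(var);      /* OK: var is an lvalue */
  /* CLAMP_TO_ZERO(v + 1);  would not compile: v + 1 is not an lvalue */
  return var;
}
```

If such a macro received the result of an `emitc.load` instead, it would still compile but would write to a copy. This is why keeping the OpaqueCall operand as an lvalue preserves the intended semantics.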

marbre commented 4 months ago

So one way forward may be to allow lvalues in the operands of the OpaqueCall and update the op description. What do you think?

Allowing lvalues and adjusting the op description is fine for me 👍 (while hoping to provide better options with other new ops that will follow in the future).

simon-camp commented 2 months ago

I've rebased this PR one last time after the introduction of the emitc.switch op. I plan to merge this as soon as the CI has run. Thanks again for all the valuable feedback.

kchibisov commented 2 months ago

Is there a plan to minimize the number of explicit assignments of loaded values, or should I open an issue? In general,

func.func @cast_variables() -> i1 {
  %0 = "emitc.variable"(){value = 42 : i8} : () -> !emitc.lvalue<i8>
  %1 = emitc.load %0: !emitc.lvalue<i8>
  %2 = emitc.cast %1: i8 to i1
  return %2 : i1
}

is now emitted as

bool cast_variables() {
  int8_t v1 = 42;
  int8_t v2 = v1;
  return (bool) v2;
}

and, with --declare-variables-at-top, as:

bool cast_variables() {
  int8_t v1;
  int8_t v2;
  v1 = 42;
  v2 = v1;
  return (bool) v2;
}

This looks really artificial and redundant compared to what it was before, and I'm not sure how downstream users (us) could deal with all of this, since a lot of it happens inside the emitter and you can't really do anything about it with verbatim, etc.

Maybe emitc.load could be allowed inside emitc.expression, since it doesn't do anything useful on its own and is more of a type-system-level safety measure? Then the form-expressions pass will likely handle it correctly.

simon-camp commented 2 months ago

Maybe emitc.load could be allowed inside emitc.expression, since it doesn't do anything useful on its own and is more of a type-system-level safety measure? Then the form-expressions pass will likely handle it correctly.

The emitc.load op models the side effect of reading the value. If this would be used inside an expression we wouldn't be able to ensure the correct semantics (i.e. ordering of multiple load ops), as the order of evaluation is unspecified in C.

But I'd like to hear opinions from @aniragil or @mgehre-amd.

kchibisov commented 2 months ago

The emitc.load op models the side effect of reading the value.

Maybe the variable should have a hasSideEffect field indicating whether it can actually have side effects, e.g. when it is volatile-qualified, since I don't think you can have any of the side effects listed on the page you've linked with a regular emitc.lvalue<i32> as of now.

If this would be used inside an expression we wouldn't be able to ensure the correct semantics (i.e. ordering of multiple load ops), as the order of evaluation is unspecified in C.

I don't think you'd be able to ensure it either unless you restrict it even further. From the generated code point of view, the code is exactly the same since all you add is intermediate steps. The end expressions are exactly the same, just the variable is named differently because you'd have a temporary binding. So, I'm not quite sure what the current restriction prevents in generated code.

If the point is to ensure that, during the MLIR passes, it won't end up in an expression, so that transformations can be performed in a defined order, then maybe the load could be inlined like emitc.expression (using an inline attribute) during the emitter phase?

mgehre-amd commented 2 months ago

Maybe the emitc.load could be allowed inside the emitc.expression, since it doesn't do anything useful on its own and is more of a type-system-level safety measure? Then the form-expressions pass would likely deal with it correctly.

The emitc.load op models the side effect of reading the value. If it were used inside an expression, we wouldn't be able to ensure the correct semantics (i.e., the ordering of multiple load ops), as the order of evaluation is unspecified in C.

But I'd like to hear opinions from @aniragil or @mgehre-amd.

Wouldn't all possible orders of emitc.load inside an expression still give the same result, as long as there is no emitc.store inside the expression? As our emitc.load is close to an lvalue-to-rvalue conversion, and those are mostly created implicitly in C++ code, it seems fine to me to allow emitc.load in expressions.
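A minimal C sketch of this argument, assuming two plain (non-volatile) global variables that nothing writes to between the loads. The names `x_var` and `y_var` are hypothetical. Reading them in either order yields the same product, so folding both loads into a single expression, where their relative order is unspecified, cannot change the result.

```c
/* Hypothetical non-volatile variables read by two "emitc.load"s. */
int x_var = 6;
int y_var = 7;

/* Loads emitted in one order... */
int loads_x_then_y(void) {
  int v1 = x_var;
  int v2 = y_var;
  return v1 * v2;
}

/* ...or the other. With no store in between, both orders read the
   same values, so the result is identical. */
int loads_y_then_x(void) {
  int v2 = y_var;
  int v1 = x_var;
  return v1 * v2;
}
```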

aniragil commented 2 months ago

So I think there are two issues here:

First, we currently don't fold expressions with side effects into the expressions that use them at all, since we might be moving side effects past other, potentially conflicting side effects, e.g.:

```c
#include <stdint.h>

void func(int8_t *p); // defined elsewhere

int f(int a, int b) {
  int8_t v1;      // actual variable
  int8_t v2;      // "SSA-value" variable
  int8_t *v3;     // "SSA-value" variable
  int8_t v4;      // "SSA-value" variable
  v1 = a;         // emitc.assign
  v3 = &v1;       // emitc.apply "&"
  v2 = v1;        // emitc.load
  func(v3);       // may modify v1
  v4 = v2 * 3;    // emitc.expression
  return v4;
}
```

(Note, however, that we do fold into expressions with side effects.) We're being highly conservative, and can definitely try to relax that for cases where we can prove semantics are retained, including variable loads.
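The hazard in the first example can be checked with a small runnable variant. Here `func` is a hypothetical callee that writes 10 through its pointer, and the function names are illustrative only. Keeping the load before the call preserves the original value, while folding the load into the later expression makes it observe the write.

```c
#include <stdint.h>

/* Hypothetical callee standing in for func(v3): writes through the pointer. */
void func(int8_t *p) { *p = 10; }

/* Load kept before the call: v2 captures the value of v1 at that point. */
int unfolded(int a) {
  int8_t v1 = (int8_t)a;
  int8_t *v3 = &v1;
  int8_t v2 = v1;      /* emitc.load, evaluated before the call */
  func(v3);            /* modifies v1 */
  return v2 * 3;
}

/* Load folded past the call into the later expression: v1 is read
   after func has overwritten it, so the result changes. */
int wrongly_folded(int a) {
  int8_t v1 = (int8_t)a;
  int8_t *v3 = &v1;
  func(v3);
  return v1 * 3;
}
```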

Second, folding more than a single side effect into an expression brings up the evaluation-order issue @simon-camp mentioned. As lvalues are not just variables, a load op used for an array access may lead to an invalid memory access and trigger an error/exception. I'm not sure compilers must respect evaluation order in illegal-access cases, but AFAIK for variables (automatic/global) there is no such concern, so I believe @mgehre-amd's observation is right for variables, at least for C and C++'s native types (not sure this holds for C++ classes whose copy constructors might do all sorts of stuff internally). In general, we could use the comma operator to enforce evaluation order within expressions. While I don't think we can define variables inside comma expressions, specifically when --declare-variables-at-top is used we could fold dangerous loads in-order using commas, e.g.

```c
#include <stdint.h>

int f(int a[300], int b[700], int i, int j) {
  int8_t v1;
  int8_t v2;
  //  v1 = a[i];
  //  v2 = b[j];
  //  return v1 * v2;
  return (v1 = a[i], v2 = b[j], v1 * v2);
}
```

Not sure it looks better than the alternative though.
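To make the ordering guarantee of the comma form visible, here is a sketch that records each array access as it happens. `traced_load`, `trace`, and `reset_trace` are hypothetical helpers, not EmitC constructs. Because the comma operator sequences its left operand before its right one, the trace is always `a` followed by `b`.

```c
/* Hypothetical access trace: records which array was read, in order. */
char trace[8];
int trace_len = 0;

void reset_trace(void) {
  trace_len = 0;
  trace[0] = '\0';
}

/* Hypothetical helper standing in for a subscript load; it records
   the access before returning the element. */
int traced_load(const int *arr, int idx, char tag) {
  if (trace_len < (int)sizeof trace - 1) {
    trace[trace_len++] = tag;
    trace[trace_len] = '\0';
  }
  return arr[idx];
}

/* Same shape as the comma-expression example, with int locals:
   the comma operator sequences the a[i] read strictly before b[j]. */
int f_comma(const int a[300], const int b[700], int i, int j) {
  int v1;
  int v2;
  return (v1 = traced_load(a, i, 'a'), v2 = traced_load(b, j, 'b'), v1 * v2);
}
```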

kchibisov commented 2 months ago

In general, we could use the comma operator to enforce evaluation order within expressions. While I don't think we can define variables inside comma expressions, specifically when --declare-variables-at-top is used we could fold dangerous loads in-order using commas, e.g.

I don't think it's worth it, given that it's overly complicated and doesn't solve much: at the end of the day you still define the variables, when the goal is not to, and writing them on the same line with commas won't change much.

There are two solutions I can think of, based on what I've seen so far:

Track side effects inside the emitter, since everything is already in place there, including the cache. The load operation could check whether it has a side effect and be inlined if there is none, or if the user asked to inline it anyway, potentially resulting in undefined evaluation order in the generated code.
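The emitter-side check proposed here could look roughly like the following. This is a heavily simplified, hypothetical model (`struct op`, `enum op_kind`, and `can_inline_load` are invented for illustration, not MLIR or EmitC APIs). It assumes a load may be inlined into its user when the user forces it, or when the loaded lvalue is not volatile and no store or call sits between the load and its use.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model of the ops the emitter walks. */
enum op_kind { OP_LOAD, OP_STORE, OP_CALL, OP_PURE };

struct op {
  enum op_kind kind;
  bool operand_is_volatile; /* only meaningful for OP_LOAD */
};

/* Decide whether the load at load_idx may be inlined into its single
   user at use_idx (load_idx < use_idx). */
bool can_inline_load(const struct op *ops, size_t load_idx, size_t use_idx,
                     bool user_forces_inline) {
  if (user_forces_inline)
    return true; /* may yield an unspecified evaluation order */
  if (ops[load_idx].operand_is_volatile)
    return false; /* the read itself is a side effect */
  for (size_t k = load_idx + 1; k < use_idx; ++k)
    if (ops[k].kind == OP_STORE || ops[k].kind == OP_CALL)
      return false; /* memory may change between the load and its use */
  return true;
}
```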

Or:

Define a trait HasLoadSideEffect, which could be used to ensure that the load's operand inside the CExpression does not have this trait.
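The trait-based alternative can be sketched the same way. `HAS_LOAD_SIDE_EFFECT` and `struct value_def` are hypothetical: a flag word stands in for the proposed trait (or an attribute), and a real verifier would query the MLIR op defining each load's operand instead.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag modelling the proposed HasLoadSideEffect trait. */
enum { HAS_LOAD_SIDE_EFFECT = 1u << 0 };

/* Flags of the op that defines a load's lvalue operand. */
struct value_def {
  unsigned flags;
};

/* Expression check: reject the expression if any load inside it reads
   an lvalue whose defining op carries HAS_LOAD_SIDE_EFFECT. */
bool expression_loads_ok(const struct value_def *load_operands, size_t n) {
  for (size_t k = 0; k < n; ++k)
    if (load_operands[k].flags & HAS_LOAD_SIDE_EFFECT)
      return false;
  return true;
}
```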

As a side note, I think my use case is quite different from what you're going for with lvalue, since we want the generated code to be more readable and look like something a regular person could write; explicit bindings for subscripts are certainly not that (I'd use at or something that does bounds checking). That said, generating more well-defined code is a good default from what I can tell.

aniragil commented 2 months ago

Track side effects inside the emitter, since everything is already in place there, including the cache. The load operation could check whether it has a side effect and be inlined if there is none, or if the user asked to inline it anyway, potentially resulting in undefined evaluation order in the generated code.

We're actually in the process of making the translator follow MLIR's guidelines, which are to be as simple and as 1-1 as possible, leaving all legalization/optimization work to passes.

Define a trait HasLoadSideEffect, which could be used to ensure that the load's operand inside the CExpression does not have this trait.

AFAIK traits shouldn't be dynamic, but that's a technicality, as we could use attributes instead. However, that still leaves the translator to do this transformation, which is less desirable.

As a side note, I think my use case is quite different from what you're going for with lvalue, since we want the generated code to be more readable and look like something a regular person could write; explicit bindings for subscripts are certainly not that (I'd use at or something that does bounds checking). That said, generating more well-defined code is a good default from what I can tell.

Actually, making the emitted C code more readable and closer to how humans write it is very important to me as well (and the reason I added emitc.expression); good to know others are interested in raising the bar in that aspect. Now that the lvalues change is in, we could look into incorporating lvalues into expressions as well.

Also, could you elaborate on your motivation for using --declare-variables-at-top? Is it part of your readability preference or functionally required, e.g. by your target C compiler?

kchibisov commented 2 months ago

Also, could you elaborate on your motivation for using --declare-variables-at-top? Is it part of your readability preference or functionally required, e.g. by your target C compiler?

I haven't said that we use it though, and I prefer to not use it.

aniragil commented 2 months ago

I haven't said that we use it though, and I prefer to not use it.

Oh, thanks for clarifying that.