p4lang / p4c

P4_16 reference compiler
https://p4.org/
Apache License 2.0
648 stars 433 forks source link

Compiler Bug: Expression E write set already set #4500

Closed kfcripps closed 2 weeks ago

kfcripps commented 5 months ago

Compiling the following P4 code:

enum E {
    e0
}

extern void bar(E x);

extern void baz();

void foo(E y) {
    bar(y);
    if (y == E.e0) baz();
}

control c() {
    apply {
        foo(E.e0);
    }
}

control proto();
package top(proto p);

top(c()) main;

results in:

Compiler Bug: tmp.p4(17): Expression E write set already set
        foo(E.e0);
            ^

This bug has been introduced somewhere in between 812722ffe6de592b1ee4fa6e0782624c557a9919 and 5f9cb550b4844894d976dbe2878721d96a540ec9.

kfcripps commented 5 months ago

Introduced by e60cb8a633e305a75cb28c56dbd3d1dbccc871c8.

kfcripps commented 5 months ago

I'm not sure if this is the best fix, but changing expressionWrites() in def_use.h to the following seems to work for our backend:

    void expressionWrites(const IR::Expression *expression, const LocationSet *loc) {
        CHECK_NULL(expression);
        CHECK_NULL(loc);
        LOG3(expression << dbp(expression) << " writes " << loc);
        auto it = writes.find(expression);
        if (it != writes.end()) {
            // Multiple usages of the same Expression write multiple different locations.
            // Merge the locations.
            // TODO: Re-visit this after the "correct" fix is made in the p4c upstream.
            // See https://github.com/p4lang/p4c/issues/4500.
            loc = it->second->join(loc);
        }
        writes.emplace(expression, loc);
    }
mihaibudiu commented 5 months ago

def_use maintains a hashmap that maps each expression to its def_use information. What happens is that the IR DAG before def_use is invoked contains a common subtree for two different expressions; you can see the representation using: ./p4test -v --top4 FrontEnd_63 issue4500.p4

This is the relevant fragment:

      <MethodCallStatement>(10520)
        <MethodCallExpression>(10521)
          <PathExpression>(10525)
            bar
          <Vector<Type>>(54), size=0
          <Vector<Argument>>(10527), size=1
            <Argument>(10528)
              <Member>(8710)e0
                <TypeNameExpression>(8711)
                  <Type_Name>(89)
                    E */
        bar(/* 
            <Argument>(10528)
              <Member>(8710)e0
                <TypeNameExpression>(8711)
                  <Type_Name>(89)
                    E */
E.e0);
        /* 
      <IfStatement>(10531)
        <Equ>(10532)
          <Member>(8710)e0
            <TypeNameExpression>(8711)
              <Type_Name>(89)
                E
          <Member>(10535)e0
            <TypeNameExpression>(10536)
              <Type_Name>(10540)
                E
        <MethodCallStatement>(10541) */
        if (E.e0 == E.e0) {

You can see that the subtree rooted at 8710 is used in two different expressions. The solution to this kind of problem is to make a deep copy of the respective subtree instead of reusing it. This can be done by either tracking the place where the subtree is reused and using a deep copy of the subtree, or by inserting a pass that converts DAGs into trees before the def-use analysis. I will try to find some time to submit a fix this week or early next week.

kfcripps commented 5 months ago

@mihaibudiu Makes sense, thank you for the explanation.