Should SymbolTables have an equality operator?

sergisiso commented 2 years ago

Currently any ScopingNode equality checks that self.symbol_table == other.symbol_table but since the SymbolTable doesn't have an __eq__ method, it just checks that they are the exact same instance. For example two subroutines

Routine[name=my_sub]
    Return[]
Routine[name=my_sub]
    Return[]

won't be pick up as equal because they often have their own symbol table.

This also happens for loop_body/if_body, ... making the equality limited for subtrees of more than single statements.

LonelyCat124 commented 2 years ago

I think I wasn't sure on this/couldn't find a good use-case for symbol tables being equal, as I thought they were only at the routine level (and it didn't make sense for two Routines to be equal at the moment). I'd missed that all Schedules are scoping nodes, and so this would be an issue.

I guess the question I then have is how does the symbol table of a Schedule end up being populated? For example, for a Routine it felt obvious to me as you usually have:

SUBROUTINE name(...)
    integer :: i
    integer :: j

so conceptually to me, the symbol table of this Routine has two integers, i and j as symbols.

For a loop (in Fortran), there are no variables declared, so I think I'd assumed the symbol table used by its schedule was that of the containing routine (though from checking the code that appears not to be the case). What ends up in this symbol table? I guess the complexity here comes from being compatible with other languages (where those scopingNodes are allowed their own symbol tables realistically).

We could say two SymbolTables are equal if they contain the "same" symbols (more on this in a moment), the same argument_list (I'm not sure types what this contains), the same tags, and the same default_visibility.

The slight difficulty here is that implementing __eq__ for symbol will require implementing __hash__ as well (as they're stored in the SymbolTable in an OrderedDict, as to always produce the same code), so we can either do that, or do the symbol comparison from inside the symbol table. I think the latter is not too bad. The same will be true for the argument_list too. I think tags are strings so we're fine on those.

arporter commented 2 years ago

I would say that two symbols are equal if they:

have the same name;
have the same type (including shape if an array type);
have the same visibility;
are or are not both marked as arguments;

I agree that doing this check inside the symbol table sounds like a good way to go. The argument_list is itself a list of Symbols so we would reuse the test that two symbols are equal for that.

sergisiso commented 1 year ago

In #1899 I needed to compare routines (therefore schedules) and I implemented an initial version of the symbol_table equality operator (which is not complete). To finish it we need the full symbol/datatype/interfaces equality semantics.

Also, we need to consider what references to a symbol mean to be equal, e.g.:

subroutine a()
    real :: b
    b = b + 1
end subroutine
subroutine a()
    integer :: b
    b = b + 1
end subroutine

The equality of comparing both subroutines must return False (because their declaration is different).

But does the equality between the addition statements should be True or False?

LonelyCat124 commented 1 year ago

My vague feeling on this is that the addition statements should be False, but I could be persuaded either way.

I think if the addition statements and symbol tables are equal, then the entire subroutines should be equal (as they are identical) since for code output there is no difference in any programming language. So as I see it the options are either:

Symbols are never equal to any other object
Symbols with the same class, type, name are equal.

Then the subroutines are either equal or not just by recursion.

hiker commented 1 year ago

I am a bit late to the party, and it appears that we are getting off the actual subject (symbol/symbol table equality - now we are talking about statement/subroutine equality) ... but just in case here another border cases:

subroutine a(x)
   integer x
   x=x+1
end subroutine a
subroutine a(y)
   integer y
   y=y+1
end subroutine a

Are they the same? They would (short of debug information) create the same machine code :)

My feeling is that we are better of having functions with an explanatory name (if possible) for comparison (semantically_equal, string_equal, ...) - but it appears that this is a bit late now :)

Even when just comparing symbol tables there is the problem of tags: tags are (afaik) used to create the code for the statements. I.e. once all statements are created (which especially means that there are references to symbols everywhere) - the tags do not matter anymore. Should we then also compare tags? For a stand-alone symbol-table we likely should. But for already created code we might not??

LonelyCat124 commented 1 year ago

We're still pretty on topic I think, statement equality entirely depends on symbol equality at this point (it essentially recurses down to the point where it compares symbols assuming everything else is the same). Subroutines are a bit different as I'm not sure what their implementation is, but will fail if both contain symbols.

I think the difficulty with your example is that there is no way to compare those symbols and have them being equal without breaking the rest of the equality assumptions being made. The name of a symbol in PSyclone is one of its main identities, if we somehow enable those subroutines to be equal, we have to do a mapping of all symbols to all other symbols surely? Since we need:

subroutine a()
   integer x, y
   x = 0
   y = 0
   x = x + 1
   y = y + 1
end subroutine a

the Assignments in this to not be equal for PSyclone's current code to work.

N.B. If I remember correctly the reason most of this arose (any why it was implemented) is not due to semantic outputs of the code, but for better python code (e.g. not using == when the meaning was is, or for comparisons of expressions doing things like in for lists of expressions in clauses for things like OpenMP, where if two references to the same variable are equal you have a way to check without large code bloat - I guess alternatives were possible).

sergisiso commented 4 months ago

In #2517 we tried implementing equality (__eq__) for Symbols and it led to many issues because we usually compare symbols in a scope context (we may have shadowed symbols which are distinct but the __eq__ wouldn't know because its scope its not part of the object itself). This was especially bad when putting symbols in containers (sets, dicts, list) where shadowed symbols would get overwritten. So we had to revert this change.

LonelyCat124 commented 4 months ago

If we attempted this and failed (for good reasons) then maybe a generic equality operator on symbols and symbol_tables (especially since symbols need to be hashable I think?) is not viable? If we rely on symbol scope, then a == b is the same as a is b, and if symbols can only be present in one SymbolTable (which I'm not sure of?) then symbol_table1 is symbol_table2 is the same as == as well (due to transitivity). If people need to compare symbols/symboltables then they'd need their own implementation for whatever it is they're comparing for. Also for the __eq__ methods we discuss in the original post - perhaps we need to modify those methods to do direct comparison of the symbol tables if we don't want those to worry about scoping.

I'd suggest close - wontfix if everyone agrees?

arporter commented 4 months ago

I'd suggest close - wontfix if everyone agrees?

I agree with that. (Note that the same Symbol object can appear in > 1 table.)

stfc / PSyclone

Should SymbolTables have an equality operator? #1698