[DESIGN] Fine-grained checking in StructInfo

tqchen commented 1 year ago

StructInfo introduces a nice way to be able to do further analysis. It also creates interesting design space about checking. Specifically, consider a function call:

def func(p0: sinfo_p0, p1: sinfo_p1) -> sinfo_ret:
    pass

def call_func():
    a0: sinfo_a0 = ...
    a1: sinfo_a1 = ...
    func(a0, a1)

When looking at sinfo_a0 and sinfo_p0, when to report an error, when should we report a warning and when to accept a program? One interesting part about our current design is that we automatically insert match_cast at function boundary, which means that if some of the errors are not caught by compiler check, we still have runtime checks.

During implementation of the StructInfo, we find that it is helpful to have fine-grained checking and implemented a version of it to support the struct info analysis.

This post marks this as a point for us to think further and discuss further in future iterations.

For a given pair of lhs_struct_info(sinfo_p0), rhs_struct_info(sinfo_a0). Define following terminology:

LSet = {value | value matches lhs_struct_info}
- RSet = {value | value matches rhs_struct_info}

The check result of struct_info_base_check(lhs_struct_info, rhs_struct_info) can have the following results:

kFailL0: The two value sets have no intersection at all: Interset(LSet, RSet) = empty
- Example: lhs_struct_info=R.Shape, rhs_struct_info=R.Tensor
kFailL1: LSet is not superset of RSet by only looking at static information.
- Example: lhs_struct_info=R.Tensor, rhs_struct_info=R.Object
kFailL2: LSet is not superset of RSet because of mismatch in value information.
- Example: lhs_struct_info=R.Tensor[(n, m), "f32"], rhs_struct_info=R.Tensor[(m, k), "f32"]
kPass: LSet is superset of RSet.
- Example: LSet=R.Object, RSet is R.Tensor

If the checking ends up in kFailL0, then obviously an error should be raised. We normally also raise error in kFailL1 in a strong typed system because that is normally when static type check fails.

One thing that worth highlighting and discuss is the derive checking rule for functions struct info. Since this is not a normal way for function sub-typing checking. When looking at

lhs_struct_info=R.Func([R.Object], R.Tensor) and rhs_struct_info=R.Func([R.R.Tensor], R.Tensor).

struct_info_base_check will return kPass. Which means LSet is superset of RSet -- this certain consistent with the definition. However, from traditional sub-typing rule's pov, passing a_func: R.Func([R.Tensor], R.Tensor) to p_func: R.Func([R.R.Object], R.Tensor) will result an type error, because the callee function may call p_func with an object that is not tensor, in which case cannot be handled by a_func.

In the specific case of relax. There are a few interesting observations here:

Because we relies on erasure to eliminate value dependent information, unify information that can be value dependent, it is possible for us to turn a function value with more fine-grained signature to a more coarse grained signature during transformation -- and we might still want to pass that value to the same function(with fine-grained signature) parameter.
We have runtime(match-cast) check automatically inserted at the callee boundary(input and return) which means that we can afford to pass a_func to p_func here and runtime checks will catch the error if it is being mis-used here.

As such the function struct info check for now uses struct_info_base_check and allows such argument passing. So this is a case that is worth thinking considering the design elements of unification, runtime checking and our overall needs.

It would be great to think and discuss further:

What kinds of customization (error/warning per level) we could make for each level
Overall thoughts on function passing and checking rules

slyubomirsky commented 1 year ago

Thank you for posting this issue, as I think checking FuncStructInfo is definitely the toughest part of the StructInfo system. I think my draft spec should be broadly consistent with what you posted, though I think when it comes to shape variables, we should distinguish between the cases of shapes that are definitely different (I would argue this should be an error at compile time), shapes that are definitely the same (clearly not a problem), and shapes that are possibly different (i.e., the compiler can't draw a static conclusion one way or the other. That should probably be a warning).

The rule for checking subtyping of FuncTypes and FuncStructInfo needs to be formulated carefully to avoid the issue you note with the example you gave. In the spec draft, FuncType([T1, T2, ..., Tn], Tr) is a subtype of FuncType([T1', T2', ..., Tn'], Tr') if T1' is a subtype of T1, T2' is a subtype of T2, etc., and Tr is a subtype of Tr' (note the direction of the relationships). We have to treat StructInfo analogously. I'm not sure that our implementation does this.

I agree on the point about failing in the case of kFailL1, since the type system (per our past discussions) would raise an error and request a cast in that case.

tqchen commented 1 year ago

Thanks @slyubomirsky . To dig a bit further on the FuncType and FuncStructInfo. One interesting direction is to explore possibility to not go with the function subtyping rule (that is one reason to bring up the set formalization). This is indeed controversial, so some discussions would be good. The considerations so far are mainly as follows:

A0: We relies on erasure to eliminate value dependent information, unify information that can be value dependent, it is possible for us to turn a function value with more fine-grained signature to a more coarse grained signature during transformation -- and we might still want to pass that value to the same function(with fine-grained signature) parameter.
A1: We have runtime(match-cast) check automatically inserted at the callee boundary(input and return) which means that we can afford to pass a_func to p_func here and runtime checks will catch the error if it is being mis-used here.

The main lines of thoughts here are A0, the existence of erasure makes sticking with subtyping harder. A1(with runtime check) we can afford to relax a bit while still catching error in runtime.

slyubomirsky commented 1 year ago

What would the rule for func types/StructInfo without subtyping look like then? Don't attempt to check any subtyping for those and insist on dynamic checks? Some simplifications might be reasonable, but I think it's important to make sure we can figure out the types/StructInfo of call results (otherwise that would greatly limit the utility of the type system).

tqchen commented 1 year ago

Agree it is important to preserve call result(and we can afford to do so because erasure of call result goes with the same direction of subtyping). The main challenge lies in erasure of function parameter, because func structure info with erased parameter structinfo is not base of the original function before erasure if we follow def of subtyping. Here is one strawman

Call results follow the same rule (as erasure goes in the same direction of subtyping).
Treat L1 level mismatch in function parameters as L2 (and rely on runtime checks, since most cases we don't shoot error in L2 level failure).
Still report L0 level error in function parameters(as they mean obviously no overlap)

slyubomirsky commented 1 year ago

It might be okay to use runtime checks more for checking function types (as long as we implement RTTI), but I think the biggest complexity is in dealing with the rule for calls (matching any shape variables in the function type). Not sure how much we would gain by simplifying the subtyping rule. If we simplify the rule for determining call results, the issues that could arise are 1) that users would have to insert more casts and 2) putting in the wrong cast could result in hiding errors until later.

tqchen commented 1 year ago

Agreed, the goal is to NOT simplify rules for determining call results -- so results are precise. But relax the rules for checking call parameters -- mainly because of the erasure complexity as being mentioned above

slyubomirsky commented 1 year ago

Okay, I don't think there would be too much harm from simplifying the subtyping check for functions because I think the main case where it would come up is higher-order functions, which I don't expect to be very common in Relax. We could do it, though I also don't think that it would be too difficult to implement the subtyping written out in the draft spec.

tqchen commented 1 year ago

Just want to provide some context about the potential design confliction of erasing and subfunction typing. We relies on erasure when we unify the values from different branches, or when we want to simplify certain parts

x0: S0
x1: S1 = erase(x0)

Normally when we erase, we know that S1 is a base of S0 -- we are going towards more coarse grained information. When it comes to functions, if we encourage erasure through out the process to simplify some of our pass writing. Then in this case, R.Func([R.Tensor], R.Tensor) is not strictly base of R.Func([R.Tensor((n, m))], R.Tensor) from sub-typing's pov.

x0: R.Func([R.Tensor((n, m))], R.Tensor)
x1: R.Func([R.Tensor], R.Tensor) = erase(x0)

Relaxing parameter checking would avoid such confliction to occur.

slyubomirsky commented 1 year ago

I see what you are referring to, I think you're right that erasing could cause issues in principle. However, I think that the fact shape information is not used in subtyping (only dtype and ndim) would mean that this case would not cause any issue with the type system. In this case, any call sites of x1 would accept any arguments that were appropriate for x0 (the reverse may not be true, since the dimensions would be checked).

Edit: Have to think about this some more, maybe it isn't as innocuous as it seems.

Edit 2: Okay, yes, this is indeed a case where erasing makes the FuncStructInfo "more specific." It will still be consistent with the typing rules (since those only look at dtype and ndim) but can result in errors at run time, since a member with x0's StructInfo will not work everywhere a member of x1's StructInfo will. I don't see any way we can correct this in erase except by erasing into an ObjectStructInfo (definitely not very satisfying).

slyubomirsky commented 1 year ago

The unification rule I wrote in the spec actually avoids this issue in much the way you described, so I guess we are in agreement: https://github.com/tlc-pack/relax/blob/1da54d35cd83191143a16a14f1c96c5813baa354/relax_spec.md?plain=1#L836-L850

I don't think we should use erasure for unification, for the reasons described. Unifying the parameters succeeds only if they can be proven equal to each other.

tlc-pack / relax

[DESIGN] Fine-grained checking in StructInfo #346