rpau / javalang-compiler

Java compiler elements (symbol and type tables) to perform code semantic analysis
GNU Lesser General Public License v3.0
10 stars 4 forks source link

Dealing with missing types and symbols #74

Open cal101 opened 7 years ago

cal101 commented 7 years ago

Question

Should walkmod have some kind of propagatable incomplete or error symbol and type?

The problem

Walkmod does have a "try your best and continue on error" philosophy. When it comes to missing symbols that currently means "continue with next compilation unit". That could be improved.

I am aware of 2 cases where missing symbols happen in operational code:

  1. When code is disabled via "if(false)" either no or incomplete (?) class files are generated for anonymous classes.
  2. Walkmod typically resolves more constructors, fields, methods and types of thirdparty libraries than operational code needs at runtime. That results in "NoClassDefFoundError" on analysis.

The simple variants of case 1 were fixed by pull requests that ignore missing anonymous classes in disabled code and use specially marked SymbolType. When anonymous classes have fields and extra methods or other members i discovered recently that the marked symbols have to be considered to ignore more otherwise fatal cases. This is related to case 2 above where symbols of thirdparty types can't be resolved if some class is not contained in the runtime jars of the project because it't not needed or it's optional.

I would like to support a form of continue-on-missing-class strategy here to. The idea is to ignore all or a list of missing classes by configuration and introduce an "error" symboltype. The error symboltype is treated as a special case that can propagate if used for resolving fields or methods. This introduces a new category of visitors: visitors need to signal awareness of incomplete symbols to avoid accessing class, method or field data. So far this allowed me some more simple refactorings than before. But otherwise this causes rippling changes. Code and configuration has to request the new behavior so nothing should change for existing code. What do you think? Is that some useful direction?

rpau commented 7 years ago
        Hi Cal,Just a short comment. In databases, Null means unaplicable or unknown. I usually prefer to use null than specific classes (functional programming style). The less types ( concepts ) you need to express - for the same coupling and cohesion, the easier the code is to understand.However, I think that you are considering a new class because of the symbol table. Currently, the symbol table returns null if the symbol does not exist.Could not the null be checked on those conditions?---- On lun, 26 jun 2017 19:42:04 +0200  cal<notifications@github.com> wrote ----Question Should walkmod have some kind of propagatable incomplete or error symbol and type? The problem Walkmod does have a "try your best and continue on error" philosophy. When it comes to missing symbols that currently means "continue with next compilation unit". That could be improved. I am aware of 2 cases where missing symbols happen in operational code:  When code is disabled via "if(false)" either no or incomplete (?) class files are generated for anonymous classes. Walkmod typically resolves more constructors, fields, methods and types of thirdparty libraries than operational code needs at runtime. That  results in "NoClassDefFoundError" on analysis.  The simple variants of case 1 were fixed by pull requests that ignore missing anonymous classes in disabled code and use specially marked SymbolType. When anonymous classes have fields and extra methods or other members i discovered recently that the marked symbols have to be considered to ignore more otherwise fatal cases. This is related to case 2 above where symbols of thirdparty types can't be resolved if some class is not contained in the runtime jars of the project because it't not  needed or it's optional. I would like to support a form of continue-on-missing-class strategy here to. The idea is to ignore all or a list of missing classes by configuration and introduce an "error" symboltype. The error symboltype is treated as a special case that can propagate if used for resolving fields or methods. This introduces a new category of visitors: visitors need to signal awareness of incomplete symbols to avoid accessing class, method or field data. So far this allowed me some more simple refactorings than before. But otherwise this causes rippling changes. Code and configuration has to request the new behavior so nothing should change for existing code. What do you think? Is that some useful direction?  —You are receiving this because you are subscribed to this thread.Reply to this email directly, view it on GitHub, or mute the thread.             
cal101 commented 7 years ago

I do not think that null is a good choice here. Too often we had cases where a symbol was missing because of some bug. That should be clearly separated from cases where some class not found when lookjng up a field or constructor because a dependant is missing. The symbol resolution in nested scopes will be different for those cases. For a lot of binary cases i don't mind if it's nullable or functional style. But this is ternary style or exceptional case style. The unknown character of sql null does fit here as does the NaN of IEEE floating point. It widens the domain of functions and allows functions to be combined and increases the chance to get useful results if the "unknown parts" of some expression are not needed for obtaining some result. I fixed a few bugs lately where a usage info of "null" in walkmod symbols was interpreted as "no usage" where it really meant "not initialized" for some reason. IMHO there is an important difference between having no symbol in some scope and just having a symbol but not being able to lookup it's constructor. I went down the "null" road for a few hours but triggered too many NPE and "Oops" cases where i could not separate the internal errors from the "incomplete type" cases. The null value just does not carry enough information here. And I don't think writing more code with "!=" null style helps in the sematic analysis code. It too often just obscures the real causes of the problem. Some late "oops, missing symbol" may have been detected earlier at the places where null symboldata is written to the AST. But I'm still not familiar enough with the flow to be sure enough to throw in those cases. Another overload on the meaning of null hurts. If we consider it useful to detect a case we need to store enough of the information if we want to make use of it.

cal101 commented 7 years ago

To make it clear: I don't opt for replacing all exceptions with error symbols. I added at few places where a field lookup or a method or constructor lookup or some class lookup lead to a noclassdefounderror typically for some other class then the target class. The error symbol made it possible to carry on, pass some oops checks and let my refactor visitor do some useful work because the missing symbols were not needed for it's operation.

rpau commented 7 years ago
        I agree that null currently means "not resolved for some reason", because Walkmod tries to resolve as much as it can. But also because if during the transformation, new nodes are created, they do not have any resolved type.So, I agree with you of having an special case for those unaccessible anonymous classes because the compiler does not generate the bytecode. However, it id important to have a good name for this symbol types to ensure that it does not cover errored situations. I am thinking about an AnonymousSymbolType with a property that says if it is reachable or not. Thoughts?---- On lun, 26 jun 2017 22:46:02 +0200  cal<notifications@github.com> wrote ----I do not think that null is a good choice here. Too often we had cases where a symbol was missing because of some bug. That should be clearly separated from cases where some class not found when lookjng up a field or constructor because a dependant is missing. The symbol resolution in nested scopes will be different for those cases. For a lot of binary cases i don't mind if it's nullable or functional style. But this is ternary style or exceptional case style. The unknown character of sql null does fit here as does the NaN of IEEE floating point. It widens the domain of functions and allows functions to be combined and increases the chance to get useful results if the "unknown parts" of some expression are not needed for obtaining some result. I fixed a few bugs lately where a usage info of "null" in walkmod symbols was interpreted as "no usage" where it really meant "not initialized" for some reason. IMHO there is an important difference between having no symbol in some scope and just having a symbol but not being able to lookup it's constructor. I went down the "null" road for a few hours but triggered too many NPE and "Oops" cases where i could not separate the internal errors from the "incomplete type" cases. The null value just does not carry enough information here. And I don't think writing more code with "!=" null style helps in the sematic analysis code. It too often just obscures the real causes of the problem. Some late "oops, missing symbol" may have been detected earlier at the places where null symboldata is written to the AST. But I'm still not familiar enough with the flow to be sure enough to throw in those cases. Another overload on the meaning of null hurts. If we consider it useful to detect a case we need to store enough of the information if we want to make use of it.  —You are receiving this because you commented.Reply to this email directly, view it on GitHub, or mute the thread.             
cal101 commented 7 years ago

The unreachable anonymous symbol types are already marked. See Marker enum in Symboltype. Improvements for naming are always welcome. That is "case 1" in the original post. But that unreachable symbol type will be used by walkmod e.g. when looking up a field of that anoymous class. I have that case but wasn't able to make a test case yet. I just realized recently that the strange class verify error i get may be related to a field definition of a disabled anonymous class. I want to give that field a symboltype, too and find null to be lacking information. I want to keep the "oops" checking for bug detection. This is a different error condition than disabled anonymous type even if induced by that. I agree that proper naming is very important here. It's at the heart of the analysis. You have to enable the behavior explicitly and visitors have to tell awareness. For other visitors that need full semantic analysis it should be fatal as before. "IncompleteType", "UnresolvedType", "UnresolvedDependantType" point in the right direction but are not good enough.

rpau commented 7 years ago
        Ok, get it. What aboutAnonymousMemberTypeI do not like to add "negative" meaning to a symbol type, because it looks like an exception class. ---- On mar, 27 jun 2017 08:24:28 +0200  cal<notifications@github.com> wrote ----The unreachable anonymous symbol types are already marked. See Marker enum in Symboltype. Improvements for naming are always welcome. That is "case 1" in the original post. But that unreachable symbol type will be used by walkmod e.g. when looking up a field of that anoymous class. I have that case but wasn't able to make a test case yet. I just realized recently that the strange class verify error i get may be related to a field definition of a disabled anonymous class. I want to give that field a symboltype and find null to be lacking information. I want to keep the "oops" checking for bug detection. I agree that proper naming is very important here. It's at the heart of the analysis. You have to enable the behavior explicitly and visitors have to tell awareness. For other visitors that need full semantic analysis it should be fatal as before. "IncompleteType", "UnresolvedType", "UnresolvedDependantType" point in the right direction but are not good enough.  —You are receiving this because you commented.Reply to this email directly, view it on GitHub, or mute the thread.             
cal101 commented 7 years ago

"AnonymousMemberType" does not fit. Too broad, the problem is not that it's an anonymous member type, but a member of a "disabled anonymous class". Probably only a "declared member of disabled anonymous class". And binding it to anonymous classes related "case 1" only would not allow reuse for other cases like "case 2".

ReplacementType SubstitutionType SubstituteType StopgapType

My favorite: SurrogateType

Stands out and tells "its not the real thing"

rpau commented 7 years ago
        +1 SurrogateSymbolType (please, use the parent class as suffix)---- On mié, 28 jun 2017 21:29:54 +0200  cal<notifications@github.com> wrote ----"AnonymousMemberType" does not fit. Too broad, the problem is not that it's an anonymous member type, but a member of a "disabled anonymous class". Probably only a "declared member of disabled anonymous class". And binding it to anonymous classes related "case 1" only would not allow reuse for other cases like "case 2". ReplacementType SubstitutionType SubstituteType StopgapType My favorite: SurrogateType Stands out and tells "its not the real thing"  —You are receiving this because you commented.Reply to this email directly, view it on GitHub, or mute the thread.