musiKk / plyj

A Java parser written in Python using PLY.
Other
150 stars 69 forks source link

Empty statements in import statement are not parsed correctly #33

Closed awm129 closed 9 years ago

awm129 commented 9 years ago

I ran into an issue parsing files with dangling semicolons after an import statement. The java compilers I've tested seem to all accept these as valid syntax but plyj doesn't parse them correctly. I don't understand the codebase well enough to make the fix myself but the issue can be illustrated by modifying the import unit test in compilation_unit.py to something like this:

def test_import(self):
        m = self.parser.parse_string('''
        import foo;;
        import foo.bar;
        ''')
        self.assertEqual(m.import_declarations,
                         [model.ImportDeclaration(model.Name('foo')),
                          model.ImportDeclaration(model.Name('foo.bar'))])

Note the extra semicolon after import foo. A 'real' example of this syntax can be found in AOSP's VideoDumpActivity.java

musiKk commented 9 years ago

What compilers did you test? OpenJDK's javac accepts it, Eclipse on which the grammar used by plyj is based on does not. This looks like a compiler bug in OpenJDK to me. The grammar does not allow for empty statements in the import section. A single semicolon is parsed as an empty type declaration. Since you cannot have imports after a type, this does not parse. For some reason, the Android guys seem to have implemented the wrong behavior as well.

So the 'real' fix for this issue would be to go to every compiler dev that allows this and demand they reject it. Implementing the wrong behavior is far from straight forward; I am not going to do it.

edit: There is a small discussion on the OpenJDK mailing list that went nowhere.

awm129 commented 9 years ago

I've tested this with OpenJDK 6 and 7. I don't really use Eclipse, but I see I do get a red squiggly line if I try to add a dangling semi-colon in there.

I agree that implementing wrong behavior is no good. However, I've also tested with the Oracle compilers for 6, 7, and 8. The syntax has passed each javac I've run it through. I think this lends the syntax a bit more legitimacy. The JLS has no mention of a dangling ';' yet the compiler (reference implementation?) produced by the same company accepts it.

Because all major Java compilers (OpenJDK and Oracle) appear to accept the syntax, I think I'll at least make the change locally. Thanks!

plyj is a great tool btw

musiKk commented 9 years ago

Glad you like it.

I know there is always the issue of correctness vs pragmatism. The behavior is wrong as per the spec and the OpenJDK developers are or at least have been aware of this (cf. the mailing list entry I linked in my previous comment). This is not the first time I encounter something that javac did wrong Eclipse's compiler did right.

I am not 100% opposed to have a lax mode in plyj that allows this sort of common but wrong code. But like I said, this is not trivial for me as it introduces ambiguities that have to be resolved (an intermediate semicolon can suddenly be either an empty import statement or an empty type declaration) and to be frank this goes above my capabilities. If someone can implement this in a non-intrusive way, I'd gladly include it.