musiKk / plyj

A Java parser written in Python using PLY.

Java 8 support #40

Open musiKk opened 8 years ago

musiKk commented 8 years ago

plyj only supports Java 7 right now. Add support for all Java 8 features.

The branch java8_2 already contains default methods and an unsuccessful stab at reference expressions.

codecop commented 8 years ago

Thanks for plyj, great work! Can you provide the links to the JDT grammar and the things you (as you wrote) directly ported from?

musiKk commented 8 years ago

The grammar file is here. A bit of code archaeology is required: if you correlate the time of the first commit in this repo with the log of the linked java.g, you get the version that is the base of all the code in this repo. Diff that version against the one from their release with Java 8 support and you get a pretty manageable diff.
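If the two revisions of java.g are at hand as text, the comparison can be done with any diff tool. A minimal sketch in Python with the standard difflib module; the grammar snippets and file labels below are made-up placeholders, not actual content of java.g:

```python
import difflib


def grammar_diff(old_text, new_text):
    """Return a unified diff between two revisions of a grammar file."""
    return list(difflib.unified_diff(
        old_text.splitlines(),
        new_text.splitlines(),
        fromfile="java.g (base revision)",    # hypothetical label
        tofile="java.g (Java 8 revision)",    # hypothetical label
        lineterm="",
    ))


# Placeholder rules for illustration only, NOT taken from JDT's grammar.
old = "Primary ::= Name\n"
new = "Primary ::= Name\nPrimary ::= LambdaExpression\n"

for line in grammar_diff(old, new):
    print(line)
```

In practice one would read the two revisions from files checked out of the JDT history instead of the inline strings.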

I'm not exactly sure what JDT is doing. I think the culprit is within the scanner class. There is a method disambiguatedToken which... disambiguates tokens. This is done with the help of a VanguardParser which to the best of my knowledge is another parser with its own state and therefore acts like a "dynamic lookahead". The grammar is LALR(1) after all. I suspect they did it to either get better performance or keep the grammar simpler. It smells like a hack to me but what do I know...

Since I am pretty sure one cannot do that with PLY, this is my roadblock.
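To make the "dynamic lookahead" idea concrete, here is a toy sketch in plain Python. It is not JDT's code and not plyj code, and it uses a simple forward scan instead of a second parser: before the LALR(1) parser sees an opening parenthesis, the token stream is scanned to the matching closing parenthesis, and a synthetic marker token is injected if an arrow follows. The token names (BEGIN_LAMBDA and friends) and helper functions are hypothetical, and whether such a filter can be hooked into PLY cleanly is untested here.

```python
import re

# Toy token set; ARROW is listed before OP so "->" is not split into "-" and ">".
TOKEN_SPEC = [
    ("ARROW", r"->"),
    ("LPAREN", r"\("),
    ("RPAREN", r"\)"),
    ("COMMA", r","),
    ("NAME", r"[A-Za-z_]\w*"),
    ("OP", r"[+\-*/]"),
    ("SKIP", r"\s+"),
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(text):
    """Split text into (kind, value) pairs; unknown characters are silently skipped."""
    return [(m.lastgroup, m.group())
            for m in TOKEN_RE.finditer(text)
            if m.lastgroup != "SKIP"]


def starts_lambda(tokens, i):
    """True if the LPAREN at index i is closed by a RPAREN that is followed by ARROW."""
    depth = 0
    for j in range(i, len(tokens)):
        kind = tokens[j][0]
        if kind == "LPAREN":
            depth += 1
        elif kind == "RPAREN":
            depth -= 1
            if depth == 0:
                return j + 1 < len(tokens) and tokens[j + 1][0] == "ARROW"
    return False  # unbalanced parentheses: leave the token alone


def disambiguate(tokens):
    """Insert a synthetic BEGIN_LAMBDA token in front of every lambda parameter list."""
    out = []
    for i, tok in enumerate(tokens):
        if tok[0] == "LPAREN" and starts_lambda(tokens, i):
            out.append(("BEGIN_LAMBDA", ""))
        out.append(tok)
    return out


for source in ("(a, b) -> a + b", "(a + b) * c"):
    print(source, "=>", [kind for kind, _ in disambiguate(tokenize(source))])
```

With the marker in the stream, a grammar rule for lambdas could start with BEGIN_LAMBDA and would no longer compete with parenthesized expressions on the same lookahead token. The scan needs unbounded lookahead over the input, which is exactly what the LALR(1) parser itself cannot provide.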

codecop commented 8 years ago

Sigh. It even uses its own Jikes parser generator (http://www.eclipse.org/jdt/core/howto/generate%20parser/generateParser.html). I have some experience with JavaCC; I worked with it for the JavaNCSS Java 7 upgrade. But that does not help here.

The other option is to "hack" the parser to digest some Java 8 features in certain situations. It does not have to be complete and pure Java 8 and follow JDT - which is like the canonical parser of course. But maybe you do not want that. You write that some features of the JDT grammar could still be salvaged though. So lambdas would be the hard part?

musiKk commented 8 years ago

I always preferred to stay as close to JDT's grammar as possible in order to be able to incorporate new features more easily, but clearly this is out the window now.

Currently the problem is reference expressions. I don't remember whether I tried lambdas yet, but looking at the aforementioned disambiguatedToken method I guess there will be similar problems.