sebastienros / esprima-dotnet

Esprima .NET (BSD license) is a .NET port of the esprima.org project. It is a standard-compliant ECMAScript parser (also popularly known as JavaScript).
BSD 3-Clause "New" or "Revised" License
430 stars 75 forks source link

Update to NET 8 #423

Closed lahma closed 1 year ago

lahma commented 1 year ago

Like in Jint, it seems that NET 8 has degraded in inlining capabilities and we have to lower the check constant for time being.

sebastienros commented 1 year ago

it seems that NET 8 has degraded in inlining capabilities

I assume it's not easy to find out the scenario. Might be a common thing. Maybe could be detected by throwing random [MethodInlining] and see when that affects the results. That could be in private code blocks though too.

About the regex tests failing, maybe it's missing a font. ;)

sebastienros commented 1 year ago

But about the regex thing, I'd like to know what has changed exactly, I wonder which version is buggy. all .NET version before 8, or 8.

lahma commented 1 year ago

it seems that NET 8 has degraded in inlining capabilities

I assume it's not easy to find out the scenario. Might be a common thing. Maybe could be detected by throwing random [MethodInlining] and see when that affects the results. That could be in private code blocks though too.

I experimented a little and adding AggressiveOptimization allowed a bit larger threshold for tests, but the current defaults expected have somewhat changed. Some paths we could get away with source generators too (passing Func to use generic method for everything, could generate all cases without the need for Func).

But about the regex thing, I'd like to know what has changed exactly, I wonder which version is buggy. all .NET version before 8, or 8.

Tests were successful with NET 6 / FW 4.6.2, so I guess something is stricter / behaves differently with NET 8. Esprima rewrites them quite aggresively so I think @adams85 is our specialist for the matter in hand.

The TimeZone problem found in Jint already was classified as bug so I guess we should have just run previews more aggressively 😉

lahma commented 1 year ago

So now this subject matter expert just rides in with the white horse and makes me look just plain stupid, thanks for that 😉

adams85 commented 1 year ago

But about the regex thing, I'd like to know what has changed exactly, I wonder which version is buggy. all .NET version before 8, or 8.

For rewriting unicode regexps, we rely on the CharUnicodeInfo.GetUnicodeCategory API. Apparently, MS updated the underlying Unicode dataset in .NET 8.

So, it is needed to regenerate the regexp testcases using the included generator.

Like in Jint, it seems that NET 8 has degraded in inlining capabilities and we have to lower the check constant for time being.

About this I have no clue at the moment. But a degradation of 450 -> 380 looks pretty rough...

lahma commented 1 year ago

About this I have no clue at the moment. But a degradation of 450 -> 380 looks pretty rough...

It is, I didn't want to touch it for now so improvements/changes are separate measurable PRs.

lahma commented 1 year ago

A humble thank you @adams85 , probably would have taken me ages to figure this out!

adams85 commented 1 year ago

Sure thing!

BTW, there is a part which was pretty fragile in .NET 6: https://github.com/sebastienros/esprima-dotnet/blob/main/src/Esprima/Token.cs#L132-L141 So, I tweaked it a bit but it seems that not there lies the rub.

Couldn't this inlining degradation be related to the JIT improvements of .NET 8?

sebastienros commented 1 year ago

So, it is needed to regenerate the regexp testcases using the included generator

Two options:

sebastienros commented 1 year ago

Or generate the file as part of the build, and if the file is different than the checked-in one, fail the build.

adams85 commented 1 year ago

A source generator or running the current generator as part of the build process is pretty much out of question because generating the test cases involves running scripts on Node.js. At least, I don't think it would be a great idea to make our build process depend on Node.js being installed.

Of course, we could add some checks to CI, however test cases will change pretty rarely, so it's not worth the hassle IMO. That leaves us with "add a nice comment in the unit tests to remind that if a test fail maybe the generator should be run again". Will do that soon.

sebastienros commented 1 year ago

Out of curiosity, are we using NodeJS because it has the correct RegExp we need, on just because it was easier for you to write it in JS? If the latter could we use Jint? Another sub-comment, maybe NodeJS is already installed on the base image of the GH image, the same way net8.0 is installed by default now.

adams85 commented 1 year ago

Out of curiosity, are we using NodeJS because it has the correct RegExp we need, on just because it was easier for you to write it in JS? If the latter could we use Jint?

It's the former. To be able to verify our output, we need standard-compliant reference data. We could use whatever well-tested JS engine for this purpose, I just happened to choose Node.js.