vimeo / psalm

A static analysis tool for finding errors in PHP applications
https://psalm.dev
MIT License
5.55k stars 662 forks source link

Invalid `column_from` and `column_to` when using multi-byte characters #8653

Open maximal opened 1 year ago

maximal commented 1 year ago

I noticed that columns in Psalm’s code quality report are invalid if the code uses Unicode characters (UTF-8 in my case). They’re counted in bytes, not characters.

This issue breaks code highlighting in IDEs like PhpStorm.

Online Psalm checker on https://psalm.dev/: image

Local Psalm dev-master@4b96c2093499d1fb38e08d5739dd0f3b5fccf531 integrated to PhpStorm: image

ASCII-only variant works fine: image

orklah commented 1 year ago

I think the fact that Psalm is counting bytes was made on purpose. There's very few mb_ functions used in Psalm and it's most often when we must cut at arbitrary length(and we risk cutting through an unicode char)

So this means Psalm is actually generating reports with position as bytes instead of chars. I do not know however what the standard is among tools like this. Maybe psalm.dev and PHPStorm are the one wrongly implementing Psalm's output?

danog commented 1 year ago

most often when we must cut at arbitrary length(and we risk cutting through an unicode char)

About this, I recently profiled psalm and it's the single native function that takes most time in a flat timing graph, I'm half-considering replacing it with a simple PHP solution (which is just a three-branch if to increase the offset by 1, 2 or 3).