rouge-ruby / rouge

A pure Ruby code highlighter that is compatible with Pygments
https://rouge.jneen.net/

Pascal detection rules are too keen #1866

Closed (jamespwilliams closed 1 year ago)

jamespwilliams commented 1 year ago

Name of the lexer: Pascal

Code sample

class a {
    $test = 'var'
}

I can't provide a useful code sample link, because this is a filetype detection bug, not a highlighting bug.

Additional context

The problem is that the disambiguation rules are too eager to choose Pascal when deciding whether a file is Pascal or something else. The code above is Puppet, which also uses the .pp file extension. Since https://github.com/rouge-ruby/rouge/pull/1845, Rouge decides that any file containing var is Pascal: https://github.com/tancnle/rouge/blob/master/lib/rouge/guessers/disambiguation.rb#L136. This causes a lot of false positives, because var is a common FHS path component (for example /var/log), so it appears in Puppet code very frequently.

Looking for \bvar\s instead of \bvar\b would eliminate a lot of false positives while keeping the same intent: from what I can tell, a Pascal var keyword has to be followed by whitespace. Another approach, which could be used in combination, would be to add a disambiguation rule that declares a file Puppet if it contains ::, which is a very common feature of Puppet code but a rare one in Pascal (I think; I'm not a Pascal writer).
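To make the difference between the two patterns concrete, here is a minimal standalone sketch in plain Ruby. It is not Rouge's actual disambiguation code: the constant names, the sample sources, and the `guess_pp` helper are all hypothetical and exist only for illustration.

```ruby
# Hypothetical illustration only; this is NOT how Rouge implements its guessers.

# The Puppet snippet from this report (single-quoted heredoc, so nothing is interpolated).
PUPPET_SAMPLE = <<~'PP'
  class a {
      $test = 'var'
  }
PP

# A small, made-up Pascal program for comparison.
PASCAL_SAMPLE = <<~'PAS'
  program Demo;
  var
    count: Integer;
  begin
    count := 1;
  end.
PAS

LOOSE_VAR    = /\bvar\b/ # "var" as a whole word, anywhere (including inside quotes or paths)
STRICT_VAR   = /\bvar\s/ # "var" must be followed directly by whitespace
PUPPET_SCOPE = /::/      # namespace separator, common in Puppet code

# Hypothetical combined rule: check for the Puppet marker first, then for a strict "var".
def guess_pp(source)
  return :puppet if source.match?(PUPPET_SCOPE)
  return :pascal if source.match?(STRICT_VAR)

  :unknown
end

puts PUPPET_SAMPLE.match?(LOOSE_VAR)   # true: the quoted 'var' triggers the loose rule
puts PUPPET_SAMPLE.match?(STRICT_VAR)  # false: "var" is followed by a quote, not whitespace
puts guess_pp(PASCAL_SAMPLE)           # pascal
puts guess_pp("include apache::mod\n") # puppet
```

With the stricter pattern the Puppet sample is no longer claimed by Pascal. In this sketch it falls through to `:unknown`, so some other rule or default would still have to decide what to do with it.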

jamespwilliams commented 1 year ago

http://pirate.shu.edu/~wachsmut/Teaching/CSAS1111/Notes-Pascal/pascal1.html suggests Pascal programs begin with PROGRAM. If that is indeed a requirement, it could serve as a useful disambiguation rule. Someone with more Pascal experience may be able to weigh in here.
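A rough sketch of what such a header check could look like, again in plain Ruby and not taken from Rouge: `PROGRAM_HEADER` and `starts_with_program?` are hypothetical names. It assumes the keyword is the first token in the file and ignores leading comments. Pascal keywords are case-insensitive, hence the `i` flag.

```ruby
# Hypothetical check, not part of Rouge: does the source open with the `program` keyword?
PROGRAM_HEADER = /\A\s*program\b/i

def starts_with_program?(source)
  source.match?(PROGRAM_HEADER)
end

puts starts_with_program?("PROGRAM Hello;\nbegin\nend.\n")       # true
puts starts_with_program?("unit Foo;\ninterface\n")              # false
puts starts_with_program?("class a {\n    $test = 'var'\n}\n")   # false
```

The second call shows the limitation: a Pascal unit file opens with `unit` rather than `program`, so this check alone would miss valid Pascal sources and could only ever be one signal among several.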

tancnle commented 1 year ago

Thanks for reporting the issue @jamespwilliams 👍🏼 I think that is a fair assessment regarding var. PROGRAM might not be applicable to Free Pascal. I would propose to:

What do you think @jamespwilliams @Alexey-T?

jamespwilliams commented 1 year ago

Sounds good 👍