Closed bdesham closed 1 year ago
Here's the relevant part of --trace
output:
Trying rule Rule {rMatcher = RegExpr (RE {reString = "\\.?[0-9]", reCaseSensitive = True}), rAttribute = NormalTok, rIncludeAttribute = False, rDynamic = False, rCaseSensitive = True, rChildren = [], rLookahead = True, rFirstNonspace = False, rColumn = Nothing, rContextSwitch = [Push ("C","Number")]}
RegExpr MATCHED Nothing
CONTEXT STACK ["Number","Normal"]
Trying rule Rule {rMatcher = WordDetect "0", rAttribute = DecValTok, rIncludeAttribute = False, rDynamic = False, rCaseSensitive = True, rChildren = [], rLookahead = False, rFirstNonspace = False, rColumn = Nothing, rContextSwitch = [Push ("C","IntSuffix")]}
WordDetect MATCHED Just (DecValTok,"0")
CONTEXT STACK ["IntSuffix","Number","Normal"]
Trying rule Rule {rMatcher = DetectChar '\'', rAttribute = ErrorTok, rIncludeAttribute = False, rDynamic = False, rCaseSensitive = True, rChildren = [], rLookahead = False, rFirstNonspace = False, rColumn = Nothing, rContextSwitch = []}
Trying rule Rule {rMatcher = RegExpr (RE {reString = "([Uu](LL?|ll?|wb|WB)?|(LL?|ll?|wb|WB)[Uu]?)\\b", reCaseSensitive = True}), rAttribute = BuiltInTok, rIncludeAttribute = False, rDynamic = False, rCaseSensitive = True, rChildren = [], rLookahead = False, rFirstNonspace = False, rColumn = Nothing, rContextSwitch = [Push ("C","NumericSuffixError")]}
CONTEXT STACK ["NumericSuffixError","IntSuffix","Number","Normal"]
Trying rule Rule {rMatcher = AnyChar (fromList "'.0123456789"), rAttribute = ErrorTok, rIncludeAttribute = False, rDynamic = False, rCaseSensitive = True, rChildren = [], rLookahead = False, rFirstNonspace = False, rColumn = Nothing, rContextSwitch = []}
AnyChar MATCHED Just (ErrorTok,".")
Trying rule Rule {rMatcher = AnyChar (fromList "'.0123456789"), rAttribute = ErrorTok, rIncludeAttribute = False, rDynamic = False, rCaseSensitive = True, rChildren = [], rLookahead = False, rFirstNonspace = False, rColumn = Nothing, rContextSwitch = []}
AnyChar MATCHED Just (ErrorTok,"5")
Trying rule Rule {rMatcher = AnyChar (fromList "'.0123456789"), rAttribute = ErrorTok, rIncludeAttribute = False, rDynamic = False, rCaseSensitive = True, rChildren = [], rLookahead = False, rFirstNonspace = False, rColumn = Nothing, rContextSwitch = []}
Trying rule Rule {rMatcher = DetectIdentifier, rAttribute = ErrorTok, rIncludeAttribute = False, rDynamic = False, rCaseSensitive = True, rChildren = [], rLookahead = False, rFirstNonspace = False, rColumn = Nothing, rContextSwitch = []}
CONTEXT STACK ["IntSuffix","Number","Normal"]
CONTEXT STACK ["Number","Normal"]
CONTEXT STACK ["Normal"]
Trying rule Rule {rMatcher = DetectSpaces, rAttribute = NormalTok, rIncludeAttribute = False, rDynamic = False, rCaseSensitive = True, rChildren = [], rLookahead = False, rFirstNonspace = False, rColumn = Nothing, rContextSwitch = []}
Trying rule Rule {rMatcher = AnyChar (fromList "!&()*+,-:;=>]^|~"), rAttribute = OperatorTok, rIncludeAttribute = False, rDynamic = False, rCaseSensitive = True, rChildren = [], rLookahead = False, rFirstNonspace = False, rColumn = Nothing, rContextSwitch = []}
AnyChar MATCHED Just (OperatorTok,";")
double x = 0.5;
looks like that first match is wrong: we don't match the regex never mind, there is a /\.[0-9]/
when we're at 0.5
.?
after the \.
so it does match.
Just going through the syntax description manually, I can't see where the above trace is going wrong....do you?
I’m not really familiar with how to read these XML files, but the trace seems to show that the zero is being matched separately from the rest of the numeric literal, and if I change the zero to another digit I do get the results I expect:
$ cat example2.c
double x = 1.5;
$ skylighting --format=native example2.c
[ [ ( DataTypeTok , "double" )
, ( NormalTok , " x " )
, ( OperatorTok , "=" )
, ( NormalTok , " " )
, ( FloatTok , "1.5" )
, ( OperatorTok , ";" )
]
]
Looking at the traces, both the 0.5 example and the 1.5 example share the line
Trying rule Rule {rMatcher = WordDetect "0", rAttribute = DecValTok, rIncludeAttribute = False, rDynamic = False, rCaseSensitive = True, rChildren = [], rLookahead = False, rFirstNonspace = False, rColumn = Nothing, rContextSwitch = [Push ("C","IntSuffix")]}
After that—up to but not including the parsing of the semicolon at the end of the line—the successful example has
Trying rule Rule {rMatcher = RegExpr (RE {reString = "\\.(?:[0-9](?:'?[0-9]++)*+)(?:[eE][+-]?(?:[0-9](?:'?[0-9]++)*+))?|0[xX](?:\\.(?:[0-9A-Fa-f](?:'?[0-9A-Fa-f]++)*+)(?:[pP][-+]?(?:[0-9](?:'?[0-9]++)*+))?|(?:[0-9A-Fa-f](?:'?[0-9A-Fa-f]++)*+)(?:(?:[pP][-+]?(?:[0-9](?:'?[0-9]++)*+))|\\.(?:[0-9A-Fa-f](?:'?[0-9A-Fa-f]++)*+)?(?:[pP][-+]?(?:[0-9](?:'?[0-9]++)*+))?))|(?:[0-9](?:'?[0-9]++)*+)(?:(?:[eE][+-]?(?:[0-9](?:'?[0-9]++)*+))|\\.(?:[0-9](?:'?[0-9]++)*+)?(?:[eE][+-]?(?:[0-9](?:'?[0-9]++)*+))?)", reCaseSensitive = True}), rAttribute = FloatTok, rIncludeAttribute = False, rDynamic = False, rCaseSensitive = True, rChildren = [], rLookahead = False, rFirstNonspace = False, rColumn = Nothing, rContextSwitch = [Push ("C","FloatSuffix")]}
RegExpr MATCHED Just (FloatTok,"1.5")
CONTEXT STACK ["FloatSuffix","Number","Normal"]
Trying rule Rule {rMatcher = DetectChar '\'', rAttribute = ErrorTok, rIncludeAttribute = False, rDynamic = False, rCaseSensitive = True, rChildren = [], rLookahead = False, rFirstNonspace = False, rColumn = Nothing, rContextSwitch = []}
Trying rule Rule {rMatcher = AnyChar (fromList "FLfl"), rAttribute = BuiltInTok, rIncludeAttribute = False, rDynamic = False, rCaseSensitive = True, rChildren = [], rLookahead = False, rFirstNonspace = False, rColumn = Nothing, rContextSwitch = [Push ("C","NumericSuffixError")]}
Trying rule Rule {rMatcher = Detect2Chars 'd' 'f', rAttribute = BuiltInTok, rIncludeAttribute = False, rDynamic = False, rCaseSensitive = True, rChildren = [], rLookahead = False, rFirstNonspace = False, rColumn = Nothing, rContextSwitch = [Push ("C","NumericSuffixError")]}
Trying rule Rule {rMatcher = Detect2Chars 'D' 'F', rAttribute = BuiltInTok, rIncludeAttribute = False, rDynamic = False, rCaseSensitive = True, rChildren = [], rLookahead = False, rFirstNonspace = False, rColumn = Nothing, rContextSwitch = [Push ("C","NumericSuffixError")]}
Trying rule Rule {rMatcher = Detect2Chars 'd' 'd', rAttribute = BuiltInTok, rIncludeAttribute = False, rDynamic = False, rCaseSensitive = True, rChildren = [], rLookahead = False, rFirstNonspace = False, rColumn = Nothing, rContextSwitch = [Push ("C","NumericSuffixError")]}
Trying rule Rule {rMatcher = Detect2Chars 'D' 'D', rAttribute = BuiltInTok, rIncludeAttribute = False, rDynamic = False, rCaseSensitive = True, rChildren = [], rLookahead = False, rFirstNonspace = False, rColumn = Nothing, rContextSwitch = [Push ("C","NumericSuffixError")]}
Trying rule Rule {rMatcher = Detect2Chars 'd' 'l', rAttribute = BuiltInTok, rIncludeAttribute = False, rDynamic = False, rCaseSensitive = True, rChildren = [], rLookahead = False, rFirstNonspace = False, rColumn = Nothing, rContextSwitch = [Push ("C","NumericSuffixError")]}
Trying rule Rule {rMatcher = Detect2Chars 'D' 'L', rAttribute = BuiltInTok, rIncludeAttribute = False, rDynamic = False, rCaseSensitive = True, rChildren = [], rLookahead = False, rFirstNonspace = False, rColumn = Nothing, rContextSwitch = [Push ("C","NumericSuffixError")]}
CONTEXT STACK ["NumericSuffixError","FloatSuffix","Number","Normal"]
Trying rule Rule {rMatcher = AnyChar (fromList "'.0123456789"), rAttribute = ErrorTok, rIncludeAttribute = False, rDynamic = False, rCaseSensitive = True, rChildren = [], rLookahead = False, rFirstNonspace = False, rColumn = Nothing, rContextSwitch = []}
Trying rule Rule {rMatcher = DetectIdentifier, rAttribute = ErrorTok, rIncludeAttribute = False, rDynamic = False, rCaseSensitive = True, rChildren = [], rLookahead = False, rFirstNonspace = False, rColumn = Nothing, rContextSwitch = []}
CONTEXT STACK ["FloatSuffix","Number","Normal"]
CONTEXT STACK ["Number","Normal"]
CONTEXT STACK ["Normal"]
while the buggy example has
WordDetect MATCHED Just (DecValTok,"0")
CONTEXT STACK ["IntSuffix","Number","Normal"]
Trying rule Rule {rMatcher = DetectChar '\'', rAttribute = ErrorTok, rIncludeAttribute = False, rDynamic = False, rCaseSensitive = True, rChildren = [], rLookahead = False, rFirstNonspace = False, rColumn = Nothing, rContextSwitch = []}
Trying rule Rule {rMatcher = RegExpr (RE {reString = "([Uu](LL?|ll?|wb|WB)?|(LL?|ll?|wb|WB)[Uu]?)\\b", reCaseSensitive = True}), rAttribute = BuiltInTok, rIncludeAttribute = False, rDynamic = False, rCaseSensitive = True, rChildren = [], rLookahead = False, rFirstNonspace = False, rColumn = Nothing, rContextSwitch = [Push ("C","NumericSuffixError")]}
CONTEXT STACK ["NumericSuffixError","IntSuffix","Number","Normal"]
Trying rule Rule {rMatcher = AnyChar (fromList "'.0123456789"), rAttribute = ErrorTok, rIncludeAttribute = False, rDynamic = False, rCaseSensitive = True, rChildren = [], rLookahead = False, rFirstNonspace = False, rColumn = Nothing, rContextSwitch = []}
AnyChar MATCHED Just (ErrorTok,".")
Trying rule Rule {rMatcher = AnyChar (fromList "'.0123456789"), rAttribute = ErrorTok, rIncludeAttribute = False, rDynamic = False, rCaseSensitive = True, rChildren = [], rLookahead = False, rFirstNonspace = False, rColumn = Nothing, rContextSwitch = []}
AnyChar MATCHED Just (ErrorTok,"5")
Trying rule Rule {rMatcher = AnyChar (fromList "'.0123456789"), rAttribute = ErrorTok, rIncludeAttribute = False, rDynamic = False, rCaseSensitive = True, rChildren = [], rLookahead = False, rFirstNonspace = False, rColumn = Nothing, rContextSwitch = []}
Trying rule Rule {rMatcher = DetectIdentifier, rAttribute = ErrorTok, rIncludeAttribute = False, rDynamic = False, rCaseSensitive = True, rChildren = [], rLookahead = False, rFirstNonspace = False, rColumn = Nothing, rContextSwitch = []}
CONTEXT STACK ["IntSuffix","Number","Normal"]
CONTEXT STACK ["Number","Normal"]
CONTEXT STACK ["Normal"]
It really looks like the buggy case treats “0” as an integer literal.
So the question is why WordDetect "0"
is there! Presumably it shouldn't be matching 0
in 0.5
; I'll have to look at the implementation of WordDetect
..
I understand now. Full rule is
<WordDetect attribute="Decimal" context="IntSuffix" String="0" weakDeliminator="."/>
Here weakDeliminator
adds characters that are allowed to belong to the word and hence not treated as word separators. We're apparently ignoring that and treating .
as a word separator.
Awesome, thank you very much 😊
Consider this C file:
With skylighting 0.13.4, the
.5
part is highlighted as an error:If I use the C++ syntax instead, I get the expected behavior where the entire
0.5
is treated as a single number:Please let me know if there’s any further information I can provide that would be helpful!