Closed dacharyc closed 6 months ago
Just ran into this again. Another example (I had to remove this from examples/cpp/sync/sync-session.cpp to generate the example):
auto connectionState = syncSession->state();
CHECK(connectionState ==
realm::internal::bridge::sync_session::state::paused);
The problem ultimately lies in the application of this regexp in token.ts line 87:
const TAG_PATTERN /* */ = /:([A-z0-9-]+):[^\S\r\n]*/;
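To see the collision outside of Bluehawk, here is a rough standalone sketch that runs the same pattern over the C++ line from the example. The `g` flag and the `findTagLikeNames` helper are my additions for illustration; this is not how Bluehawk's lexer actually applies the pattern, it only shows what the expression itself is willing to capture.

```typescript
// Same expression as in token.ts, plus the `g` flag so every match is listed.
const TAG_PATTERN_GLOBAL = /:([A-z0-9-]+):[^\S\r\n]*/g;

// Hypothetical helper: collect every name the pattern captures on one line.
function findTagLikeNames(line: string): string[] {
  const names: string[] = [];
  let match: RegExpExecArray | null;
  TAG_PATTERN_GLOBAL.lastIndex = 0; // reset state left over from earlier calls
  while ((match = TAG_PATTERN_GLOBAL.exec(line)) !== null) {
    names.push(match[1]);
  }
  return names;
}

const cppLine = "realm::internal::bridge::sync_session::state::paused);";
console.log(findTagLikeNames(cppLine));
// [ 'internal', 'bridge', 'sync_session', 'state' ]
```

Every namespace segment between two `::` separators gets captured, so any segment that happens to share a name with a tag is a problem. Note that `[A-z]` also covers the ASCII characters between `Z` and `a`, which is why `sync_session` is captured with its underscore.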
This is not limited to the word "state". It will happen with ALL tag keywords that use a `-start` & `-end` component, including "snippet" and "block-tag".
IF you have a constant list of tags that use `-start`, then you could generate a regexp on the fly that uses that list of strings and matches `:` + letters/numbers/hyphens + `:` UNLESS the characters in that capture are any of the strings that make up the names of tags that also use `-start`.
For example:
# this would NOT match because "state" is a known non-line mode keyword (`foo-start` things)
:state:
# this would match because it's NOT a known non-line mode keyword
:remove:
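A minimal sketch of generating that regexp on the fly, assuming a central list exists. `BLOCK_ONLY_TAG_NAMES`, `escapeForRegExp`, and `buildTagPattern` are hypothetical names, not anything in Bluehawk today; the character class is copied unchanged from the current pattern.

```typescript
// Hypothetical list of tag names that are only valid as -start/-end pairs.
// Bluehawk may or may not keep such a list in one place.
const BLOCK_ONLY_TAG_NAMES: string[] = ["state", "snippet", "block-tag"];

// Escape characters that have special meaning inside a regular expression.
function escapeForRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Build a tag pattern that refuses to capture a bare block-only name.
function buildTagPattern(blockOnlyNames: string[]): RegExp {
  const tagBody = "([A-z0-9-]+):[^\\S\\r\\n]*";
  if (blockOnlyNames.length === 0) {
    return new RegExp(":" + tagBody);
  }
  const excluded = blockOnlyNames.map(escapeForRegExp).join("|");
  // (?!(?:a|b|c):) fails when the text right after the opening colon is
  // exactly one of the excluded names followed by the closing colon.
  return new RegExp(":(?!(?:" + excluded + "):)" + tagBody);
}

const generatedTagPattern = buildTagPattern(BLOCK_ONLY_TAG_NAMES);
console.log(generatedTagPattern.test(":state:"));       // false
console.log(generatedTagPattern.test(":remove:"));      // true
console.log(generatedTagPattern.test(":state-start:")); // true
```

Because the lookahead requires the closing colon right after the excluded name, `:state-start:` and `:state-end:` still match; only the bare `:state:` is rejected.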
Without a centralized list like that, you could hardcode the regexp to exclude those, but then you'd have the maintenance task of remembering to update it whenever you added a new `foo-start` tag. Someone would inevitably forget, so...
Alternatively, you could handle it at the other end of the process and modify validator.ts line 64 to ONLY blow up if `tagNode.tagName` is a tag in the list of tags that support line mode, and then do... something to undo the thinking that it's dealing with a tag.
Unfortunately, you've still got the problem of either needing a centralized list that can be checked (that may exist) or needing to manually hardcode all the names that should be ignored.
The things above only address PART of the problem. If we apply any of those solutions, we still have the problem that many languages use `::` as a namespace separator AND that some people are putting Slack-style emoji in comments (e.g. `marketing made me do this :facepalm:`).
The regexp at the beginning needs to be modified so that it doesn't match `::foo` at the start of a potential tag string but does match `:foo`.
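One way that could look, as a sketch only: add a lookbehind so the opening colon can't be the second half of a `::`, and a lookahead so the closing colon can't be the first half of one. The lookahead and the names `NO_DOUBLE_COLON_TAG_PATTERN` and `firstTagName` are my assumptions, and this requires a runtime with lookbehind support (ES2018 or later).

```typescript
// Built with the RegExp constructor so the lookbehind is not subject to
// compile-target checks; the runtime itself must support lookbehind.
//   (?<!:)  the opening colon must not be preceded by another colon
//   (?!:)   the closing colon must not be followed by another colon
const NO_DOUBLE_COLON_TAG_PATTERN = new RegExp(
  "(?<!:):([A-z0-9-]+):(?!:)[^\\S\\r\\n]*"
);

// Hypothetical helper: return the first captured tag name on a line, if any.
function firstTagName(line: string): string | null {
  const match = NO_DOUBLE_COLON_TAG_PATTERN.exec(line);
  return match ? match[1] : null;
}

console.log(firstTagName("realm::internal::bridge::sync_session::state::paused);")); // null
console.log(firstTagName("// :remove-start:")); // "remove-start"
console.log(firstTagName("// marketing made me do this :facepalm:")); // "facepalm"
```

This takes care of namespace separators, but as the last line shows, a Slack-style emoji still looks exactly like a tag to the regexp.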
Separately, a mid-line `:remove:` shouldn't be interpreted as a tag. I should be able to write a line of code with `foo bar :remove: baz` in it that DOESN'T trigger Bluehawk. I don't know enough about the potential use cases to know what's "right" here, but off the top of my head I'm thinking that the system should be modified to ONLY consider `:remove:` (and other non-line mode tags) a tag IF it is in a comment, or IF it is at the start of a line, or IF it is preceded by nothing but whitespace and followed by a newline.
Testing that it's in a comment complicates things, because then you need knowledge of the comment format(s) of all supported languages and need to know which language is in play when the line is parsed. Unless that's already in here somewhere, it's going to require a fair amount of code to implement and test.
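Leaving the comment check out entirely, the whitespace-only conditions are cheap to express. This sketch folds "at the start of a line" and "preceded by nothing but whitespace and followed by a newline" into one stricter rule: the tag must be the only non-whitespace content on its line. `STANDALONE_TAG_PATTERN` and `standaloneTagName` are hypothetical names, and the function assumes it is handed a single line with the line terminator already stripped.

```typescript
// Matches only when the tag is the sole non-whitespace content on the line.
const STANDALONE_TAG_PATTERN = /^[^\S\r\n]*:([A-z0-9-]+):[^\S\r\n]*$/;

// Hypothetical helper: return the tag name if the line is nothing but a tag.
function standaloneTagName(line: string): string | null {
  const match = STANDALONE_TAG_PATTERN.exec(line);
  return match ? match[1] : null;
}

console.log(standaloneTagName("foo bar :remove: baz")); // null
console.log(standaloneTagName("   :remove:"));          // "remove"
console.log(standaloneTagName("// :remove:"));          // null
```

The last case is the catch: without knowing the comment syntax of the language being parsed, a tag inside a comment is rejected too, so this rule on its own would not be enough.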
When using Bluehawk with C++, Bluehawk can interpret C++ syntax as the beginning of a Bluehawk markup tag.
Example:
Bluehawk interprets this as the beginning of the `:state:` tag. It would be great if Bluehawk could ignore `::` syntax so it doesn't interpret C++ as the beginning of a markup tag.