jamadden / mrab-regex-hg

Automatically exported from code.google.com/p/mrab-regex-hg

expose the parsed structure of the regex pattern (for highlighting) #74

GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
Hi,
thanks very much for the quick fix of issue 73. Now I have another 
"question-like" issue.
I am trying to implement some kind of highlighting of regex patterns for better 
visibility of different elements of the patterns (like literals, character 
sets, quantifiers, anchors, lookarounds, groups ...).
Is there possibly some way to map certain substrings of the pattern string to 
these categories (or ideally even instances - like different groups) - based on 
the structure built during parsing in the regex engine?

I am currently trying some basic approach, i.e. applying different 
"meta-patterns" :-) for the corresponding regex elements and styling the 
matches.
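
A minimal sketch of the "meta-pattern" approach described above, for illustration only. The category names, `TOKEN_SPEC` and `classify` are hypothetical and not part of regex; the stdlib `re` is used only so the example runs without extra installs. The token list is deliberately incomplete (no inline flags, no verbose-mode comments, no nested sets):

```python
import re

# Hypothetical category table: one sub-pattern per kind of regex syntax.
# Order matters, because the first alternative that matches wins.
TOKEN_SPEC = [
    ("escape",      r"\\."),                              # \d, \., \b ...
    ("charset",     r"\[\^?\]?(?:\\.|[^\]\\])*\]"),       # [a-z], [^\]]
    ("quantifier",  r"(?:[*+?]|\{\d+(?:,\d*)?\})[?+]?"),  # *, +?, {2,3}
    ("anchor",      r"[\^$]"),
    ("group_open",  r"\((?:\?(?:[:=!]|<[=!]|P?<\w+>))?"),  # (, (?:, (?P<n>
    ("group_close", r"\)"),
    ("alternation", r"\|"),
    ("literal",     r"."),                                # anything else
]
TOKEN_RE = re.compile(
    "|".join(f"(?P<{name}>{sub})" for name, sub in TOKEN_SPEC), re.DOTALL
)


def classify(pattern):
    """Return (category, start, end, text) for each piece of the pattern."""
    return [
        (m.lastgroup, m.start(), m.end(), m.group())
        for m in TOKEN_RE.finditer(pattern)
    ]


for category, start, end, text in classify(r"(?P<year>\d{4})-[a-z]+$"):
    print(f"{start:2}-{end:2}  {category:12} {text}")
```

Each tuple carries the start and end offsets in the pattern string, which is what a highlighter needs to apply a style to that substring.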

While this works to some extent, it is obviously tricky or (near?) impossible 
in some respects - e.g. to find the corresponding balanced parens and style 
their content differently, if needed.
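
Balanced parens are hard to pair with a single meta-pattern, but a small hand-written scan with a stack can do it. The sketch below is an assumption about how an application might handle this, not something regex provides; `paren_spans` is a hypothetical helper, and it ignores nested sets and verbose-mode comments:

```python
def paren_spans(pattern):
    """Return sorted (open_index, close_index) pairs of matching parens.

    Escaped characters and the contents of character sets are skipped,
    so r"\(" and "[(]" do not count as group delimiters.
    """
    spans = []
    stack = []        # indices of "(" that are still open
    in_set = False    # True while scanning inside [...]
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\":
            i += 2    # skip the backslash and the escaped character
            continue
        if in_set:
            if ch == "]":
                in_set = False
        elif ch == "[":
            in_set = True
            # A "]" directly after "[" or "[^" is a literal member.
            j = i + 1
            if pattern[j:j + 1] == "^":
                j += 1
            if pattern[j:j + 1] == "]":
                i = j
        elif ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unbalanced ')' at index {i}")
            spans.append((stack.pop(), i))
        i += 1
    if stack:
        raise ValueError(f"unbalanced '(' at index {stack[-1]}")
    return sorted(spans)


print(paren_spans(r"a(b(c)\))[()]d"))  # [(1, 8), (3, 5)]
```

The content of a group is then `pattern[open_index + 1:close_index]`, which can be styled separately from the delimiters.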

Could the parsed structure of the pattern be accessed in some way, or are there 
other options to achieve this?

(If this were possible, the next step could be to highlight the parts of the 
pattern together with the corresponding (sub)matches in the text ...)
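
For the (sub)match side of that idea, the match object already reports where each group matched in the text. A rough sketch, assuming only numbered groups matter; `group_spans` is a hypothetical helper, and the stdlib `re` is used here, although regex match objects offer the same `span` and `group` methods:

```python
import re


def group_spans(pattern, text):
    """Return (group_number, (start, end), matched_text) for the first match.

    Group 0 is the whole match. A group that did not take part in the
    match is reported with span (-1, -1) and matched_text None.
    """
    m = re.search(pattern, text)
    if m is None:
        return []
    return [(i, m.span(i), m.group(i)) for i in range(m.re.groups + 1)]


for number, span, matched in group_spans(r"(\d+)-(\w+)", "id 42-abc"):
    print(number, span, matched)
```

Pairing these text spans with the positions of the groups in the pattern string would still have to be done by the application.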

Thanks in advance
   vbr

Original issue reported on code.google.com by Vlastimil.Brom@gmail.com on 9 Jul 2012 at 12:34

GoogleCodeExporter commented 9 years ago
The parsed structure (in the form of a tree) doesn't contain the information 
you would need. It doesn't, for example, contain the positions within the 
pattern string of the various parts because it has no need for them, and, 
anyway, the module then optimises, compiles, and discards it.

Do you know of any other regex implementations which possess such a feature?

Original comment by re...@mrabarnett.plus.com on 9 Jul 2012 at 12:58

GoogleCodeExporter commented 9 years ago
Thanks, I don't know of any regex implementation exposing this information 
explicitly. I was just considering possible approaches to achieve this 
functionality (as seen in some regex testers/debuggers).
It appeared to me that regex has a well-structured parsing process (in 
contrast to the more opaque re), e.g. judging from the function calls in 
tracebacks, and I thought such information, or some relevant bits of it, 
would be somehow available to the parser at some point.
The output using the DEBUG flag indeed contains useful information, but I 
wasn't able to connect it to the pattern parts, although it appears to be 
(partly?) segmentable - based on GROUP ... CHARACTER MATCH and possibly others.
However, if this is not the case, it certainly shouldn't be added, as this is 
not the library's business but the application's. Do you think some other 
approaches are possible? Can the parse tree be accessed in regex in some 
(semi)official way?
Thanks and regards
    vbr

Original comment by Vlastimil.Brom@gmail.com on 9 Jul 2012 at 9:28

GoogleCodeExporter commented 9 years ago
The parse tree is purely an implementation detail.

Original comment by re...@mrabarnett.plus.com on 9 Jul 2012 at 7:10

GoogleCodeExporter commented 9 years ago
There are a number of features which are not represented in the parse tree, 
such as the in-line flags and non-capture grouping.

Original comment by re...@mrabarnett.plus.com on 10 Jul 2012 at 4:50