jcgoble3 / lua-matchext

Fork of Lua 5.3 pattern matching with added features

Pre-compilation of enhanced patterns #10

Open jcgoble3 opened 8 years ago

jcgoble3 commented 8 years ago

For the enhanced patterns covered by #9 (#6, #7, and #8), might it be a good idea to pre-compile the patterns to bytecode? It would likely make these features easier to implement. Again, I need to study the code of Python's re module to see how this is done (in particular, how to build an array of numbers whose length is not known in advance; maybe also study the compiler in Lua's source code to see how it builds its bytecode?). Could also build the bytecode one byte at a time using Lua's buffer facilities, then convert the resulting string to a proper array of numbers.
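
One possible way to get an "array of unknown length" in plain C is a buffer that doubles its capacity when it fills up. This is only a rough sketch: the names `CodeBuf`, `codebuf_init`, `codebuf_push`, and `codebuf_free` are made up for illustration, and `uint32_t` as the opcode type is an assumption, not something decided in this issue.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical growable array of opcodes (illustrative names only). */
typedef struct CodeBuf {
    uint32_t *code;  /* heap-allocated opcode storage, NULL when empty */
    size_t len;      /* number of opcodes currently in use */
    size_t cap;      /* number of opcodes allocated */
} CodeBuf;

static void codebuf_init(CodeBuf *b) {
    b->code = NULL;
    b->len = 0;
    b->cap = 0;
}

/* Append one opcode, doubling the capacity when the buffer is full.
   Returns 0 on success, -1 if the allocation fails (buffer unchanged). */
static int codebuf_push(CodeBuf *b, uint32_t op) {
    assert(b->len <= b->cap);
    if (b->len == b->cap) {
        size_t newcap = b->cap ? b->cap * 2 : 16;
        uint32_t *p = realloc(b->code, newcap * sizeof *p);
        if (p == NULL) return -1;
        b->code = p;
        b->cap = newcap;
    }
    b->code[b->len++] = op;
    return 0;
}

/* Release the storage and reset the buffer to its empty state. */
static void codebuf_free(CodeBuf *b) {
    free(b->code);
    codebuf_init(b);
}
```

Inside an actual Lua C module the allocation would probably go through Lua (a userdata, or a `luaL_Buffer` as mentioned above) rather than raw `realloc`, so that errors and garbage collection are handled by Lua; the doubling strategy would stay the same.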

jcgoble3 commented 7 years ago

Python's re module actually compiles in pure Python, then passes the list of bytecodes to C, which stores them in a struct. So the length of the array is already known by the time it reaches C. (The actual matching is done in C.) Possibly could use a similar approach here, building it in a sequence table (which could be done using either the C API or pure Lua).
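
If the length is already known when the bytecode reaches C, the struct and the code can be allocated in one block. The following is a minimal sketch under that assumption; `CompiledPattern` and `pattern_new` are hypothetical names, and the `uint32_t` opcode type is likewise just a placeholder.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical compiled-pattern object: header plus bytecode in one block. */
typedef struct CompiledPattern {
    size_t codelen;   /* number of opcodes, known before allocation */
    uint32_t code[];  /* C99 flexible array member holding the bytecode */
} CompiledPattern;

/* Copy n opcodes into one freshly allocated block.
   Returns NULL on allocation failure; the caller frees with free(). */
static CompiledPattern *pattern_new(const uint32_t *ops, size_t n) {
    CompiledPattern *p;
    assert(ops != NULL || n == 0);
    p = malloc(sizeof *p + n * sizeof p->code[0]);
    if (p == NULL) return NULL;
    p->codelen = n;
    if (n > 0) memcpy(p->code, ops, n * sizeof p->code[0]);
    return p;
}
```

With the sequence-table approach, the caller would first read the table length and each element through the Lua C API into a temporary array (or directly into `code`), then hand the result to the matcher.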

jcgoble3 commented 7 years ago

This comment serves as a scratchpad for possible opcodes.

These are always present:

These are present only in basic PUC matching:

These are present only in enhanced matching:

jcgoble3 commented 7 years ago

Scratchpad on backtracking:

#include <stddef.h>  /* for size_t */

typedef struct BacktrackInfo {
    void* srcpos;  /* pointer into source char/codepoint array */
    void* minsrc;  /* how far back in source to go before failing; used by LAZY_REPEAT */
    size_t codepos;  /* index into bytecode array */
} BacktrackInfo;

typedef struct CaptureInfo {
    void* start;
    void* end;
    size_t backtrack;
} CaptureInfo;

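
A rough sketch of how `BacktrackInfo` records might be kept on a stack: the matcher pushes a record at each choice point and pops the most recent one when a branch fails. `BacktrackStack`, `bt_push`, and `bt_pop` are hypothetical names, and the growth policy is an assumption for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct BacktrackInfo {
    void *srcpos;    /* pointer into source char/codepoint array */
    void *minsrc;    /* how far back in source to go before failing */
    size_t codepos;  /* index into bytecode array */
} BacktrackInfo;

/* Hypothetical LIFO stack of choice points. */
typedef struct BacktrackStack {
    BacktrackInfo *items;  /* heap-allocated records, NULL when empty */
    size_t len;            /* records in use */
    size_t cap;            /* records allocated */
} BacktrackStack;

/* Save a choice point. Returns 0 on success, -1 on allocation failure. */
static int bt_push(BacktrackStack *s, void *srcpos, void *minsrc, size_t codepos) {
    assert(s->len <= s->cap);
    if (s->len == s->cap) {
        size_t newcap = s->cap ? s->cap * 2 : 8;
        BacktrackInfo *p = realloc(s->items, newcap * sizeof *p);
        if (p == NULL) return -1;
        s->items = p;
        s->cap = newcap;
    }
    s->items[s->len].srcpos = srcpos;
    s->items[s->len].minsrc = minsrc;
    s->items[s->len].codepos = codepos;
    s->len++;
    return 0;
}

/* Restore the most recent choice point into *out.
   Returns 1 if a record was popped, 0 if the stack is empty
   (meaning there is nothing left to try and the match fails). */
static int bt_pop(BacktrackStack *s, BacktrackInfo *out) {
    if (s->len == 0) return 0;
    *out = s->items[--s->len];
    return 1;
}
```
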
jcgoble3 commented 5 years ago

If we're going to rewrite the whole engine, here are some links on NFA implementations (non-backtracking): https://stackoverflow.com/questions/1084069/building-a-regex-engine-online-resources

In particular: https://swtch.com/~rsc/regexp/regexp1.html

This I think would be a better way to do it. https://perl.plover.com/Regex/article.html offers tips on backreferences as well.
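
To make the non-backtracking idea concrete, here is a small sketch of a lockstep NFA simulation: all live states advance together over each input byte, so no backtrack stack is needed. It is loosely in the spirit of the state-list simulation in the article linked above, but runs over a flat instruction array. The opcodes (`OP_CHAR`, `OP_SPLIT`, `OP_JMP`, `OP_MATCH`), the `Inst` layout, the `MAXPROG` limit, and the function names are all assumptions for illustration; it does whole-string matching only, with no captures or backreferences.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { OP_CHAR, OP_SPLIT, OP_JMP, OP_MATCH };  /* hypothetical opcodes */

typedef struct Inst {
    int op;
    int c;     /* OP_CHAR: byte to match */
    int x, y;  /* OP_JMP: target x; OP_SPLIT: targets x and y */
} Inst;

#define MAXPROG 64  /* arbitrary program size limit for this sketch */

typedef struct ThreadList {
    int pc[MAXPROG];            /* program counters of live threads */
    int n;                      /* number of live threads */
    unsigned char on[MAXPROG];  /* on[pc] != 0 if pc was already visited */
} ThreadList;

static void list_clear(ThreadList *l) {
    l->n = 0;
    memset(l->on, 0, sizeof l->on);
}

/* Add pc to the list, following JMP and SPLIT immediately so that the list
   only ever holds CHAR and MATCH instructions. The on[] marks prevent
   duplicates and make loops in the program terminate. */
static void add_thread(const Inst *prog, ThreadList *l, int pc) {
    if (l->on[pc]) return;
    l->on[pc] = 1;
    switch (prog[pc].op) {
    case OP_JMP:
        add_thread(prog, l, prog[pc].x);
        break;
    case OP_SPLIT:
        add_thread(prog, l, prog[pc].x);
        add_thread(prog, l, prog[pc].y);
        break;
    default:
        l->pc[l->n++] = pc;
        break;
    }
}

/* Returns 1 if prog matches the whole of s, else 0. */
static int nfa_fullmatch(const Inst *prog, int proglen, const char *s) {
    ThreadList a, b, *clist = &a, *nlist = &b, *tmp;
    assert(proglen > 0 && proglen <= MAXPROG);
    list_clear(clist);
    add_thread(prog, clist, 0);
    for (;; s++) {
        list_clear(nlist);
        for (int i = 0; i < clist->n; i++) {
            const Inst *in = &prog[clist->pc[i]];
            if (in->op == OP_MATCH) {
                if (*s == '\0') return 1;  /* accept only at end of input */
            } else if (*s != '\0' && in->c == (unsigned char)*s) {
                add_thread(prog, nlist, clist->pc[i] + 1);
            }
        }
        if (*s == '\0' || nlist->n == 0) return 0;  /* no thread survived */
        tmp = clist; clist = nlist; nlist = tmp;
    }
}
```

For example, `a(b|c)*d` could be hand-compiled as `CHAR a; SPLIT 2,7; SPLIT 3,5; CHAR b; JMP 6; CHAR c; JMP 1; CHAR d; MATCH` (instructions numbered from 0). Each input byte is examined once per live thread, so the running time is bounded by program length times input length.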