Open KOLANICH opened 7 years ago
Generally, regular expressions are a pretty non-trivial subject, and especially so when we're discussing integration of the different regexp flavors available in different languages.
Can you provide any examples where this could be useful, i.e. for parsing purposes?
There is the `.mod` format. It is a tracker music format that contains a 4-byte string storing an identifier. Some of the identifiers contain digits giving the number of channels. The number of sequences also varies depending on the identifier. For example, `M.K.` means 31 sequences instead of 15. So we need to match against them with a series of regexes, or alternatively create a huge and ugly expression of slices and conversions.
`X*Y*Z*U*V*...` variants, it's still much faster and much more practical to just query individual byte array members, rather than trying to match it against `(\d)(...)` just to get the first byte in a capture group.

1. In real-life trackers it is assumed to be a string. They usually would have written an error if they encountered an unknown or invalid ID, and all known and valid IDs are valid ASCII strings.
2. In some real-life trackers the numbers are really parsed.
3. Yes, it should be faster to match it as an integer, but IMHO, since our specs are not only code but also docs written in a formal language, it's a bit ugly to write it that way, as it obfuscates the meaning and the intentions. I mean, since we think of them as strings and consider the numbers meaningful, we should put our thoughts into the code.
4. I have 13 regular expressions; 6 of them have `\d` templates and another 3 have alternating characters.
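
To make the trade-off discussed above concrete, here is a minimal Python sketch that extracts a channel count from a 4-byte module ID in both ways: with regexes and by inspecting individual bytes. The ID shapes used (`6CHN`-style and `12CH`-style tags, and `M.K.` treated as 4 channels) are assumptions chosen for illustration, and the function names are hypothetical; this is not the list of 13 expressions mentioned in the thread.

```python
import re

# Illustrative patterns only (assumed ID shapes); real trackers know more IDs.
_DIGIT_PATTERNS = [
    re.compile(rb"(\d)CHN"),   # one digit followed by "CHN", e.g. b"6CHN"
    re.compile(rb"(\d\d)CH"),  # two digits followed by "CH", e.g. b"12CH"
]


def channels_via_regex(tag: bytes):
    """Return the channel count encoded in a 4-byte ID, or None if unknown."""
    if tag == b"M.K.":
        return 4  # assumption: this ID denotes a 4-channel module
    for pattern in _DIGIT_PATTERNS:
        m = pattern.fullmatch(tag)
        if m:
            return int(m.group(1))  # int() accepts ASCII digit bytes
    return None


def channels_via_bytes(tag: bytes):
    """Same result, obtained by querying individual bytes instead."""
    if tag == b"M.K.":
        return 4
    if tag[1:] == b"CHN" and 0x30 <= tag[0] <= 0x39:
        return tag[0] - 0x30
    if tag[2:] == b"CH" and all(0x30 <= b <= 0x39 for b in tag[:2]):
        return (tag[0] - 0x30) * 10 + (tag[1] - 0x30)
    return None
```

The regex version states the intent ("a digit, then `CHN`") directly, while the byte version avoids a regex engine at the cost of manual ASCII arithmetic, which is the readability versus speed argument made in points 1 to 3.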
Hello guys, I am interested in regex support for checking whether fields are valid. My requirement differs from the one @KOLANICH expressed: I don't need to extract groups, just to check the validity of the whole field.
The proposal is to introduce this keyword:
- id: str_test
  type: str
  size: 2
  valid:
    regex: '[a-z]\d'
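
As a rough idea of what a check behind a `valid: regex` key could look like in one target language, here is a minimal Python sketch. It is not the code from the fork: the exception class, the function name and the `field_path` parameter are hypothetical, and requiring the whole field to match (`re.fullmatch`) with ASCII-only `\d` are assumptions.

```python
import re


class ValidationRegexError(Exception):
    """Hypothetical error raised when a field fails its regex check."""


def check_valid_regex(value: str, pattern: str, field_path: str) -> str:
    """Return value unchanged if the whole string matches pattern."""
    # fullmatch: the entire field must match, not just a prefix.
    # re.ASCII: keep \d limited to 0-9, closer to other regex flavors.
    if re.fullmatch(pattern, value, flags=re.ASCII) is None:
        raise ValidationRegexError(
            f"{field_path}: {value!r} does not match /{pattern}/"
        )
    return value
```

For the `str_test` field, `check_valid_regex("a1", r"[a-z]\d", "/seq/0")` returns the value, while `"A1"` or `"ab"` would raise the error.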
I have already implemented this feature for C++, Python, Java, JS, C# and Go, but it has not been extensively tested. So far I have encountered only one major drawback: there is no native regex support in C++98, which I have not solved yet.
If you are interested in this feature, the commit is available in my fork: https://github.com/jocelynke/kaitai_struct_compiler. Any feedback is valuable. I could keep working on this feature to get it into the kaitai master branch.
It is proposed to implement an operator that matches against a common subset of regular expressions (the ECMAScript ones) supported by every programming language that KS supports, either in the stdlib (JS, C++, Python, PHP) or through a separate well-known library (PCRE for C). The operator should accept an expression returning a string and return an array of strings:
- the first argument is a regex
- the second is flags
- the third is the string expression to match
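
A minimal Python sketch of an operator with this argument order (regex, flags, string) might look as follows. The function name `regex_match`, the flag-letter mapping, the use of a search rather than an anchored match, and returning an empty list on failure are all assumptions made for illustration, not part of the proposal.

```python
import re

# Assumed mapping from ECMAScript-style flag letters to Python re flags.
_FLAG_MAP = {"i": re.IGNORECASE, "m": re.MULTILINE, "s": re.DOTALL}


def regex_match(pattern: str, flags: str, text: str) -> list:
    """Return [whole_match, group1, group2, ...], or [] if nothing matched."""
    compiled_flags = 0
    for letter in flags:
        compiled_flags |= _FLAG_MAP[letter]  # KeyError on unsupported flags
    m = re.search(pattern, text, compiled_flags)
    if m is None:
        return []
    # Groups that did not participate in the match become empty strings.
    return [m.group(0)] + [g or "" for g in m.groups()]
```

With these assumptions, `regex_match(r"(\d)(...)", "", "6CHN")` yields the whole match followed by the two captured groups.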
The result is an indexable object, which gives access to results (groups) by index.
- `_is_success` allows checking whether the match is successful.
- `_length` (or what do we have for that) allows getting the length of the match results.
- `to_str` (or what do we have for that) gives the whole regex matching result.
Usage of any groups if there was no success
Look&feel is the following: