dinau opened 1 year ago
```nim
"    abcd".split(peg"\s+") == @["abcd"]
```
is an improvement upon regex, imo. I'm not sure why the regex wouldn't consume the entirety of the spaces.
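For comparison, here is a minimal sketch of the rule most regex libraries follow when splitting (Python's `re.split`, for example). Every separator match closes the current field, so a match at offset 0 yields a leading empty field. `regexStyleSplit` is a hypothetical helper built on `rawMatch` from `std/pegs`. It is not part of the library, and the sketch assumes the example input has four leading spaces. The sketch suggests that with `\s+` the regex does consume all of the spaces in one match. The leading `""` is the empty field in front of that match.

```nim
import std/pegs

# Hypothetical helper, not part of std/pegs: splits `s` the way most regex
# libraries do (e.g. Python's re.split). Every non-empty match of `sep`
# closes the current field, so a match at offset 0 yields a leading "".
proc regexStyleSplit(s: string, sep: Peg): seq[string] =
  var fieldStart = 0  # where the field being collected begins
  var i = 0           # current scan offset
  while i < s.len:
    var caps: Captures
    let m = rawMatch(s, sep, i, caps)  # -1 on failure, else match length
    if m > 0:
      result.add s[fieldStart ..< i]   # the (possibly empty) field before `sep`
      i += m
      fieldStart = i
    else:
      inc i  # no separator here (zero-length matches are ignored)
  result.add s[fieldStart ..< s.len]   # text after the last separator
```

For `"    abcd"`, this returns `@["", "abcd"]` with `\s+` and `@["", "", "", "", "abcd"]` with `\s`.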
Encountering multiple occurrences of your separator leaves the exact course of action up to the user, but
```nim
"    abcd".split(peg"\s") == @[" ", " abcd"]
```
is definitely incorrect lol. We can do better. I rewrote a more correct pegs.split function below:
```nim
import pegs

iterator mySplit(s: string, sep: Peg): string =
  ## Splits the string `s` into substrings.
  ##
  ## Substrings are separated by the PEG `sep`.
  ## Examples:
  ##
  ## .. code-block:: nim
  ##   for word in split("00232this02939is39an22example111", peg"\d+"):
  ##     writeLine(stdout, word)
  ##
  ## Results in:
  ##
  ## .. code-block:: nim
  ##   "this"
  ##   "is"
  ##   "an"
  ##   "example"
  ##
  func usefulMatch(s: string): Natural =
    # rawMatch normally returns -1 on a failed match, but we would rather
    # it return 0, since we can't have an index of -1.
    # Returns the length of the `sep` match at the start of `s` (0 if none).
    var c: Captures
    let matchLen = rawMatch(s, sep, 0, c)
    return if matchLen == -1: 0 else: matchLen
  func nextMatch(s: string): Natural =
    # Returns the distance to the next match of `sep` in `s`,
    # or `s.len` if there is none.
    result = 0
    var c: Captures
    while result < s.len and rawMatch(s, sep, result, c) == -1:
      inc result
  var holder = s
  while holder.len != 0:
    let matchLen = usefulMatch holder   # skip one leading separator match
    holder = holder[matchLen..^1]
    let notMatchLen = nextMatch holder  # length of the field that follows
    yield holder[0 ..< notMatchLen]
    holder = holder[notMatchLen..^1]

func mySplit(s: string, sep: Peg): seq[string] =
  for nonmatching in s.mySplit(sep):
    result.add nonmatching

assert "    abcd".mySplit(peg"\s+") == @["abcd"]
assert "    abcd".mySplit(peg"\s") == @["", "", "", "abcd"]
```
The second case produces three empty strings instead of regex's four. Should it produce four? I'm not sure. Oh, and I'd bet my function is 10x slower than the original, so optimize it before throwing it into production :)
Add the line
```nim
if holder.len == 0: break
```
above the `yield` statement if you want the front and back ends of the string to be handled consistently.
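The comment raises both speed and the optional `break`, so here is a hedged sketch of an index-based variant that covers both. It scans `s` with integer offsets instead of copying `holder[matchLen..^1]` on every step. The hypothetical `consistentEnds` flag plays the role of the suggested `break`. `mySplitFast`, `mySplitFastSeq`, and `consistentEnds` are illustrative names, and this version has not been benchmarked against `pegs.split`. There is one deliberate difference from the code above: zero-length separator matches are ignored. As a result, a PEG that can match the empty string cannot make the loop spin forever.

```nim
import std/pegs

# Hypothetical index-based variant of `mySplit`; the names are illustrative.
# It walks over `s` with integer offsets instead of slicing a fresh copy of
# the remaining string on every step.
iterator mySplitFast(s: string, sep: Peg, consistentEnds = false): string =
  var pos = 0
  while pos < s.len:
    # Skip one separator match at `pos` (rawMatch returns -1 on failure).
    var skipCaps: Captures
    let m = rawMatch(s, sep, pos, skipCaps)
    if m > 0:
      pos += m
    # `consistentEnds` mirrors the suggested `if holder.len == 0: break`:
    # no empty field is yielded after a trailing separator.
    if consistentEnds and pos >= s.len:
      break
    # Advance `stop` to the next offset where `sep` matches non-emptily,
    # or to the end of `s`; ignoring zero-length matches guarantees progress.
    var stop = pos
    while stop < s.len:
      var probeCaps: Captures
      if rawMatch(s, sep, stop, probeCaps) > 0:
        break
      inc stop
    yield s[pos ..< stop]
    pos = stop

proc mySplitFastSeq(s: string, sep: Peg, consistentEnds = false): seq[string] =
  ## Collects the fields produced by `mySplitFast` into a seq.
  for field in mySplitFast(s, sep, consistentEnds):
    result.add field
```

With `consistentEnds = true`, trailing separators produce one fewer empty field, which matches the leading side. The doc-comment example then yields exactly `"this"`, `"is"`, `"an"`, `"example"`. Without the flag, a trailing `""` follows the final `111`.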
Description
I've found unexpected behavior in the pegs.split() proc compared with other regex libraries, as follows:
Are these differences a bug, or are they intended by the specification?
Nim Version
Nim Compiler Version 1.6.10 [Windows: i386] Compiled at 2022-11-21 Copyright (c) 2006-2021 by Andreas Rumpf
Current Output
Expected Output
Possible Solution
No response
Additional Information
The same behavior seems to have been present since nim-0.19.6.