Closed mdbergmann closed 11 months ago
Hello, indeed, and that is a feature. str:split explicitly quotes meta characters to not allow regexps. This should be made explicit in the documentation and the docstring.
And indeed, we can use ppcre:split for that (and we always can because ppcre is a dependency). At first sight, I find that adding re-split would not add value and is not worth duplicating a function. Enhancing the README and the docstring to refer to ppcre would have been enough in your case?
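For readers following along, here is a minimal sketch of the difference being discussed. It assumes Quicklisp is available and that the systems are named "str" and "cl-ppcre"; it only illustrates the default (literal) behaviour of str:split next to ppcre:split.

```lisp
;; Assumption: Quicklisp is installed; system names are "str" and "cl-ppcre".
(ql:quickload '("str" "cl-ppcre") :silent t)

;; str:split treats the separator as a literal string,
;; so "." is a plain dot and not "any character".
(str:split "." "a.b.c")            ; => ("a" "b" "c")

;; Regex syntax is not interpreted: "o{2}" is searched for literally
;; and does not occur in "foor", so the string comes back unsplit.
(str:split "o{2}" "foor")          ; => ("foor")

;; ppcre:split interprets the separator as a regular expression.
(ppcre:split "o{2}" "foor")        ; => ("f" "r")
(ppcre:split "[0-9]+" "some987stupid123string")
;; => ("some" "stupid" "string")
```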
> explicitly quotes meta characters to not allow regexps
Yeah. I've seen that. Had a glimpse at the sources.
> Enhancing the README and the docstring to refer to ppcre would have been enough in your case?
Well. I guess it has to if you don't want to add it. I find it unfortunate however to fall back to ppcre directly to perform a split of a string which enforces me to mix namespaces of 'str' and 'ppcre' when only 'str' would suffice.
From an API perspective this could be controlled via key parameters, including folding rsplit into split, for instance:
(split "o" "foo" :reverse t) ;; instead of `rsplit`
(split "o{2}" "foor" :regex t)
Manfred
> if you don't want to add it.
I don't close the possibility.
> I find it unfortunate however to fall back to ppcre directly to perform a split of a string which enforces me to mix namespaces of 'str' and 'ppcre' when only 'str' would suffice.
Yeah, I understand this too. But:
if we add re-split, then what if we want to, say, extract substrings matching a regexp? Or call starts-with-p but with a regexp? Etc. Are they valid use cases for this string library, or light copies of ppcre functionalities?
> this could be controlled via key parameters
Yes, +1, we do this for some functions but it could be generalized.
when we think "regexp", it might be best to turn to ppcre.
Regex is just a representation of an arbitrary string. The most flexible way to represent a string.
Regex is not necessarily bound to ppcre. It just happens to be that ppcre is the library that 'understands' them.
However, ppcre is much more low-level than str is.
I don't care so much whether it is a regex to use for splitting as long as I can use an arbitrary string.
(Insofar I would probably refrain from re-split, but just have a split).
I.e. splitting a text file with Windows line endings I have to use this work around.
(str:split (str:join "" '(#\return #\newline))
(str:join "" '("foo" #\return #\newline "bar")))
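A slightly more direct way to build the CRLF separator, as a sketch only. It assumes Quicklisp and the "str" system; the variable names *crlf* and *text* are made up for illustration.

```lisp
;; Assumption: Quicklisp is installed; the system name is "str".
(ql:quickload "str" :silent t)

;; Build the two-character CRLF separator once (hypothetical name).
(defparameter *crlf* (coerce (list #\Return #\Newline) 'string))

;; A sample text with Windows line endings (hypothetical name).
(defparameter *text* (format nil "foo~C~Cbar" #\Return #\Newline))

;; The separator is matched literally, so no regex is involved.
(str:split *crlf* *text*)          ; => ("foo" "bar")
```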
I see string splitting as essential for string parsing. It is a kind of lightweight alternative to capturing (which really is about regexes), but in order to be usable for parsing it must allow arbitrary strings for splitting.
> Or call starts-with-p but with a regexp?
That's a valid point. What about other functionalities like 'starts-with' or 'ends-with'? My take is that those are much less dependent on regular expressions than splitting is. Though it might still be necessary to supply a tab character to a 'starts-with' function. I'm not sure if there is any other way of encoding special characters in a string so that it can be applied in 'starts-with', 'split', without using a regex.
Thanks for detailing your use case and motivation.
> (Insofar I would probably refrain from re-split, but just have a split).
split with a :regex (:re? both?) key would be good for you? That looks good, we should do it.
> I.e. splitting a text file with Windows line endings I have to use this work around.
Here str should probably help and provide specific variables or function parameters. So you would not look for a regexp, but use a built-in explicitly.
> Though it might still be necessary to supply a tab character to a 'starts-with' function.
+1, we should be able to give a character to starts-with-p, as with other functions.
Hi.
> split with a :regex (:re? both?) key would be good for you?
I would choose :regex
So then str:split simply needs the :regex keyword parameter and an if clause like this?
(if regex
    (ppcre:split separator s :limit limit :start start :end end)
    (ppcre:split `(:sequence ,(string separator))
                 s
                 :limit limit :start start :end end))
Or do we need more adjustments, such as support for the other ppcre:split parameters (with-registers-p, omit-unmatched-p, sharedp), or something else? @mdbergmann @vindarel
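To make the fragment above easier to try out, here is a self-contained sketch. The function name split-sketch is hypothetical and this is not the library's actual implementation; it only wraps ppcre:split the way the if clause suggests, and it assumes Quicklisp and the "cl-ppcre" system.

```lisp
;; Assumption: Quicklisp is installed; the system name is "cl-ppcre".
(ql:quickload "cl-ppcre" :silent t)

(defun split-sketch (separator s
                     &key regex limit (start 0) (end (length s)))
  "Hypothetical stand-in for a str:split with a :regex key.
When REGEX is true, SEPARATOR is handed to ppcre as a regex string;
otherwise it is wrapped in a parse tree so it is matched literally."
  (if regex
      (ppcre:split separator s :limit limit :start start :end end)
      (ppcre:split `(:sequence ,(string separator)) s
                   :limit limit :start start :end end)))

(split-sketch "o{2}" "foor" :regex t) ; => ("f" "r")
(split-sketch "o{2}" "foor")          ; => ("foor")
(split-sketch "." "a.b.c")            ; => ("a" "b" "c")
```

The other ppcre:split parameters could be forwarded the same way as extra keys; whether that is worth the larger signature is the open question.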
split, rsplit and split-omit-nulls with a :regex key argument is probably useful, although I didn't encounter the need.
An example I can think of:
(str:split "[0-9]+" "some987stupid123string" :regex t) ;; => ("some" "stupid" "string")
I had the same idea about improving str:split today. Instead of using a regex, I think the separator could be a list that contains all the separators, like (str:split '(";" "," " ") "some;thing, stupid ").
But it looks like the regex is the more general way to improve it. I am happy with adding the :regex keyword.
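For comparison, the list-of-separators idea can already be expressed with ppcre directly. The sketch below assumes Quicklisp and the "cl-ppcre" system; split-any is a hypothetical helper name, not part of str.

```lisp
;; Assumption: Quicklisp is installed; the system name is "cl-ppcre".
(ql:quickload "cl-ppcre" :silent t)

(defun split-any (separators s)
  "Hypothetical helper: split S on any of the literal strings in SEPARATORS.
Strings inside a ppcre parse tree are matched literally, so no escaping
of regex meta characters is needed."
  (ppcre:split `(:alternation ,@separators) s))

;; The "" comes from the empty field between "," and the following space.
(split-any '(";" "," " ") "some;thing, stupid ")
;; => ("some" "thing" "" "stupid")

;; With a regex string, a run of separators can be collapsed into one.
(ppcre:split "[;, ]+" "some;thing, stupid ")
;; => ("some" "thing" "stupid")
```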
Update: I opened a PR for regex split: https://github.com/vindarel/cl-str/pull/110
thanks for doing it!
May it serve you well for advent of code ;)
Ah! @vindarel, it is you who made this tool. I thought the ID looked familiar!
See:
Since ppcre is also used under the hood, it would be great to support splitting by regex. Maybe a separate function re-split?