Open phoe opened 4 years ago
+1, I would like this. Then we add a local package nickname that makes sense for working on vectors (`vec`) (or another package?) and we're good.
Honestly, I find the name `cl-str` or `str` to be somewhat unfortunate in this context, since all of the operations that you list work on vectors as well as they do on strings. (Perhaps with the exception of the char-case operations, since, even though they are technically possible, they may have little practical meaning on vectors that hold characters but are nonetheless not specialized strings, like `#(#\H #\e #\l #\l #\o)`.)
If you can afford that, I'd suggest renaming the library from `str` to anything that mentions vectors rather than just strings; if not, I'd leave a note that explains why `str` is called `str` and that programmers are free to use package-local nicknames to refer to the package under another name.
(If you decide to rename the package, I'd also suggest using a longer package name rather than just `str`, since that's a very short name and other people might want to use it as a local nickname for another string library of sorts. But that's a mostly off-topic note.)
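For readers unfamiliar with package-local nicknames, here is a minimal sketch of the idea. Both package names (`long-vector-library-demo`, `my-app-demo`) and the `emptyp` function are hypothetical and exist only for illustration. Note that `:local-nicknames` is not part of the ANSI standard; it is an extension supported by SBCL, CCL, ECL, ABCL and several other implementations.

```lisp
;; Hypothetical library with a long, descriptive package name.
(defpackage #:long-vector-library-demo
  (:use #:cl)
  (:export #:emptyp))

(in-package #:long-vector-library-demo)

(defun emptyp (vector)
  "Return true if VECTOR has no elements."
  (zerop (length vector)))

;; A user package that picks its own short nickname for the library.
;; The nickname VEC is only visible while MY-APP-DEMO is the current package.
(defpackage #:my-app-demo
  (:use #:cl)
  (:local-nicknames (#:vec #:long-vector-library-demo)))

(in-package #:my-app-demo)

(defun report ()
  "Call the library through the local nickname on a string and on a vector."
  (list (vec:emptyp "") (vec:emptyp #(1 2 3))))
```

Because the nickname is local to the user's package, a long library name costs nothing at the call site and two libraries can never clash over a short global name.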
Not to necrobump, but has there been any progress in this matter?
I've looked into it.
I think that most uses of `cl-ppcre` in the source code can be modified to use something else instead, and only some would require coercing non-string vectors into strings and back. The question is, how do we do that? Should we move the real implementation into some sort of `cl-vec` library that is guaranteed to work on all vectors, and then have `cl-str` become a shim that reexports stuff from `cl-vec`?
There is also the issue of symbols with "string" in their names: `substring`, `non-empty-string-p`, `non-blank-string-p`, `string-case`, and `count-substring`. I guess that these should get their own more generic names, like `subsequence`, in `cl-vec`, and then `cl-str` can export the old names for backwards compatibility.
What would be the best way to proceed here?
Regarding splitting generic vectors. Isn't cl-str specifically a string utility?
> Isn't cl-str specifically a string utility?
Yes, right now it is, hence my original proposal from the first post in this thread. Many of the operations defined here can be generalized to work on arbitrary vectors (or even sequences).
> if not, I'd leave a note that explains the reason why str is called str
IMO this doesn't need to be explained. Most people probably can figure that the library provides string utilities, hence the name 'str'.
> > Isn't cl-str specifically a string utility?
>
> Yes, right now it is, hence my original proposal from the first post in this thread. Many of the operations defined here can be generalized to work on arbitrary vectors (or even sequences).
Yeah, ok. But should it? I find it good that there is a library only for strings, with a specific purpose.
Personally, I cannot find a good reason why e.g. `(join #(a b c) #(1 2 3))` should signal a type error rather than return `#(1 a b c 2 a b c 3)`. If such a function doesn't belong in `cl-str`, then perhaps it should belong in a library that operates on all sequence types and that `cl-str` can then depend on.
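As an illustration of the behaviour described above, here is a minimal sketch of a join that accepts arbitrary sequences. The name `generic-join` is hypothetical and not part of `cl-str`; the sketch follows the example in this comment, where each item is kept as a single element, which differs from `str:join`, whose items are themselves strings that get concatenated.

```lisp
(defun generic-join (separator items)
  "Return a fresh simple vector holding the elements of ITEMS, with the
elements of SEPARATOR spliced in between each pair of consecutive items.
Both arguments may be any sequence (list, vector, string)."
  (let ((result (make-array 0 :adjustable t :fill-pointer 0))
        (firstp t))
    (map nil
         (lambda (item)
           ;; Before every item except the first, copy in the separator.
           (unless firstp
             (map nil (lambda (x) (vector-push-extend x result)) separator))
           (vector-push-extend item result)
           (setf firstp nil))
         items)
    (coerce result 'simple-vector)))

;; (generic-join #(a b c) #(1 2 3)) returns #(1 A B C 2 A B C 3)
```

A string-only `join` could then be a thin wrapper that checks its argument types and coerces the result back to a string, although whether that preserves the performance work done on `str:join` would need measuring.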
> I cannot find a good reason why e.g. (join #(a b c) #(1 2 3)) should signal a type error rather than return #(1 a b c 2 a b c 3).
I would say it raises a type error because the arguments are not strings. From a user perspective (or my perspective as a user) I find it comforting that I don't need to think of other use-cases when using cl-str. It deals with strings. So all inputs and outputs are strings, that's it. Kind of reduces the cognitive load. It also reduces the number of times one needs to look at the documentation for what types of arguments a function supports, etc.
OK, that works and suggests that a `cl-str` fork should be made, into a version that deals with all sequence types.
I wouldn't mind if there are more use-cases beyond strings. But it would be good if the current public interface could be maintained, and maybe an additional one added that works more generically?
I think it's possible to maintain the current interface and expose a more generic one elsewhere. I'll try doing that when I have some spare time.
`str` could have a cousin library. With a similar interface, why not, to make users comfortable.
Or, we would quickload `"str"` and that would give us two packages, say `str` and `seq`? Which one to use would be at the discretion of the user.
I am dubious of a generalized library (despite my first comment two years ago). We have many general libraries. This one wants to be straightforward, for strings. I know when I am working with strings (and when I know I am working with sequences, I appreciate that `str:substring` works with them too! Very useful.). I fear that extending to vectors would complicate the code too much, and that we would lose advantages specific to strings. That remains to be seen.
Probably something can be done with generic-cl.
PS: `str:join` has been worked on for performance. That doesn't appear in unit tests, and might be easy to do for both versions, but it should be kept in mind. See https://github.com/vindarel/cl-str/issues/67
From a quick glance, it seems that almost all (or even all) of the operations here can apply to vectors of any type, not just `(vector character)`, which is what strings are defined to be. Therefore, `str` could easily become a vector manipulation library, as opposed to just a string manipulation library.