scala / scala-library-next

backwards-binary-compatible Scala standard library additions
Apache License 2.0

Adding splitAsList to String #113

Open BalmungSan opened 2 years ago

BalmungSan commented 2 years ago

I personally always call toList after a split since I don't like using Arrays for multiple reasons.

Additionally, this implementation should be considerably more efficient than calling split followed by toList, and I personally believe its semantics are more useful.

Happy to receive any kind of feedback about the implementation and, especially, about the tests.

PS: Someone also mentioned that having a splitAsIterator could be helpful, especially for potentially large Strings. Thoughts on that one?


Motivated by a long-standing personal desire for such a method, plus a recent conversation in the Discord channel.

s5bug commented 2 years ago

Re splitAsIterator: I was thinking of java.util.regex.Matcher usage. Could also be splitAsLazyList by wrapping the iterator. Something like

def splitAsIterator(pattern: String): Iterator[String] = {
  val pat = Pattern.compile(pattern)
  val mat = pat.matcher(str)

But I'm blanking on how to use mat.find to actually get it done.
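For what it's worth, here is a rough sketch of how mat.find could drive such an iterator. It is only an illustration, not the PR's implementation: the object name SplitSketch and the extra str parameter are made up for the example, and the behavior deliberately differs from String.split, since every empty substring (including trailing ones) is kept. It also assumes the pattern never matches the empty string; zero-width matches would yield extra empty pieces.

```scala
import java.util.regex.Pattern

object SplitSketch {
  // Hypothetical helper: lazily yields the pieces of `str` found between matches of `pattern`.
  // Assumption: unlike String.split, all empty substrings are kept, including trailing ones.
  def splitAsIterator(str: String, pattern: String): Iterator[String] = {
    val mat = Pattern.compile(pattern).matcher(str)
    new Iterator[String] {
      private var start = 0     // index where the next piece begins
      private var done = false  // set once the final piece has been returned

      def hasNext: Boolean = !done

      def next(): String = {
        if (done) throw new NoSuchElementException("next on empty iterator")
        if (mat.find()) {
          // The piece is everything between the previous match and this one.
          val piece = str.substring(start, mat.start())
          start = mat.end()
          piece
        } else {
          // No more separators: the rest of the input is the last piece.
          done = true
          str.substring(start)
        }
      }
    }
  }
}
```

On Scala 2.13, something like SplitSketch.splitAsIterator("a,b,,c", ",").to(LazyList) would give the splitAsLazyList variant.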

SethTisue commented 2 years ago

I hope we can do something in this area, as I think it's sad that a common task like string-splitting requires Array, given that we constantly tell language newcomers not to use Array unless they're doing Java interop or super high performance work.

sjrd commented 2 years ago

If you call it split*, make sure it behaves exactly like String.split (i.e., Pattern.split) regarding empty matches, empty strings and empty substrings. The behavior is very touchy. You can look up the implementation (and the tests) in Scala.js: https://github.com/scala-js/scala-js/blob/b0f1dc501edb15203176c636d35323c2bcc24550/javalib/src/main/scala/java/util/regex/Pattern.scala#L168-L220
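To make the "touchy" cases concrete, here is a small sketch of how java.lang.String.split treats empty strings and empty substrings on the JVM. The object name SplitEdgeCases is just for the example; the calls themselves are the standard String.split(regex) and String.split(regex, limit).

```scala
object SplitEdgeCases {
  // An empty input has no match, so the result is the input itself: one empty string.
  val emptyInput: List[String] = "".split(",").toList

  // Trailing empty substrings are dropped when no limit is given...
  val trailingDropped: List[String] = "a,b,,".split(",").toList

  // ...but kept when a negative limit is passed.
  val trailingKept: List[String] = "a,b,,".split(",", -1).toList

  // A leading empty substring (separator at index 0) is kept.
  val leadingKept: List[String] = ",a".split(",").toList

  // An input consisting only of the separator yields an empty array.
  val onlySeparator: List[String] = ",".split(",").toList

  def main(args: Array[String]): Unit = {
    println(emptyInput)      // List() containing a single empty string
    println(trailingDropped) // List(a, b)
    println(trailingKept)    // List(a, b, , )
    println(leadingKept)     // List(, a)
    println(onlySeparator)   // List()
  }
}
```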

BalmungSan commented 2 years ago

Hi @sjrd, sorry for the delay. I changed the implementation to return List("") when the input string is empty and preserveEmptySubStrings is set to true.
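As a minimal sketch of the behavior described here, and not the PR's actual code: the signature below, the literal (non-regex) separator, and the choice to drop every empty piece when preserveEmptySubStrings is false are all assumptions made for illustration. Only the parameter name and the List("") result for an empty input come from the comment above.

```scala
object SplitAsListSketch {
  // Hypothetical signature: splits on a literal separator, not a regex.
  // Assumption: with preserveEmptySubStrings = false, every empty piece is dropped.
  def splitAsList(str: String, separator: String, preserveEmptySubStrings: Boolean): List[String] = {
    require(separator.nonEmpty, "separator must be non-empty")
    val builder = List.newBuilder[String]
    var start = 0
    var idx = str.indexOf(separator, start)
    while (idx >= 0) {
      val piece = str.substring(start, idx)
      if (preserveEmptySubStrings || piece.nonEmpty) builder += piece
      start = idx + separator.length
      idx = str.indexOf(separator, start)
    }
    // Whatever follows the last separator (possibly the whole input) is the final piece.
    val last = str.substring(start)
    if (preserveEmptySubStrings || last.nonEmpty) builder += last
    builder.result()
  }
}
```

Under these assumptions, an empty input gives List("") when empty substrings are preserved and an empty List otherwise.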