google / guava

Google core libraries for Java
Apache License 2.0
50.12k stars 10.89k forks source link

It seems wrong that Splitter.fixedLength(n).split("") returns [""], not [] #1178

Closed gissuebot closed 8 years ago

gissuebot commented 9 years ago

Original issue created by kevinb@google.com on 2012-10-25 at 11:38 PM


So all separator-based splitters that don't use omitEmptyStrings have the property that split("") yields an Iterable containing just the empty string [""]. This follows from the fact that (a) if the separator doesn't exist anywhere in the input, splitting should yield only the entire input, and (b) we reject any separator regex that matches the empty string.

But one type of Splitter isn't separator-based: Splitter.fixedLength. Here also, splitting the empty string yields an Iterable containing only the empty string. (Again, unless omitEmptyStrings is requested, which is just a strange thing to have to request of a fixed-length Splitter.)

But in this case, this seems weird. It violates the otherwise-invariant that split().size() == divide(input.length(), fixedLength, CEILING). For Splitter.fixedLength(4), there are 4 different input sizes that can yield an Iterable of size 2, or 3, etc., but there are 5 different input sizes that can yield an Iterable of size 1. This is not how Lists.partition() behaves, so why Splitter?

I tried to rationalize this by thinking of fixedLength as though imaginary zero-width separators were jammed into the input at positions n, 2n, etc. (But not at position 0 since that would violate the rule that the separator must not match the empty string). However, that's no help, as that implies that fixedLength(4).split("abcd") would yield ["abcd", ""].

I could find no record that we discussed this question when reviewing the code initially. A reviewer did remark on how quirky it was that fixedLength().omitEmptyString() could actually be useful, but that didn't raise a red flag.

It's tough to know what could get subtly broken by changing this now, so I'm attempting to convince myself that this little quirk is either (a) good or (b) at least tolerable.

gissuebot commented 9 years ago

Original comment posted by kevinb@google.com on 2012-10-25 at 11:56 PM


If we decide not to change this, we will at least document this quirk clearly.

gissuebot commented 9 years ago

Original comment posted by kevinb@google.com on 2012-10-29 at 07:59 PM


Quirk has been documented.

gissuebot commented 9 years ago

Original comment posted by lowasser@google.com on 2012-10-30 at 07:25 PM


Then this should be marked fixed, no?

gissuebot commented 9 years ago

Original comment posted by kevinb@google.com on 2012-10-30 at 08:13 PM


You're misreading comment 1 as "If and only if." The decision hasn't been made.

gissuebot commented 9 years ago

Original comment posted by kevinb@google.com on 2012-11-01 at 08:03 PM


We have decided it would be correct to change this. Though I'm not sure we have yet thought through what problems we might cause by having libraries out there in the wild that might do this or do that depending on what version of Guava they get run against. I suppose I still feel hesitant.

gissuebot commented 9 years ago

Original comment posted by ray.j.greenwell on 2012-12-19 at 12:47 AM


I have a slogan for this: "Terminators, not separators!".

If one only uses separators then [] and [""] encode to the same thing.

gissuebot commented 9 years ago

Original comment posted by kak@google.com on 2013-08-22 at 11:06 PM


Kevin, want to take a stab at this one?


Status: Accepted Owner: kevinb@google.com

kevinb9n commented 8 years ago

Time to accept it's just going to stay like this.