swlnet / google-collections

Automatically exported from code.google.com/p/google-collections
Apache License 2.0

Add a `Splitter` class #228

Closed: GoogleCodeExporter closed this issue 9 years ago

GoogleCodeExporter commented 9 years ago
While the `Joiner` class is great, I'm missing a complementary class that
splits a string into a collection (usually a list).

I like the builder-style approach used by `Joiner` and can think of method
chains like this:

  String str = "one, two,  three  ,,five,six,seven";
  List<String> fields =
      Splitter.on(",").skipEmpty().trim().maxSplit(4).split(str);
  assert fields.equals(Lists.newArrayList("one", "two", "three", "five,six,seven"));

Methods/Modifiers explained:

 - `on()` sets the delimiter/separator (as seen on `Joiner`).

 - `skipEmpty()` does not add empty strings to the resulting list. Whether a
whitespace-only field counts as "empty" may depend on whether `trim()` is applied.

 - `trim()` removes leading and trailing whitespace from field values.

 - `maxSplit(n)` limits the number of split operations, so the resulting
list has at most n+1 elements. An alternative might be `maxFields(n+1)`.
However, the behaviour needs to be defined for cases where `skipEmpty()` is
used, i.e. whether skipped fields count toward the limit or not.

 - `split()` actually splits the input string (see `Joiner.join()`).
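To make the proposed semantics concrete, here is a minimal sketch in plain Java. `SimpleSplitter` is a hypothetical class written for illustration, not an existing library API; it assumes a literal (non-regex) separator and assumes that fields dropped by `skipEmpty()` still count toward `maxSplit(n)`, which is how the example above works out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical splitter illustrating the proposed builder-style API.
// Assumptions: literal separator; skipped empty fields still count toward maxSplit.
final class SimpleSplitter {
    private final String separator;
    private final boolean skipEmptyFields;
    private final boolean trimFields;
    private final int maxSplits; // negative means "no limit"

    private SimpleSplitter(String separator, boolean skipEmptyFields, boolean trimFields, int maxSplits) {
        this.separator = separator;
        this.skipEmptyFields = skipEmptyFields;
        this.trimFields = trimFields;
        this.maxSplits = maxSplits;
    }

    static SimpleSplitter on(String separator) {
        if (separator.isEmpty()) {
            throw new IllegalArgumentException("separator must not be empty");
        }
        return new SimpleSplitter(separator, false, false, -1);
    }

    // Each modifier returns a new immutable instance, like Joiner's builder style.
    SimpleSplitter skipEmpty() { return new SimpleSplitter(separator, true, trimFields, maxSplits); }
    SimpleSplitter trim() { return new SimpleSplitter(separator, skipEmptyFields, true, maxSplits); }
    SimpleSplitter maxSplit(int n) { return new SimpleSplitter(separator, skipEmptyFields, trimFields, n); }

    List<String> split(String input) {
        List<String> result = new ArrayList<>();
        int start = 0;
        int splits = 0;
        while (maxSplits < 0 || splits < maxSplits) {
            int end = input.indexOf(separator, start);
            if (end < 0) {
                break; // no more separators
            }
            addField(result, input.substring(start, end));
            start = end + separator.length();
            splits++;
        }
        // Whatever remains (possibly still containing separators) is the last field.
        addField(result, input.substring(start));
        return result;
    }

    private void addField(List<String> result, String field) {
        String value = trimFields ? field.trim() : field;
        if (skipEmptyFields && value.isEmpty()) {
            return; // trimming first means whitespace-only fields are skipped too
        }
        result.add(value);
    }
}
```

Run on the string from the example, `SimpleSplitter.on(",").skipEmpty().trim().maxSplit(4).split(str)` returns `[one, two, three, five,six,seven]`: the empty field between `three` and `five` consumes the fourth split, which is one possible answer to the counting question raised for `maxSplit(n)`.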

What do you think?

P.S.: I can hardly wait for 1.0 final so we can include google-collections
in our projects with a stable API.

Original issue reported on code.google.com by j...@nwsnet.de on 3 Sep 2009 at 8:59

GoogleCodeExporter commented 9 years ago
But the String class already has a method that does the same thing:
split(regex, limit). Doesn't that provide the functionality you just
mentioned?

Original comment by sinhay...@gmail.com on 3 Sep 2009 at 1:56

GoogleCodeExporter commented 9 years ago
No.

Besides the modifiers above, which `String.split()` does not provide, one
can think of even more, like `preserveDelimiters()` (e.g. to "clean" a
serialized/stringified list of values by splitting it and putting it back
together), `removeFieldEnclosures()` (e.g. removing double quotes from CSV
field values), etc.
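A hypothetical `removeFieldEnclosures()` could amount to stripping one pair of surrounding double quotes from each already-split field. The sketch below assumes exactly that; `FieldEnclosures` and `unquote` are illustrative names, and the sketch deliberately ignores escaped quotes and separators inside quoted fields, both of which real CSV parsing has to handle.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: strips one pair of enclosing double quotes from each field.
// Does not handle escaped quotes ("") or separators inside quoted fields.
final class FieldEnclosures {
    static String unquote(String field) {
        if (field.length() >= 2 && field.startsWith("\"") && field.endsWith("\"")) {
            return field.substring(1, field.length() - 1);
        }
        return field; // not enclosed: leave unchanged
    }

    static List<String> removeFieldEnclosures(List<String> fields) {
        List<String> result = new ArrayList<>(fields.size());
        for (String field : fields) {
            result.add(unquote(field));
        }
        return result;
    }
}
```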

Original comment by j...@nwsnet.de on 3 Sep 2009 at 2:48

GoogleCodeExporter commented 9 years ago
Absolutely, String.split() is insufficient, and has some extremely surprising
behaviors.
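Some of the well-known surprises can be reproduced with the JDK alone: the separator is a regular expression, trailing empty strings are silently dropped while leading ones are kept, and an empty input yields one empty field rather than none. `SplitSurprises` is just an illustrative class name.

```java
import java.util.Arrays;

// Demonstrates some well-known String.split() surprises (JDK only).
final class SplitSurprises {
    public static void main(String[] args) {
        // Trailing empty strings are removed, but the leading one is kept.
        System.out.println(Arrays.toString(",a,,b,,".split(",")));     // [, a, , b]

        // A negative limit keeps the trailing empty strings.
        System.out.println(Arrays.toString(",a,,b,,".split(",", -1))); // [, a, , b, , ]

        // The separator is a regex: "." matches every character, so every field
        // is empty, and all of them are dropped as trailing empties.
        System.out.println("a.b".split(".").length);                    // 0

        // Escaping the dot gives the intended result.
        System.out.println(Arrays.toString("a.b".split("\\.")));        // [a, b]

        // An empty input produces one empty field, not zero fields.
        System.out.println("".split(",").length);                       // 1
    }
}
```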

Original poster, stay tuned for us to release our Splitter class. I think
you'll be very happy with it.

Original comment by kevin...@gmail.com on 3 Sep 2009 at 2:51

GoogleCodeExporter commented 9 years ago
Or even a method `castTo()`, which takes a class and tries to convert each
value extracted from the input string to that class, e.g.:

  List<Double> values = Splitter.on(",").castTo(Double.class).split("1.23,4.56,7.89");

There are a lot of everyday cases where one has to parse a delimiter-separated
string of values into a list, be it data from a configuration file, addresses
from e-mail headers, user-given tag names in Web 2.0 applications, and many
more.

Original comment by j...@nwsnet.de on 3 Sep 2009 at 2:55

GoogleCodeExporter commented 9 years ago
kevinb9n: Thanks, and great, I'm really looking forward to that!

Original comment by j...@nwsnet.de on 3 Sep 2009 at 2:56

GoogleCodeExporter commented 9 years ago
@yo...@nwsnet.de: Yes, I agree; I had not thought of all the cases you just
mentioned. A class like Splitter would be very useful, especially for parsing
CSV files, where we often have bad data to filter out before we can do
anything useful. Besides a method like skipEmpty() as you mentioned, a method
like skipPattern(regexp) would be useful for ignoring certain kinds of data.
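The suggested `skipPattern(regexp)` does not exist; its effect can be approximated by splitting first and then dropping fields that match a regular expression. This sketch assumes "match" means the whole (trimmed) field matches, uses a literal separator via `Pattern.quote`, and relies on Java 8 streams; `SkipPattern` and `splitSkipping` are hypothetical names.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical equivalent of skipPattern(regexp): drop fields that fully match a pattern.
final class SkipPattern {
    static List<String> splitSkipping(String input, String separator, Pattern skip) {
        // Pattern.quote treats the separator literally; limit -1 keeps trailing empty fields
        // so that the skip pattern, not String.split, decides what is dropped.
        return Arrays.stream(input.split(Pattern.quote(separator), -1))
                .map(String::trim)
                .filter(field -> !skip.matcher(field).matches())
                .collect(Collectors.toList());
    }
}
```

For example, `SkipPattern.splitSkipping("42, N/A, 17, , n/a", ",", Pattern.compile("(?i)(n/a)?"))` skips both the empty field and the case-insensitive "N/A" placeholders, returning `[42, 17]`.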

Original comment by sinhay...@gmail.com on 3 Sep 2009 at 3:03

GoogleCodeExporter commented 9 years ago
@yo...@nwsnet.de:
I think castTo is a bad idea. If the Splitter emits a List<String>, it should
be trivial to apply Lists.transform afterwards with a Function<String,Double>
closure.
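The "split, then transform" approach can be sketched without any library by mapping each field through a `Function<String, T>`. This sketch swaps in Java 8 streams for `Lists.transform` and uses `String.split` with a literal comma as a stand-in for a real splitter; `SplitThenTransform` is a hypothetical name. One difference worth noting: `Lists.transform` returns a lazy view, whereas the stream version converts every field eagerly.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

// Split first, then convert each field with a Function instead of a castTo() modifier.
final class SplitThenTransform {
    static <T> List<T> splitAndConvert(String input, Function<String, T> converter) {
        return Arrays.stream(input.split(",")) // stand-in for a real splitter
                .map(String::trim)
                .map(converter)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Double> values = splitAndConvert("1.23,4.56,7.89", Double::valueOf);
        System.out.println(values); // [1.23, 4.56, 7.89]
    }
}
```

Keeping conversion out of the splitter means any `Function` (integers, enums, dates) works without the splitter knowing about target types, which is why `castTo()` becomes redundant.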

Original comment by heroldsi...@googlemail.com on 9 Sep 2009 at 3:29

GoogleCodeExporter commented 9 years ago
Voilà:
http://guava-libraries.googlecode.com/svn/trunk/javadoc/com/google/common/base/Splitter.html

See more about the new Guava project: http://guava-libraries.googlecode.com

Original comment by kevin...@gmail.com on 16 Sep 2009 at 7:01

GoogleCodeExporter commented 9 years ago
kevinb9n: Excellent! Also, returning an Iterable is even better than a List.
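One reason an `Iterable` can be preferable to a `List` is laziness: fields are produced on demand, so a caller that stops early never pays for splitting the rest of the input, and no intermediate list is allocated. The following is a minimal sketch of that idea using a hypothetical `LazySplit` class with a literal separator; it is not Guava's actual implementation.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical lazy splitter: each field is computed only when next() is called.
final class LazySplit implements Iterable<String> {
    private final String input;
    private final String separator;

    LazySplit(String input, String separator) {
        if (separator.isEmpty()) {
            throw new IllegalArgumentException("separator must not be empty");
        }
        this.input = input;
        this.separator = separator;
    }

    @Override
    public Iterator<String> iterator() {
        return new Iterator<String>() {
            private int start = 0;        // start index of the next field
            private boolean done = false; // true once the last field has been returned

            @Override
            public boolean hasNext() {
                return !done;
            }

            @Override
            public String next() {
                if (done) {
                    throw new NoSuchElementException();
                }
                int end = input.indexOf(separator, start);
                if (end < 0) {
                    done = true; // no separator left: the rest is the final field
                    return input.substring(start);
                }
                String field = input.substring(start, end);
                start = end + separator.length();
                return field;
            }
        };
    }
}
```

Usage: `for (String field : new LazySplit("a,b,,c", ",")) { ... }` visits `a`, `b`, an empty string, and `c`, in that order.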

So Google Collections will be merged into Guava, is that correct? Sounds like
a good idea to me, though that probably means I've got to wait a little
longer ;)

heroldsilversurfer: I didn't think of that, and I agree with you: `castTo()`
would be redundant then.

Original comment by j...@nwsnet.de on 16 Sep 2009 at 9:01