maidh91 / guava-libraries

Automatically exported from code.google.com/p/guava-libraries
Apache License 2.0

An Interner<String> which delegates to String.intern() #399

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
Could be pretty handy when combined with my proposed Functions.forInterner().

Original issue reported on code.google.com by ray.j.gr...@gmail.com on 11 Aug 2010 at 11:45

GoogleCodeExporter commented 9 years ago
I'm guessing you mean an interner that would use permgen space via
String.intern() (otherwise it's no different from Interners.newWeakInterner()).
I had never considered this idea, since we wrote this code in response to the
pitfalls of String.intern().

I have heard fairly negative things about how well String.intern() is
implemented, and I suspect you may even be better off with one of ours
(especially once we rewrite it to use MapMaker).

Original comment by kevinb@google.com on 8 Sep 2010 at 6:34

GoogleCodeExporter commented 9 years ago
I haven't heard about problems with String.intern(), although my ear is not 
especially close to the ground.

I would assume that it wouldn't be any worse than a weak map. It's native, and 
could always be improved if someone working on Java got a wild hair.

On my project, we intern certain Strings read out of the database or
configuration files, as they are likely to be from a small set of repeated
values. I suppose we could just use an Interner for this, as I am already doing
for various immutable collections: I read many records from the database
containing sets of enum attributes, so it's worthwhile for me to share
references where possible.

We can certainly write our own Interner that calls String.intern(). I just
thought it might be worthwhile to add it to Interners, because code that mixes
Interners.newWeakInterner() with String.intern() ends up with two separate
pools of interned Strings, which is not ideal.

Original comment by ray.j.gr...@gmail.com on 10 Sep 2010 at 10:56
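
For readers following along, the "write our own Interner that calls String.intern()" idea from the comment above could be sketched roughly as follows. This is only an illustration, not anything that exists in Guava: the nested Interner interface is a stand-in that mirrors the single intern(E) method of Guava's com.google.common.collect.Interner so the sketch compiles without Guava on the classpath, and the names JvmInternDemo and JvmStringInterner are hypothetical.

```java
class JvmInternDemo {

    // Stand-in for Guava's Interner<E> (assumption: one method, intern(E)).
    interface Interner<E> {
        E intern(E sample);
    }

    // Hypothetical class name; not part of Guava.
    static final class JvmStringInterner implements Interner<String> {
        static final JvmStringInterner INSTANCE = new JvmStringInterner();

        private JvmStringInterner() {}

        @Override
        public String intern(String sample) {
            // Delegate to the JVM-wide pool, so callers of this Interner and
            // direct callers of String.intern() share one set of canonical Strings.
            return sample.intern();
        }
    }

    public static void main(String[] args) {
        Interner<String> interner = JvmStringInterner.INSTANCE;

        // Two distinct objects with equal contents.
        String a = new String("status");
        String b = new String("status");

        System.out.println(a == b);                                   // false
        System.out.println(interner.intern(a) == interner.intern(b)); // true
        System.out.println(interner.intern(a) == "status".intern());  // true
    }
}
```

With real Guava on the classpath, the same class would presumably implement com.google.common.collect.Interner<String> directly instead of the stand-in interface.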

GoogleCodeExporter commented 9 years ago
Whether to do anything here depends on getting some good benchmarks of Interner 
vs. String.intern() performance.  And we're going to reimplement Interner a 
bit, so results from after that will be most relevant.  Holding open for now, 
but not much to do just yet.

Original comment by kevinb@google.com on 19 Mar 2011 at 3:43

GoogleCodeExporter commented 9 years ago
Here's something I hadn't thought of before:

// requires java.util.Arrays
char[] bigchars = new char[1000000];
Arrays.fill(bigchars, 'z');
String big = new String(bigchars);
// begin == end, so this is the empty String
String small = big.substring(5, 5);

'small' is now an empty String, but with a strong reference to the 'bigchars'
array. If I were interning my strings, I would not want it to become the
canonical empty String.

String.intern() appears to always create a new String if it didn't previously 
have a mapping (contradicting the javadoc). If I intern small I get a different 
reference back, but the same applies to big. :/

Original comment by ray.j.gr...@gmail.com on 7 Apr 2011 at 11:49

GoogleCodeExporter commented 9 years ago
Edit: 'small' has a reference to a copy of the 'bigchars' array...

Original comment by ray.j.gr...@gmail.com on 8 Apr 2011 at 1:55
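
The observations in the two comments above can be probed with a small sketch like the one below. The class and method names (SubstringInternDemo, bigString) are hypothetical. Only the first check is guaranteed by the String.intern() javadoc; whether intern() returns its argument or a different object the first time a value is seen depends on the JVM, so that line deliberately carries no expected output.

```java
import java.util.Arrays;

class SubstringInternDemo {

    // Builds a String of the given length consisting only of 'z'.
    static String bigString(int length) {
        char[] chars = new char[length];
        Arrays.fill(chars, 'z');
        return new String(chars);
    }

    public static void main(String[] args) {
        String big = bigString(1_000_000);
        String small = big.substring(5, 5); // begin == end, so empty

        // Guaranteed: equal Strings intern to the same reference, and the
        // literal "" is itself interned.
        System.out.println(small.intern() == ""); // true

        // JVM-dependent: does intern() hand back the argument itself when
        // the value was not previously in the pool?
        System.out.println(big.intern() == big);
    }
}
```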

GoogleCodeExporter commented 9 years ago
FYI, rough benchmarking suggests newWeakInterner is 7x as fast as 
String.intern(), though I'm not sure how much of that is accounted for by that 
string copy you mention, and of course the real-life consequences are highly 
situation-dependent.

Original comment by kevinb@google.com on 6 May 2011 at 5:11
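
The benchmark referred to above is not included in this thread. As a very rough illustration of how such a comparison might be set up, here is a naive timing loop. It is an assumption-laden sketch: mapInterner() is a hypothetical strong, map-backed stand-in (NOT Guava's newWeakInterner, and its entries are never garbage collected), and plain System.nanoTime() loops like this are easily distorted by JIT warm-up and GC, so the printed numbers should not be read as evidence for or against the 7x figure.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.UnaryOperator;

class InternTimingSketch {

    // Written after each run so the JIT cannot discard the loop's work.
    static volatile long sink;

    // Hypothetical stand-in interner backed by a strong map (not Guava's).
    static UnaryOperator<String> mapInterner() {
        ConcurrentHashMap<String, String> pool = new ConcurrentHashMap<>();
        return s -> {
            String existing = pool.putIfAbsent(s, s);
            return existing == null ? s : existing;
        };
    }

    // 'count' distinct String objects drawn from 'distinct' different values.
    static String[] makeInputs(int count, int distinct) {
        String[] inputs = new String[count];
        for (int i = 0; i < count; i++) {
            inputs[i] = new String("value-" + (i % distinct));
        }
        return inputs;
    }

    // Interns every input 'rounds' times and returns the elapsed nanoseconds.
    static long timeNanos(UnaryOperator<String> interner, String[] inputs, int rounds) {
        long total = 0;
        long start = System.nanoTime();
        for (int r = 0; r < rounds; r++) {
            for (String s : inputs) {
                total += interner.apply(s).length();
            }
        }
        long elapsed = System.nanoTime() - start;
        sink = total;
        return elapsed;
    }

    public static void main(String[] args) {
        String[] inputs = makeInputs(100_000, 1_000);
        UnaryOperator<String> viaJvm = String::intern;
        UnaryOperator<String> viaMap = mapInterner();

        // Crude warm-up before the measured runs.
        timeNanos(viaJvm, inputs, 5);
        timeNanos(viaMap, inputs, 5);

        System.out.println("String.intern(): " + timeNanos(viaJvm, inputs, 20) + " ns");
        System.out.println("map interner:    " + timeNanos(viaMap, inputs, 20) + " ns");
    }
}
```

A benchmark harness such as JMH, run against the actual Guava interners, would be needed for numbers worth citing.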

GoogleCodeExporter commented 9 years ago
(and thanks to jim.andreou for that benchmark!)

Original comment by kevinb@google.com on 6 May 2011 at 5:12

GoogleCodeExporter commented 9 years ago
Oh come on. You know my numbers aren't citable!

Anyway, *my* conclusion is that it seems safe to call the various Interner
implementations faster than String#intern(). For another uncitable data point,
I noticed that the difference was smaller (around 2x-2.5x) when I tried
strings with lots of prefix overlap, so there might be some trie hiding under
intern(). Kevin, why don't you ask your officemates about this?

Original comment by jim.andreou on 6 May 2011 at 11:08

GoogleCodeExporter commented 9 years ago

Original comment by kevinb@google.com on 11 May 2011 at 3:44

GoogleCodeExporter commented 9 years ago
This issue has been migrated to GitHub.

It can be found at https://github.com/google/guava/issues/<id>

Original comment by cgdecker@google.com on 1 Nov 2014 at 4:15

GoogleCodeExporter commented 9 years ago

Original comment by cgdecker@google.com on 3 Nov 2014 at 9:09