Open · migrator opened this issue 10 years ago
Interesting... my expectation would be that a Writer implementation, given a single large String, should be able to handle it as efficiently as possible, since it holds all the cards, so to speak. I imagine BufferedWriter to be more useful for collecting many small writes.
But it looks like regardless of how large a chunk you provide the characters to the Writer in, an OutputStreamWriter will only write out 8192 bytes at a time, so copying the large String's char array is basically just a big waste of time. It's actually really surprising to me that StreamEncoder doesn't handle this differently... it could just wrap the String in a CharBuffer and use that with CharsetEncoder without needing to copy the char array at all. Maybe there's some good reason for that I'm not aware of.
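As a sketch of what that could look like (hypothetical code, not StreamEncoder's actual implementation; the class and method names are mine, and the 8192-byte buffer just mirrors the chunk size mentioned above):

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;

final class CopyFreeEncoding {
  // Hypothetical sketch: encode a String without copying its chars by
  // wrapping it in a CharBuffer and draining through an 8 KB byte buffer.
  static void encodeTo(String s, Charset charset, OutputStream out) throws IOException {
    CharBuffer in = CharBuffer.wrap(s);            // a view over the String; no char[] copy
    CharsetEncoder encoder = charset.newEncoder();
    ByteBuffer bytes = ByteBuffer.allocate(8192);

    while (true) {
      CoderResult result = encoder.encode(in, bytes, true);
      if (result.isUnderflow()) {
        break;                                     // all input consumed
      } else if (result.isOverflow()) {
        out.write(bytes.array(), 0, bytes.position());  // drain the 8 KB chunk
        bytes.clear();
      } else {
        result.throwException();                   // malformed/unmappable input
      }
    }
    while (encoder.flush(bytes).isOverflow()) {    // flush encoder-internal state
      out.write(bytes.array(), 0, bytes.position());
      bytes.clear();
    }
    out.write(bytes.array(), 0, bytes.position()); // final partial chunk
  }
}
```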
Anyway, this does look like a good potential solution.
> A similar tweak may be applicable to ByteSink? (Not tried.)

This seems less likely since byte arrays don't need to go through encoding but can just be passed directly to a native method to be written.
creator: cgdecker@google.com · created at: Sep 23, 2014
> an OutputStreamWriter will only write out 8192 bytes at a time (...)

Huh, so it does. I missed that.
> byte arrays don't need to go through encoding but can just be passed directly to a native method to be written

Indeed. Sorry, I should have taken a few minutes to check that.
> It's actually really surprising to me that StreamEncoder doesn't handle this differently (...) Maybe there's some good reason for that I'm not aware of.

I agree. Possibly because StreamEncoder is itself a Writer, and the JDK Writers seem to follow this (undocumented?) pattern: (1) funnel every flavour of write into the single bulk method write(char[], int, int); and (2) keep the number of calls to that method to a minimum, even if that means copying a whole String into one big char[] up front.
(1) makes some sense when there are lots of overridden and overridable methods in a deep hierarchy. (But maybe it hints at a yearning for a better design?)
(2) could be a concern about per-call costs in derived classes? (multiply-wrapped Writers, synchronisation, class CarrierPidgeonIpWriter {handy in the days of NSA snooping}, etc.) Or just trying to be friendlier to overriding classes?
Anyway, I'd still rate performance as a higher concern for StreamEncoder than these other things... but no point crying over Java's spilt milk, I guess. It ain't going to help ;-)
creator: luke.ush...@gmail.com · created at: Sep 23, 2014
Hi there. I set out to replace a little utility with:
Files.write(CharSequence, File, Charset)
But a quick microbenchmark found Guava's method to be slower on sustained writes of large Strings to individual files. (e.g. +33% longer for 16MB strings.) Results targeting HDD and SSD were similar. Surprisingly, peak memory was heavily impacted too. (I guess Java uses a big in-process write-buffer somewhere.)
Here's a tweak in CharSink which I think will give memory & speed improvements to the Files method above (and maybe others):
```java
public void write(CharSequence charSequence) throws IOException {
  checkNotNull(charSequence);
  ...
}
```
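Only the opening of the patch survives above; judging from the explanation that follows, the tweak presumably opens the Writer via openBufferedStream() instead of openStream(). A hedged reconstruction of that shape (my guess, not the author's verbatim patch):

```java
// In com.google.common.io.CharSink. Hypothetical reconstruction of the tweak:
// open a buffered Writer so a large String is copied through BufferedWriter's
// fixed-size char buffer in chunks, instead of StreamEncoder copying the
// whole String's chars at once.
public void write(CharSequence charSequence) throws IOException {
  checkNotNull(charSequence);
  Closer closer = Closer.create();
  try {
    Writer out = closer.register(openBufferedStream());  // was: openStream()
    out.append(charSequence);
    out.flush();
  } catch (Throwable e) {
    throw closer.rethrow(e);
  } finally {
    closer.close();
  }
}
```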
In the case of OutputStreamWriter / FileWriter, this change ultimately calls into StreamEncoder.write(char[], int, int), which can use BufferedWriter's char buffer directly, rather than StreamEncoder.write(String, int, int), which makes a copy of the entire string.
The default implementation in Writer also gets the same benefit when wrapped by a BufferedWriter, so I think this will benefit a range of Writer classes, not just file writers.
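For reference, here is the general shape of the default String path in java.io.Writer (paraphrased; the real JDK code reuses a small buffer for short strings, but allocates like this for large ones):

```java
// Paraphrased from java.io.Writer: for a large String, the default write
// allocates a char[] the size of the whole String before making one bulk call.
public void write(String str, int off, int len) throws IOException {
  char[] cbuf = new char[len];            // full-size copy of the String's chars
  str.getChars(off, off + len, cbuf, 0);
  write(cbuf, 0, len);                    // single call to the bulk method
}
```

BufferedWriter overrides write(String, int, int) to copy the string through its fixed-size internal buffer instead, which is why the wrapped path avoids the large allocation.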
I couldn't test the proposed tweak directly, so here are my actual tests, which appear to support the theory:
Harness:
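(The harness code itself isn't preserved here; a minimal stand-in consistent with the description — payload size, file count, and all names are my assumptions — might be:)

```java
import com.google.common.base.Stopwatch;
import com.google.common.base.Strings;
import java.io.File;
import java.io.IOException;
import java.util.concurrent.TimeUnit;

// Hypothetical harness: write a 16M-char payload to a series of files and
// report the average wall-clock time per file. (Peak memory was observed
// externally with Process Explorer; see the notes below.)
public class WriteBenchmark {
  static final String PAYLOAD = Strings.repeat("0123456789abcdef", 1 << 20);
  static final int FILE_COUNT = 50;

  public static void main(String[] args) throws IOException {
    Stopwatch stopwatch = Stopwatch.createStarted();
    for (int i = 0; i < FILE_COUNT; i++) {
      doWrite(PAYLOAD, new File("bench-" + i + ".txt"));
    }
    long perFile = stopwatch.elapsed(TimeUnit.MILLISECONDS) / FILE_COUNT;
    System.out.println(perFile + " ms per file");
  }

  static void doWrite(String s, File f) throws IOException {
    // swapped out per test; see the implementations below
  }
}
```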
With these implementations of doWrite:
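(These implementations weren't preserved either; going by the surrounding discussion, they presumably contrasted the copying write(String) path with a BufferedWriter-chunked one, e.g.:)

```java
import com.google.common.io.Files;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

// Test 1 (presumed): Guava's method under test.
static void doWrite1(String s, File f) throws IOException {
  Files.write(s, f, StandardCharsets.UTF_8);
}

// Test 2 (presumed): a bare OutputStreamWriter; write(String) reaches
// StreamEncoder.write(String, int, int), which copies the whole String.
static void doWrite2(String s, File f) throws IOException {
  try (Writer w = new OutputStreamWriter(new FileOutputStream(f), StandardCharsets.UTF_8)) {
    w.write(s);
  }
}

// Test 3 (presumed): the same Writer behind a BufferedWriter, which chunks
// the String through its internal char buffer; no full-size copy.
static void doWrite3(String s, File f) throws IOException {
  try (Writer w = new BufferedWriter(
      new OutputStreamWriter(new FileOutputStream(f), StandardCharsets.UTF_8))) {
    w.write(s);
  }
}
```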
Test 1: 44ms per file, 1590 MB peak memory*
Test 2: 44ms per file, 1590 MB peak memory*
Test 3: 32ms per file, 366 MB peak memory*
Notes:
* memory = "Peak Private Bytes" according to Sysinternals' Process Explorer.