Closed d-cameron closed 4 years ago
Merging #1497 into master will decrease coverage by
0.020%
. The diff coverage is100.000%
.
@@ Coverage Diff @@
## master #1497 +/- ##
===============================================
- Coverage 69.279% 69.260% -0.020%
- Complexity 8764 8796 +32
===============================================
Files 590 592 +2
Lines 34755 34863 +108
Branches 5800 5814 +14
===============================================
+ Hits 24078 24146 +68
- Misses 8386 8416 +30
- Partials 2291 2301 +10
Impacted Files | Coverage Δ | Complexity Δ | |
---|---|---|---|
...n/java/htsjdk/samtools/fastq/BasicFastqWriter.java | 76.000% <100.000%> (+26.000%) |
9.000 <2.000> (+4.000) |
|
...htsjdk/samtools/util/nio/DeleteOnExitPathHook.java | 80.952% <0.000%> (-9.524%) |
3.000% <0.000%> (-1.000%) |
|
src/main/java/htsjdk/io/IOPath.java | 79.310% <0.000%> (ø) |
14.000% <0.000%> (?%) |
|
src/main/java/htsjdk/io/HtsjdkPath.java | 51.316% <0.000%> (ø) |
15.000% <0.000%> (?%) |
@lbergelson is this result actually a blocker for merging?
@d-cameron Don't worry about the coverage. I need to disable that. The coverage is extremely twitchy in it's complaints.
I'm not sure this is going to help at all because PrintWriter flushes on every write anyway. As far as I can tell it's a pathological class that we should get rid of. (Bad default flushing behavior, but even more critically it's error handling is atrocious.) If you want it to stop flushing on every output (which we do want, ) we should configure it with autoflush = off when creating it. Just removing the checkError call makes us vulnerable to errors without actually fixing the problem I think.
I would be in favor of just replacing the internal PrintStream with a different output stream that doesn't behave so badly.
PrintWriter flushes on every write anyway ... configure it with autoflush = off
But autoFlush=false is the default. I've added a test case to show flushing of the underlying stream is fixed with this PR.
Isn't deferring IO errors to when the underlying stream is flushed/closed normal, expected behaviour?
I would be in favor of just replacing the internal PrintStream with a different output stream that doesn't behave so badly.
That would mean changing the BasicFastqWriter API since of it's two constructors take PrintStreams which would be a breaking change (I know it'd break my GRIDSS code). Did the issue of what actually composes the public htsjdk API being undefined ever get resolved?
@d-cameron Oh wow, you're totally right about autoflush being off by default. I've been wrong about this for a while I guess and giving incorrect advice about it. Thanks for pointing that out. I'm not sure about deferring errors being standard practice? I would expect it to defer the error until the write actually is attempted and fails, but typically that will be happening many times as the write buffer fills up, not just on closing the writer.
Yes, that would be a breaking change unless we kept the old methods around and deprecated them. I'm not sure it's practical to change it, I just would like to. In general I treat the entire accessible java api as the public API and attempt to avoid breaking changes to any of it. Sometimes that isn't possible, but usually we deprecate things for a long period of time before changing them. There have been some major exceptions though. The cram code was dramatically altered, but we gave a lot of notice about that. I'm also more willing to break newly added API's. Things that were put in in a recent release are less likely to have a lot people using and the users are actively adopting new features so I change those more readily. Things like minor API fixes in the Gff3 classes have been breaking changes on recent code. Old public APIs I try hard to never break without long term deprecation before the break. Public methods are the most sacred. I also attempt to avoid any breaking changes to anything potentially accessible to user code. (i.e. protected fields in non-final classes, etc. )
We're planning to label things better as part of the CZI grant. I want to introduce the notion of beta/unstable api's that we are willing to break because it's extremely painful to make progress on a lot of things now, and we're very slow to add new code partially because we are extremely resistant to breaking existing code.
@d-cameron Thank you for fixing this and also for pointing out my misconception about PrintStream!
Description
PrintStream.checkError()
is called on every write(). Since this flushes the underlying stream, performance when writing fastq files is very poor.This change defers the call to
checkError()
toflush()
andclose()
operations.