Closed jmarshall closed 1 year ago
As noted on the previous htslib PR:
In particular, there are some I/O errors that are not reported by the kernel until you call
close(2)
. So to fully check for errors insome-htslib-using-command args > output.bam
you need to check the return value ofhclose()
and it needs to reflect the return value ofclose(2)
even for stdout. Otherwise you risk undetected corruption in output.bam.
Alternatively, you need to explicitly close(STDOUT_FILENO)
(or an equivalent) separately from hclose()
, as described in the initial comment and demonstrated for htsfile.c. See also samtools/samtools#1909 which adds such a check for samtools. If this approach is accepted, I will prepare a similar PR for bcftools.
Having hclose()
et al not actually close stdout and concentrating really closing it in one place at the end of samtools' and bcftools' main()
functions will make life easier for pysam too.
(However I have a plan to rewrite pysam's I/O redirection in an upcoming release so that it will be immune to whether samtools/bcftools closes these file descriptors or not anyway.)
I like this as a change. It does feel more natural to me, as as you state if the file wasn't opened by htslib then maybe its going too far for it to be doing the closing.
Also given we're paranoid and use fdatasync / fsync already, I expect even if people don't explicitly do a close-and-check themselves then almost all errors will have been spotted already (eg ENOSPC).
Hmmm… interesting point about fsync
. POSIX is extremely vague about the circumstances in which close()
might return EIO: “An I/O error occurred while reading from or writing to the file system”. MacOS's BSD-heritage man page says “A previously-uncommitted write(2) encountered an input/output error” which is slightly more helpful.
You may also be amused to hear that POSIX and Linux have opposite answers to “what happens if during dup2()
, the as-if-by-close
closing of the previous dest fd fails due to an I/O error?”…
Thinking on things that can go wrong - devil's advocate if you wish. (I do infact like the idea in this PR)
while ((r = sam_read(in, h, b)) >= 0) {
sam_write(out, h, b);
n++;
}
hts_close(out);
printf("Wrote %d records\n", n);
This would previously report the number of records written when given a file, or when given stdout it'd just lose the printf as stdout had already been closed. Now it wouldn't get closed and we'd have chit-chat appearing in the output file instead. This is quite similar to the occasional nohup bugs people hit due to mixing in stderr when using > file
instead of -o file
.
I'm not hugely concerned as I'd argue this sort of tooling is broken and also trivial to fix, but I wouldn't be surprised if this class of bug also exists in the wild too.
fdatasync
, so they may actually be a genuine source of data loss that we can no longer verify. Eg:cat in.sam | strace -o tr ./test/test_view - | cat > /dev/null
(Note this also hints that we maybe should fix test_view
too to have an explicit close.)
The strace output shows no close of stdin (harmless) or stdout (more serious). It ends with:
read(0, "", 4096) = 0
write(1, "H7JKIB>ACCBF,AF@?BED>C6C1\tX0:i:1"..., 3046) = 3046
fdatasync(1) = -1 EINVAL (Invalid argument)
exit_group(0) = ?
So we see the issue with fdatasync not working here. In this regard, perhaps close would spot errors. Then again it's possible they'd just got lost in exactly the same manner. We're checking our writes, but it's the internals of the kernel which matter here. So we've sent some data down a pipe or socket, which succeeded at that point - we put it in the "todo" buffer basically - but later is found to fail. It's not something that is trivial to test either.
Given how we prefer -o file
over > file
, really pipes should be the primary source of writing to stdout.
I also considered adding another option to override it, e.g., hts_open("-", "wO")
(O for owned / one time) would make hts_close()
really close stdout. Or leaving closing as the default, so that applications would have to write hts_open("-", "wS")
to get the new behaviour.
However it all seemed a bit complicated and unclean. I too would think that a program like you describe is broken.
I always vaguely assumed the startup code would close stdin/out/err, but I see it just leaves It to exit_group
to close them all. It wouldn't know what to do to report failures, so I guess there'd be no point in closing them explicitly.
I've added an fclose
to _testview.
Re pipes, there's no delayed write to an underlying device going on, so I'd be surprised if close
ever reported an error on a pipe. The failure mode would be more along the lines of writing to another process that abended prematurely — and you'd expect to see diagnostics (either from the program itself or whichever shell spawned it) about the other process's unfortunate end. (Or quite likely you'd still be writing to the pipe, so you'd get to emit a diagnostic about the EPIPE too.)
In any case, by adding explicit closes at the end of main()
in samtools and bcftools, things are equivalent to before this change — at least for those two programs.
We seem to be in agreement on the various points, including what is valid and what is broken code that we need not support. Thanks for clarification on the pipe case too.
Agreed also on trying to override this with extra hts_open
letters is unnecessary complication. Besides if it turns out to be a something required then it can still be done later. I'll move on to reviewing the samtools PR. Thanks
Revisiting the
stdin
/stdout
/dup
conversation from #1658:Reconsidering this… In fact it has caused some problems over the years: htsfile.c needed some circumlocutions because it too wants to write several SAM/VCF (i.e. textual)
htsFile
streams tostdout
, and pysam has had trouble due tostdout
being closed by samtools and bcftools. Moreover, as stdin and stdout are already open and hence are not opened byhts_open
/hopen
, it's not really morally right for them to be closed byhts_close
/hclose
.Hence this pull request, which adds a
mode
option letter tohdopen()
to signal that the fd should not be closed byhclose()
, and uses it inhopen_fd_stdinout()
which underlieshopen("-", …)
.This enables repeated
hopen("-")
/hclose()
/hopen("-")
where previously the underlyingSTDIN
/OUT_FILENO
would have been prematurely closed. This will mean that the linked bgzip.c pull request would not need to treat"-"
specially.This also means that stdout is never really closed and
hclose()
's return value does not reflect closing the underlying fd. Hence particularly paranoid programs that have written significant data to stdout will want to close and check it themselves just before the end of main():(Ignoring EBADF as stdout may already have been closed and checked, e.g. if the program has been linked to an earlier HTSlib where
hclose()
still closes STDOUT_FILENO.)The second commit simplifies htsfile.c's file opening and adds a final
fclose(stdout)
check accordingly.