Open david4096 opened 8 years ago
bcf_hdr_add_sample()
aborts on certain errors, which some people misinterpret as crashing. Being a library function, it should really be returning an error code so the caller can decide what to do about the problem. (Note that if hts_verbose < 2
it does just return an error code in this case — this is clearly wrong as such behaviour should never depend on the verbosity setting.)
Fellow maintainers: Is there a reason for aborting here (would the resulting data structures be too broken to continue?), or is this just a legacy of the early htslib code?
Please note that the abort() command was intentionally added to resolve bcftools issue 184 ("Wrong genotypes if duplicate sample names").
So, it is not appropriate to simply remove it.
Abort is never an appropriate action for a library call unless there is genuingly nothing else it could do or if the situation should really never occur (in which case if we reached it then something very odd has happened, possibly memory corruption or some foul play is at hand and continuing could be even more problematic).
This situation is simply a matter of reporting an error and returning with the correct error code. I haven't looked into seeing how easy that is to do in this code path.
Just to confirm (as it was me who added the abort) - a proper error propagation is preferred indeed. However, if the abort is going to be removed, it would be great if bcftools could be made aware of the change.
(via pysam) A VCF that has repeated sample names causes htslib to crash. An offending VCF can be found here: https://github.com/ga4gh/server/tree/master/tests/faultydata/variants/duplicated_sampleid .
As described here: https://github.com/ga4gh/server/issues/521