samtools / htslib

C library for high-throughput sequencing data formats
Other
812 stars 447 forks source link

Duplicate Sample IDs in VCF aborts #431

Open david4096 opened 8 years ago

david4096 commented 8 years ago

(via pysam) A VCF that has repeated sample names causes htslib to crash. An offending VCF can be found here: https://github.com/ga4gh/server/tree/master/tests/faultydata/variants/duplicated_sampleid .

As described here: https://github.com/ga4gh/server/issues/521

jmarshall commented 8 years ago

bcf_hdr_add_sample() aborts on certain errors, which some people misinterpret as crashing. Being a library function, it should really be returning an error code so the caller can decide what to do about the problem. (Note that if hts_verbose < 2 it does just return an error code in this case — this is clearly wrong as such behaviour should never depend on the verbosity setting.)

Fellow maintainers: Is there a reason for aborting here (would the resulting data structures be too broken to continue?), or is this just a legacy of the early htslib code?

kirkmcclure commented 7 years ago

Please note that the abort() command was intentionally added to resolve bcftools issue 184 ("Wrong genotypes if duplicate sample names").

So, it is not appropriate to simply remove it.

jkbonfield commented 7 years ago

Abort is never an appropriate action for a library call unless there is genuingly nothing else it could do or if the situation should really never occur (in which case if we reached it then something very odd has happened, possibly memory corruption or some foul play is at hand and continuing could be even more problematic).

This situation is simply a matter of reporting an error and returning with the correct error code. I haven't looked into seeing how easy that is to do in this code path.

pd3 commented 7 years ago

Just to confirm (as it was me who added the abort) - a proper error propagation is preferred indeed. However, if the abort is going to be removed, it would be great if bcftools could be made aware of the change.