samtools / htslib

C library for high-throughput sequencing data formats

Fix a remaining endianness bug in bcf_format_gt() #1495

Closed jmarshall closed 2 years ago

jmarshall commented 2 years ago

I noticed that since PRs #459 and #1023, every #define BRANCH(…) macro in vcf.c takes a convert argument (one of le_to_i8 et al.) and uses it to decode the VCF data structures correctly on big-endian architectures.

However, there is one more instance of #define BRANCH(…) in htslib/vcf.h (in bcf_format_gt()), and it has not had this treatment. It is probably unusual for the number of alleles to exceed what fits in BCF_BT_INT8, but when it does, sure enough, this function fails on big-endian architectures.

This PR fixes the function by adding a convert() parameter to this last convert-less BRANCH-style macro, similar to those previously added to all the BRANCH-style macros in vcf.c and vcfutils.c. This fixes the VCF printing of records with more alleles than fit in BCF_BT_INT8.
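
For readers unfamiliar with the pattern, here is a minimal, self-contained sketch of a BRANCH-style macro that takes a convert parameter. It is not the code from htslib/vcf.h: the demo_* names, the buffer handling and the function signature are all hypothetical stand-ins chosen for illustration, with demo_le_to_i8() et al. playing the role of le_to_i8 et al. The point is only that each value is loaded through a byte-order-safe converter rather than by dereferencing a typed pointer.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins for le_to_i8 et al.: decode little-endian bytes
 * into a signed integer regardless of the host's byte order. */
static inline int8_t demo_le_to_i8(const uint8_t *p) {
    return p[0] < 0x80 ? (int8_t)p[0] : (int8_t)((int)p[0] - 0x100);
}
static inline int16_t demo_le_to_i16(const uint8_t *p) {
    uint16_t v = (uint16_t)(p[0] | (p[1] << 8));
    return v < 0x8000 ? (int16_t)v : (int16_t)((int32_t)v - 0x10000);
}
static inline int32_t demo_le_to_i32(const uint8_t *p) {
    uint32_t v = (uint32_t)p[0] | (uint32_t)p[1] << 8 |
                 (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
    return v < 0x80000000u ? (int32_t)v : (int32_t)((int64_t)v - 0x100000000LL);
}

/* Illustrative BRANCH-style macro: 'convert' loads one value from the
 * little-endian byte stream; a plain typed-pointer read would instead
 * depend on host endianness for the 16- and 32-bit cases. */
#define DEMO_BRANCH(type_t, convert, vector_end) do {                    \
    const uint8_t *ptr = data;                                           \
    int i;                                                               \
    for (i = 0; i < n && len + 16 < cap; i++, ptr += sizeof(type_t)) {   \
        type_t val = convert(ptr);                                       \
        if (val == (vector_end)) break;                                  \
        if (i) out[len++] = "/|"[val & 1];    /* phase separator */      \
        if (!(val >> 1)) out[len++] = '.';    /* missing allele */       \
        else len += (size_t)snprintf(out + len, cap - len, "%d",         \
                                     (int)(val >> 1) - 1);               \
    }                                                                    \
    if (i == 0 && len + 1 < cap) out[len++] = '.';                       \
    out[len] = '\0';                                                     \
} while (0)

/* Format n packed GT values of the given byte width into out (cap > 0). */
const char *demo_format_gt(const uint8_t *data, int n, int width,
                           char *out, size_t cap)
{
    size_t len = 0;
    assert(data != NULL && out != NULL && cap > 0);
    out[0] = '\0';
    switch (width) {
    case 1: DEMO_BRANCH(int8_t,  demo_le_to_i8,  INT8_MIN + 1);  break;
    case 2: DEMO_BRANCH(int16_t, demo_le_to_i16, INT16_MIN + 1); break;
    case 4: DEMO_BRANCH(int32_t, demo_le_to_i32, INT32_MIN + 1); break;
    default: break;
    }
    return out;
}
```

With this sketch, the four little-endian bytes 02 00 5A 02 passed with n = 2 and width = 2 format as 0/300 on any host, because the converter assembles each value from individual bytes.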

It also adds a record with GT values >256 to test_bcf2vcf's VCF file, and regenerates the corresponding BCF file via

test/test_view -b -l0 -p test/tabix/vcf_file.bcf test/tabix/vcf_file.vcf

Without the fix, the record added to vcf_file.vcf is printed on s390x (a big-endian architecture) as:

$ ./htsfile-1.15.1-72-g8f140ee -c test/tabix/vcf_file.vcf | tail -1
4   3258501 .   C   A,[…],CCTGT 45  PASS    AN=4;AC=2   GT  255/11520   -3841/1280
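
The odd-looking numbers are what a byte-swapped 16-bit load would produce. The sketch below works through the arithmetic under stated assumptions: a GT allele is stored as (allele + 1) << 1 for an unphased call and printed as (value >> 1) - 1, the field is little-endian int16, and the load reverses the two bytes. The helper name demo_misread_allele() is hypothetical, and the allele numbers fed to it are illustrative inputs, not values taken from the test record.

```c
#include <stdint.h>

/* Hypothetical helper: the number printed for one unphased allele when its
 * little-endian int16 GT encoding is loaded with the two bytes reversed,
 * as an unconverted read on a big-endian host would do. */
int demo_misread_allele(int allele)
{
    uint16_t enc = (uint16_t)((allele + 1) << 1);   /* assumed GT encoding */
    uint8_t lo = (uint8_t)(enc & 0xff);             /* first byte as stored */
    uint8_t hi = (uint8_t)(enc >> 8);               /* second byte as stored */
    uint16_t swapped = (uint16_t)((lo << 8) | hi);  /* bytes read in reverse */
    /* Reinterpret as a signed 16-bit value. */
    int v = swapped < 0x8000 ? (int)swapped : (int)swapped - 0x10000;
    /* Same decode step as the printing code; arithmetic right shift of a
     * negative int is assumed, as on common compilers. */
    return (v >> 1) - 1;
}
```

Under these assumptions, alleles 0 and 300 come out as 255 and 11520, and alleles 112 and 260 come out as -3841 and 1280, which is consistent with the two sample columns printed above. Other inputs can yield the same printed values, so this shows the mechanism rather than recovering the original genotypes.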

daviesrob commented 2 years ago

Thanks. I've tested on my SPARC VM and the fix works nicely.