Closed pd3 closed 1 year ago
Apparently I had misremembered the public hts_parse_region
API as having the header as an optional field. Pity :/
I'm inclined to think that if we have a requirement to parse regions without a header being present, then perhaps this API should be modified to cope with it. My preference would be to avoid code duplication where possible, but I haven't really looked in detail to know how much work is involved and whether we just replace duplication in one place with it in another (eg a separate code-path for headerless parsing).
Can you give me an example of how to trigger this? I'm unable to do so, so cannot test the fix.
I tried modifying test/test-bcf-sr.pl as follows:
ubuntu@jkb-dev-pmu:~/htslib$ git diff
diff --git a/htscodecs b/htscodecs
--- a/htscodecs
+++ b/htscodecs
@@ -1 +1 @@
-Subproject commit d4aed585929e2dab9dd8e6a2b74484dfc347c0f2
+Subproject commit d4aed585929e2dab9dd8e6a2b74484dfc347c0f2-dirty
diff --git a/test/test-bcf-sr.pl b/test/test-bcf-sr.pl
index cd9859c..f65d120 100755
--- a/test/test-bcf-sr.pl
+++ b/test/test-bcf-sr.pl
@@ -119,7 +119,7 @@ sub save_vcf
open(my $fh,"| $FindBin::Bin/../bgzip -c > $fname") or error("$FindBin::Bin/../bgzip -c > $fname: !");
print $fh qq[##fileformat=VCFv4.3\n];
print $fh qq[##FILTER=<ID=PASS,Description="All filters passed">\n];
- print $fh qq[##contig=<ID=1>\n];
+ print $fh qq[##contig=<ID=1:2>\n];
print $fh qq[##contig=<ID=2>\n];
print $fh '#'. join("\t", qw(CHROM POS ID REF ALT QUAL FILTER INFO))."\n";
for my $var (@$vars)
@@ -133,7 +133,7 @@ sub save_vcf
$ref = $xref;
push @alts,$alt;
}
- print $fh join("\t", (1,100,'.',$ref,join(',',@alts),'.','.','.'))."\n";
+ print $fh join("\t", ("1:2",100,'.',$ref,join(',',@alts),'.','.','.'))."\n";
}
for my $var (@$vars)
{
@@ -146,7 +146,7 @@ sub save_vcf
$ref = $xref;
push @alts,$alt;
}
- print $fh join("\t", (1,300,'.',$ref,join(',',@alts),'.','.','.'))."\n";
+ print $fh join("\t", ("1:2",300,'.',$ref,join(',',@alts),'.','.','.'))."\n";
}
for my $var (@$vars)
{
So my files look like this:
ubuntu@jkb-dev-pmu:~/htslib$ zcat /tmp/a/1.vcf.gz
##fileformat=VCFv4.3
##FILTER=<ID=PASS,Description="All filters passed">
##contig=<ID=1:2>
##contig=<ID=2>
#CHROM POS ID REF ALT QUAL FILTER INFO
1:2 100 . A T . . .
1:2 100 . A G,T . . .
1:2 300 . A T . . .
1:2 300 . A G,T . . .
2 100 . A T . . .
2 100 . A G,T . . .
Running ./test/test-bcf-sr.pl -t /tmp/a -s 0
works still. Although I see lots of errors in the output files, but it returns 0 exit status. The original also produced the same errors and also succeeded. I don't understand what's happening here in the tests. Are the failures "expected failures" and hence passes? I'm assuming so, as most checks don't redirect stderr, so the ones that do are presumably expecting an error and deliberately silencing it.
[Edit: yes, I see it now in "Failed to detect $badness: $cmd". It's specifically checking on duff input to make sure it doesn't pass. That's good. :-) ]
Is it simply that test-bcf-sr.c doesn't test the region query part of the API, so colons in names never crops up as an issue? I don't particularly like having the test for this function in bcftools, as it means we can't do CI for this code.
I just checked the code and can confirm that test-bcf-sr
does not test indexes at all. That's done properly only through the bcftools
test suite, and there are hundreds of them. If you want to test it natively in htslib, best might be to write a simplified version of vcfview.c
. Tests for this pull requests are https://github.com/samtools/bcftools/pull/1938/files.
Merge commit removed...
Note hts_parse_region() cannot be used because it requires the header and without the header the caller does not learn the contig name.
Resolves #1620