Closed mmokrejs closed 9 months ago
I don't think there is a bug, probably just a different expectation / misunderstanding.
From the usage page:
bcftools csq
-c, --custom-tag STRING Use this tag instead of the default BCSQ
bcftools +split-vep
-a, --annotation STR INFO annotation to parse [CSQ]
and from the documentation (https://samtools.github.io/bcftools/bcftools.html#csq):
The output VCF is annotated with INFO/BCSQ and FORMAT/BCSQ tag (configurable with the -c option). The latter is a bitmask of indexes to INFO/BCSQ, with interleaved haplotypes. See the usage examples below for using the %TBCSQ converter in query for extracting a more human readable form from this bitmask.
Hmm, so the value under CSQ
is definitely different than BCSQ
, fine. Then the sed
is definitely not fixing the issue, because the data are still bitmasks. But what should I do as a user? I am lost. I asked +split-vep
to give me a TSV output albeit with some fields separated by commas and some by pipe symbols.
Unfortunately, I don't understand what the problem is. Reading the documentation and a little bit of experimentation should hopefully do.
As shown above, both commands have an option to override the default tag name. The FORMAT/BCSQ is a bitmask and the query
command can be used to convert to a human readable form, as described in the documentation. The +split-vep
usage page shows quite a few examples and links to more, I believe that should be enough to get one started.
I attribute this to a late Friday tiredness :)
So from our off-line discussion now it is clearer to me. https://samtools.github.io/bcftools/howtos/plugin.split-vep.html anticipates CSQ
which is normally output by VEP
. To directly consume bcftools csq
output one needs to use bcftools +split-vep -a BCSQ
. Alternatively bcftoos csq -c CSQ
must be specified and that I discovered already before reporting. Still I did not understand why the tools do not work together. Either of the two extra cmdline switches is needed.
As you wrote in an email, if you you make split-vep
able to search for BCSQ
if CSQ
is not present, it would have solved my issue. Although I understand how it evolved I think it would be great to fix that. And the docs say nothing about this "incompatibility": https://samtools.github.io/bcftools/howtos/plugin.split-vep.html
Hi Peter, it seems to me some native bcftools tools or its plugins require
CSQ
while it seems you switched toBCSQ
instead. Would you mind checking all the tools are in sync with each other? I used initially 1.17 but also 1.19 behaves same. Why isTBCSQ
existing too? Where is the documentation?I have to add
sed -e 's/BCSQ/CSQ/g'
in between the pipes.Still,
bcftools +split-vep
discards most of the input lines, dunno why. Only two lines are output. But that is a different bug. Edit: That was actuallybcftools csq
not annotating most of the lines withBCSQ
, fixed for me bybcftools csq -s -
, thanks to https://github.com/samtools/bcftools/issues/2070#issuecomment-1881468481.