Open lczech opened 4 years ago
Addendum: For other types of missing fields, the errors get even more confusing. The above happens when calling rbt vcf-to-txt -g --fmt DP
without having the format field DP
in the vcf file.
However, if instead the AD
field is ill formatted, I get a rust kernel panic:
thread 'main' panicked at 'index out of bounds: the len is 1 but the index is 1', src/bcf/to_txt.rs:212:51
The behaviour can be reproduced with these files (produced by freebayes), and calling
cat bcftools-view.bcf | rbt vcf-to-txt -g --fmt DP AD --info ANN
using rust-bio-tools =0.6.0, as well as rust-bio-tools =0.10.0
The field is present in the file, so I don't know exactly what is causing the issue here... Would be nice to have a fix for that, because otherwise, I cannot use rbt with freebayes.
By the way, these errors occurred while adapting the dna-seq-gatk-variant-calling workflow to other types of callers (bcftools call, and freebayes), which seem to produce vcf files that confuse rbt. The error occurs in the stats rule for converting vcf to txt.
All the best and thanks in advance Lucas
Thanks a lot for reporting. Indeed, error messages should be improved here. This is now much easier with the thiserror and anyhow crates. I currently don't have the capacity, but it is certainly something we would either be happy to get a PR on, or that we will start looking at as soon as possible.
Hi @lczech,
I tried to reproduce your error message (there is no flag type for format
) by using the test file in tests/test.vcf
and running the command rbt vcf-to-txt -g --fmt DP
. However, rbt
now prints the missing DP
tag as empty instead of spitting out the error you described. Can you test on your file that emits this error and see if the fix helps in debugging or not? Thank you.
I can reproduce the error by supplying a INFO
tag to --fmt
, e.g. rbt vcf-to-txt --fmt T < tests/test.vcf
where tag "T" is in the INFO
field. As @lczech noted, the error message might not be specific to these instances.
I made a pull request (#243) to better inform the users of the error that occurred (at least now the offending tag is printed out instead of a generic message) and suggested the users to check their inputs. Hopefully @johanneskoester can chime in and give their view on this issue.
On the error about the "AD" tag, I just found out the reason why rust-bio-tools
panicked.
The "AD" tag is defined as ##FORMAT=<ID=AD,Number=R,Type=Integer,Description="Number of observation for each allele">
. Number=R
informs the parsing library (rust-htslib
) to expect two values in "AD".
However, the samples with missing values in vcf files generated from freebayes
are encoded with .:.:.:.:.:.:.:.
instead of .:.,.:.:.:.:.:.:.
. The error thread 'main' panicked at 'index out of bounds: the len is 1 but the index is 1', src/bcf/to_txt.rs:212:51
is caused by rbt
trying to access this non-existent second value.
Here is the file successfully parsed after changing .:.:.:.:.:.:.:.
to .:.,.:.:.:.:.:.:.
:
VARIANT VARIANT VARIANT VARIANT VARIANT A B
CHROM POS REF ALT QUAL AD AD
21 5038037 T C 0.44 1 2
21 5038042 A C 13.59 1 2
21 5082586 G A 0.25 -2147483648 2
21 5118139 T C 0.99 -2147483648 2
21 5119448 A C 5.96 1 2
...
Maybe if the tag length is Alleles
and an out of bounds error occurred, we can assume this is a missing value? Or maybe the upstream rust-htslib
can be made to interpret a single missing value to represent the missing values for the available alleles in cases with Number=R
?
As a side note, I noticed that the value for the "AD" tag is filled with the last allele value in each sample. Looking at the code, assume that this is intended? Also, the missing value is encoded as -2147483648
, but I thought it should be encoded as (a blank space). Is this intended as well?
Hi there, I'd suggest to improve some error messages:
https://github.com/rust-bio/rust-bio-tools/blob/d2a77501aaf058d8c2ae0ee56c2fc352617b4783/src/bcf/to_txt.rs#L205
The error simply states that
and it took me quite a while digging through rust-htslib to figure out that it actually wants to say something like
where "XX" is some format annotation shortcut for vcf files. The same probably applies for the info fields as well.
I'd provide a PR, but I don't know rust well enough to gauge whether the error message can occur in other circumstances as well, where my improved message would not be fitting.
All the best Lucas