Command used and terminal output
NFCORE_RAREDISEASE:RAREDISEASE:ANNOTATE_STRUCTURAL_VARIANTS:SVDB_QUERY_DB command.sh:
#!/bin/bash -euo pipefail
svdb \
    --merge \
    --pass_only --same_order \
    --priority tiddit,manta,cnvnator \
    --vcf NA12878_tiddit.vcf.gz:tiddit NA12878_manta.diploid_sv.vcf.gz:manta NA12878_cnvnator.vcf.gz:cnvnator \
    > NA12878_sv.vcf
bgzip NA12878_sv.vcf
cat <<-END_VERSIONS > versions.yml
"NFCORE_RAREDISEASE:RAREDISEASE:CALL_STRUCTURAL_VARIANTS:SVDB_MERGE":
svdb: $( echo $(svdb) | head -1 | sed 's/usage: SVDB-\([0-9]\.[0-9]\.[0-9]\).*/\1/' )
samtools: $(echo $(samtools --version 2>&1) | sed 's/^.*samtools //; s/Using.*$//')
END_VERSIONS
--------
Output (command.log):
INFO: Environment variable SINGULARITYENV_NXF_TASK_WORKDIR is set, but APPTAINERENV_NXF_TASK_WORKDIR is preferred
INFO: Environment variable SINGULARITYENV_NXF_DEBUG is set, but APPTAINERENV_NXF_DEBUG is preferred
Error: frequency or hit tag not found! Make sure to set the --in_occ AND --in_frq to the number and frequency of alleles/individuals as presented in the INFO column of the input db
database variants not having the --in_occ or --in_frq tag must be removed
you may also skip these parameters and cluster based on the GT entry of the format column (if such exists)
I have submitted an issue in the SVDB repository @fa2k. You can find it at this link: https://github.com/J35P312/SVDB/issues/74. Let's continue the discussion there.
Description of the bug
gnomAD SV v4.1 (https://gnomad.broadinstitute.org/news/2023-11-v4-structural-variants/) contains some CNVs that have no AC or AF information in the VCF (gnomad.v4.1.sv.sites.vcf.gz).
svdb --query refuses to annotate the VCF if the tags passed to --in_occ or --in_frq (in this case AC and AF) are missing for some variants in the database, and it produces an output VCF without any variants.
It works if I remove the lines without AC / AF from the gnomAD file, but that discards the gnomAD information for those CNVs. Would it make sense to integrate this CNV information into the annotation of SVs in some other way instead?
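For reference, the workaround described above (dropping database records that lack AC / AF) could look roughly like the sketch below. This is not what the pipeline does; the function name `filter_vcf_with_ac_af` is made up for illustration, and it assumes the affected records either lack the `AC=` / `AF=` keys in the INFO column entirely or carry them with a `.` value.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of svdb or nf-core/raredisease):
# reads an uncompressed VCF on stdin and writes to stdout all header lines
# plus only those records whose INFO column (column 8) has both AC and AF.
# Assumption: a missing value is either an absent key or a "." value.
filter_vcf_with_ac_af() {
    awk -F'\t' '
        /^#/ { print; next }                  # pass header lines through unchanged
        {
            has_ac = 0; has_af = 0
            n = split($8, info, ";")          # INFO entries are ";"-separated
            for (i = 1; i <= n; i++) {
                if (info[i] ~ /^AC=/ && info[i] != "AC=.") has_ac = 1
                if (info[i] ~ /^AF=/ && info[i] != "AF=.") has_af = 1
            }
            if (has_ac && has_af) print       # keep only fully annotated records
        }
    '
}
```

It would be used along the lines of `zcat gnomad.v4.1.sv.sites.vcf.gz | filter_vcf_with_ac_af | bgzip > gnomad_sv_with_ac_af.vcf.gz` (the output file name is a placeholder). The downside is the one raised above: the CNV records without AC / AF are lost from the annotation database.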