1. What were you trying to do?
I am currently working on constructing a pan-genome using the vg autoindex workflow with the Giraffe process. The input.sv.vcf.gz file includes data from 182 individuals, containing a total of 3.8 million SVs. In the initial run of vg autoindex --workflow giraffe, and subsequent SV genotyping using vg call (-a -z -A), only a fraction of SVs were successfully genotyped. To increase SV inclusion in the graph, we modified the genotypes of each SV in each individual to 0|1, 1|1, where the majority were originally denoted as .|.. Subsequently, I rerun autoindex and genotype structural variants (SVs) using vg giraffe, vg pack -Q 5 and vg call (-a -z -A).
2. What did you want to happen?
I expected the SV genotyping results (output.sv.vcf) to be consistent with the input SV data (input.sv.vcf.gz) after deconstructing multi-allelic SVs using bcftools norm -m -any.
3. What actually happened?
After the second attempt, while the SV count increased (around 1.8 million), it is still noticeably lower compared to the original input.sv.vcf.gz file.
Additionally, we attempted to follow the example provided in https://github.com/vgteam/vg/wiki/Construction-Examples, specifically using vg construct (-a -S -R -C) to construct each chromosome individually. However, we encountered an issue with longer chromosomes, receiving the error "vg.Graph exceeded the maximum protobuf size of 2GB: 4415154172." (#3979,#4152)
4. If you got a line like Stack trace path: /somewhere/on/your/computer/stacktrace.txt, please copy-paste the contents of that file here:
Place stacktrace here.
5. What data and command can the vg dev team use to make the problem happen?
We modified the genotypes of each SV in each individual to 0|1, 1|1 in the input.sv.vcf.gz file following:
for chr in ${chrs[*]}
do
echo $(date +"%Y-%m-%d %T") start convert ${chr}
awk -v OFS="\t" '
BEGIN {FS=OFS; srand()}
{
if (/^#/) {
print;
} else {
# arbitrarily assign ./. to 0|1 1|1
for (i=10; i<=NF; i++) {
if ($i == ".") {
$i = (int(rand() * 2) == 0) ? "0/1" : "1/1";
}
gsub(/\//, "|", $i);
}
print;
}
}' inputs.${chr}_sv.vcf > convert.inputs.${chr}_sv.vcf
done
vg version v1.53.0 "Valmontone"
Compiled with g++ (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 on Linux
Linked against libstd++ 20230528
Built by anovak@courtyard.gi.ucsc.edu
1. What were you trying to do? I am currently working on constructing a pan-genome using the vg autoindex workflow with the Giraffe process. The input.sv.vcf.gz file includes data from 182 individuals, containing a total of 3.8 million SVs. In the initial run of vg autoindex --workflow giraffe, and subsequent SV genotyping using vg call (-a -z -A), only a fraction of SVs were successfully genotyped. To increase SV inclusion in the graph, we modified the genotypes of each SV in each individual to 0|1, 1|1, where the majority were originally denoted as .|.. Subsequently, I rerun autoindex and genotype structural variants (SVs) using vg giraffe, vg pack -Q 5 and vg call (-a -z -A).
2. What did you want to happen? I expected the SV genotyping results (output.sv.vcf) to be consistent with the input SV data (input.sv.vcf.gz) after deconstructing multi-allelic SVs using bcftools norm -m -any.
3. What actually happened? After the second attempt, while the SV count increased (around 1.8 million), it is still noticeably lower compared to the original input.sv.vcf.gz file. Additionally, we attempted to follow the example provided in https://github.com/vgteam/vg/wiki/Construction-Examples, specifically using vg construct (-a -S -R -C) to construct each chromosome individually. However, we encountered an issue with longer chromosomes, receiving the error "vg.Graph exceeded the maximum protobuf size of 2GB: 4415154172." (#3979,#4152)
4. If you got a line like
Stack trace path: /somewhere/on/your/computer/stacktrace.txt
, please copy-paste the contents of that file here:5. What data and command can the vg dev team use to make the problem happen? We modified the genotypes of each SV in each individual to 0|1, 1|1 in the input.sv.vcf.gz file following:
We run autoindex --workflow giraffe following:
We first deconstruct graph and compared with the original input.sv.vcf.gz file. We found some SVs were lost in input.sv.vcf.gz file.
6. What does running
vg version
say?