vgteam / vg

tools for working with genome variation graphs
https://biostars.org/tag/vg/

giraffe: Haplotype::prefix() error when running haplotype sampling / personalization feature #4175

Closed alshai closed 7 months ago

alshai commented 7 months ago

1. What were you trying to do?

I'm trying to follow the instructions for personalized vg giraffe alignment, using the HPRC v1.1 index with the .hapl file provided alongside it.

Specifically, I ran these two commands, with some slight variations from the original:

export TMPDIR=/scratch/tmp
kmc -k29 -m128 -okff -t16 -hp <fq> ${TMPDIR}/sample $TMPDIR
vg giraffe -p -t 16 -Z <hprc GBZ> --haplotype-name <hprc hapl> --kmer-name ${TMPDIR}/sample.kff \
    -N sample -f <fq> -o BAM > out.bam

2. What did you want to happen?

I expected the command to run to completion and write the alignments to out.bam.

3. What actually happened?

Seems like it's failing when constructing the GBWT. I got the following error:

Sampling haplotypes
Loading GBZ from hprc-v1.1-mc-grch38.d9.gbz
Loading haplotype information from hprc-v1.1-mc-grch38.hapl
Reading kmer counts
Read the kmer counts in 228.218 seconds
Estimating kmer coverage
Estimated kmer coverage in 8.83672 seconds
Building GBWT
Running 16 GBWT construction jobs in parallel
error: [job 0]: Haplotype::prefix(): GBWT sequence 1424 did not reach GBWT node 8956251
error: [job 15]: error: [job 11error: [job error: [job Haplotype::prefix(): GBWT sequence 30685 did not reach GBWT node 9926125914]: Haplotype::prefix(): GBWT sequence 29648 did not reach GBWT node 87174858

12]: ]: Haplotype::prefix(): GBWT sequence 21669 did not reach GBWT node 78770269
Haplotype::prefix(): GBWT sequence 25431 did not reach GBWT node 80793658

4. If you got a line like Stack trace path: /somewhere/on/your/computer/stacktrace.txt, please copy-paste the contents of that file here:

Place stacktrace here.

5. What data and command can the vg dev team use to make the problem happen?

see (1)

6. What does running vg version say?

version v1.51.0 "Quellenhof"

jltsiren commented 7 months ago

You are using the frequency-filtered graph hprc-v1.1-mc-grch38.d9.gbz, but the haplotype information is for the default graph hprc-v1.1-mc-grch38.gbz. If you want to use haplotype sampling, you should use the default graph.
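A quick pre-flight check along these lines might catch this kind of mismatch before a long run. This is only a sketch: `same_graph_stem` and `check_pair` are hypothetical helpers, not part of vg, and they rely on the assumption that a .hapl file shares its file name stem with the graph it was built from, as the HPRC v1.1 files named in this thread do. They compare names only and do not inspect file contents.

```shell
#!/bin/sh
# Hypothetical helper (not part of vg): succeed only when the GBZ graph and
# the .hapl file share a file name stem. Assumption: matching files are
# named <stem>.gbz and <stem>.hapl, as with the HPRC v1.1 files above.
same_graph_stem() {
    gbz_stem=$(basename "$1" .gbz)
    hapl_stem=$(basename "$2" .hapl)
    [ "$gbz_stem" = "$hapl_stem" ]
}

# Print a short report for one graph/haplotype pair.
check_pair() {
    if same_graph_stem "$1" "$2"; then
        echo "ok: $1 and $2 share a stem"
    else
        echo "warning: $1 and $2 have different stems; the .hapl may be for another graph"
    fi
}

# The pair from the original post (frequency-filtered graph): warns.
check_pair hprc-v1.1-mc-grch38.d9.gbz hprc-v1.1-mc-grch38.hapl
# The default graph with the same .hapl file: ok.
check_pair hprc-v1.1-mc-grch38.gbz hprc-v1.1-mc-grch38.hapl

# With the default graph, the giraffe command from the original post would
# read as follows (other options and placeholders unchanged from the post):
#   vg giraffe -p -t 16 -Z hprc-v1.1-mc-grch38.gbz \
#       --haplotype-name hprc-v1.1-mc-grch38.hapl \
#       --kmer-name ${TMPDIR}/sample.kff -N sample -f <fq> -o BAM > out.bam
```

A name comparison is only a heuristic. Renamed files would defeat it, so the error message from Haplotype::prefix() remains the reliable signal that the graph and the haplotype information do not belong together.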

alshai commented 7 months ago

Ah didn't realize it was a simple mistake. Thanks for the speedy reply