pangenome / pggb

the pangenome graph builder
https://doi.org/10.1101/2023.04.05.535718
MIT License

vg deconstruct error #344

Status: Closed (arslan9732 closed 10 months ago)

arslan9732 commented 10 months ago

Hi, I am running pggb with this command:

pggb -i ../Genomes/community/chr04.fa.gz \
     -o chr04 \
     -D chr04_tmp \
     --keep-temp-files \
     -n 11 \
     -t 200 \
     -p 90 \
     -s 5k \
     -V 'ref:1000' \
     --multiqc -S -r

But it gave this error:

Error [vg deconstruct]: No specified reference path or prefix found in graph
Command exited with non-zero status 1
vg deconstruct -P ref -H 1000 -e -a -t 200 chr04/chr04.fa.gz.3907ad6.417fcdf.c00ce9c.smooth.final.gfa

The fasta file headers are like this:

>Genome1#1#chr04
>Genome2#1#chr04
>Genome3#1#chr04
...
>Genome11#1#chr04

AndreaGuarracino commented 10 months ago

You have to specify the prefix of the genome you want to use as the reference. The error means that no path in your graph starts with "ref", so vg deconstruct has nothing to match. For example, if Genome1#1#chr04 is your reference, write -V 'Genome1:1000':

pggb -i ../Genomes/community/chr04.fa.gz \
     -o chr04 \
     -D chr04_tmp \
     --keep-temp-files \
     -n 11 \
     -t 200 \
     -p 90 \
     -s 5k \
     -V 'Genome1:1000' \
     --multiqc -S -r
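
To see which prefixes are available for -V, the sample names can be read straight from the FASTA headers. The following is a minimal sketch, assuming the headers follow the sample#haplotype#contig pattern shown in this issue; the function name list_sample_prefixes is made up for this example and is not part of pggb.

```shell
# Hypothetical helper: print the unique sample prefixes (the text before the
# first '#') from FASTA headers read on standard input.
list_sample_prefixes() {
    grep '^>' | sed 's/^>//' | cut -d'#' -f1 | sort -u
}

# Demo on a few headers written in the same style as the ones in this issue.
printf '>Genome1#1#chr04\nACGT\n>Genome2#1#chr04\nACGT\n>Genome1#1#chr05\nACGT\n' \
    | list_sample_prefixes
```

On the real input this would be something like `zcat ../Genomes/community/chr04.fa.gz | list_sample_prefixes`. Any name it prints should be usable as the part before the colon in -V.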

arslan9732 commented 10 months ago

I ran it with your suggestion. It finished without any error, but the VCF is empty apart from the header lines. The other output files seem to be fine.
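
A quick way to confirm that a VCF holds only header lines is to count the lines that do not start with '#'. This is a small sketch; count_vcf_records is a hypothetical helper name, and the demo input is a made-up two-line header, not output from this run.

```shell
# Hypothetical helper: count the non-header lines (variant records) in an
# uncompressed VCF read on standard input.
count_vcf_records() {
    # grep -c exits with status 1 when the count is 0, so "|| true" keeps
    # the helper from reporting a failure for an empty VCF.
    grep -vc '^#' || true
}

# Demo: a header-only VCF has zero records.
printf '##fileformat=VCFv4.2\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n' \
    | count_vcf_records
```

To use it on a real run, redirect the VCF that pggb wrote into the output directory into the function, for example `count_vcf_records < file.vcf`, where file.vcf stands in for the actual file name.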

subwaystation commented 10 months ago

Could you please post the name of your reference sequence in the graph and the actual command line?
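
The path names can be pulled out of the GFA directly. The sketch below assumes the graph stores its paths as P lines, as GFA 1.0 does, with the path name in the second tab-separated column; list_gfa_paths is a hypothetical helper name and the demo graph is invented for illustration.

```shell
# Hypothetical helper: print the name of every path (P line) in a GFA file
# read on standard input.
list_gfa_paths() {
    awk -F'\t' '$1 == "P" { print $2 }'
}

# Demo on a tiny made-up GFA with one segment and two paths.
printf 'H\tVN:Z:1.0\nS\t1\tACGT\nP\tGenome1#1#chr04\t1+\t*\nP\tGenome2#1#chr04\t1+\t*\n' \
    | list_gfa_paths
```

For the real graph, redirect the *.smooth.final.gfa file into the function. `odgi paths -i <graph>.og -L` should give the same list from the odgi graph.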

arslan9732 commented 10 months ago

I reran it with 10 genomes (one genome was sequenced with short reads and made the output graph look weird, so I removed it). According to the log file, it is now stuck at the vg deconstruct step.

odgi sort -P -p Ygs --temp-dir chr01-10_tmp -t 200 -i - -o chr01-10/chr01-10.fa.gz.3907ad6.417fcdf.c00ce9c.smooth.final.og
75460.47s user 1748.54s system 4510% cpu 1711.82s total 36418000Kb max memory
odgi view -i chr01-10/chr01-10.fa.gz.3907ad6.417fcdf.c00ce9c.smooth.final.og -g
161.22s user 85.18s system 98% cpu 251.08s total 15773864Kb max memory
Warning [vg deconstruct]: -H is deprecated, and will be ignored

libgomp: Thread creation failed: Resource temporarily unavailable
...

This time the command was as follows. Note that -n should have been 10; I mistakenly ran it with 11. Genome1 is the reference.

pggb -i ../Genomes/community/chr01-10.fa.gz \
     -o chr01-10 \
     -D chr01-10_tmp \
     --keep-temp-files \
     -n 11 \
     -t ${NSLOTS} \
     -p 90 \
     -s 5k \
     -V 'Genome1:1000' \
     --multiqc -S -r

arslan9732 commented 10 months ago

I was able to solve the issue. pggb was stuck at the vg deconstruct step with the "Thread creation failed" error. I ran vg deconstruct separately and got the same error. It turned out to be a memory problem: after I increased the available memory, it ran successfully.
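
For anyone who hits the same libgomp message: it is printed when the system refuses to create another thread, which can happen when memory or a per-user limit is exhausted. The sketch below shows the current limits and builds a standalone vg deconstruct command with a lower thread count. The cap of 16, the helper name cap_threads, and the file names graph.gfa and out.vcf are assumptions for illustration; the flags are the ones from the log above, minus the deprecated -H.

```shell
# Show the resource limits of the current shell (processes, memory, ...).
ulimit -a

# Hypothetical helper: cap a requested thread count at a chosen maximum.
cap_threads() {
    requested=$1
    maximum=$2
    if [ "$requested" -gt "$maximum" ]; then
        echo "$maximum"
    else
        echo "$requested"
    fi
}

# NSLOTS is set by the scheduler; fall back to 200 (the value used earlier
# in this issue) when it is unset. 16 is an arbitrary example cap.
threads=$(cap_threads "${NSLOTS:-200}" 16)

# Only print the command here; graph.gfa and out.vcf are placeholders.
echo "vg deconstruct -P Genome1 -e -a -t $threads graph.gfa > out.vcf"
```

Running the step alone like this makes it easier to tell whether the failure comes from the thread count, the memory request, or the graph itself.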