Closed zhqduan closed 1 year ago
I suspect that a thread in smoothxg is waiting for nothing.
Are these runs long? Have you already tried rerunning pggb for those chromosomes, and/or updating pggb and its tools?
If the problem is deterministic, it would be interesting to put our hands on one of your inputs, if possible and doable.
Thanks for the prompt reply!
I have run pggb on those chromosomes many times, and the result is always the same. The versions of the tools I use with pggb are as follows: wfmash v0.9.1-3-gc5882a1, seqwish v0.7.6, smoothxg v0.6.5, odgi v0.7.3 "Fissaggio", gfaffix 0.1.3, vg v1.40.0 "Suardi".
Best, Zhongqu
Could you please share the smallest graph (the GFA file) that causes smoothxg to hang? If you compress it with zstd, the resulting file should not be too big.
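A minimal sketch of the compression step suggested above, assuming zstd is installed. The function name `compress_gfa` and the file name in the usage comment are hypothetical; `-19` and `-T0` are standard zstd options for a high compression level and for using all available cores.

```shell
# Compress a GFA file with zstd.
# zstd keeps the input file and writes "<input>.zst" next to it.
compress_gfa() {
    # -19: high compression level (slower, smaller output)
    # -T0: let zstd pick the number of threads from the available cores
    zstd -19 -T0 "$1"
}

# Usage (hypothetical file name):
#   compress_gfa chr16.seqwish.gfa
```

A lower level such as `-10` trades a somewhat larger file for a much shorter compression time, which may matter for graphs of this size.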
OK, I will try to compress a GFA file as you advise.
Alternatively, I will update smoothxg to version 0.6.7 and rerun the pggb command on the three chromosomes.
Thank you very much!
Was this resolved? What was the resolution?
On Sun, Nov 27, 2022, 11:31 Andrea Guarracino @.***> wrote:
Closed #245 https://github.com/pangenome/pggb/issues/245 as completed.
@zhqduan, how did you install pggb? bioconda? docker/singularity?
Did you have a chance to compress the GFA to share it?
Sorry for the late reply. I am testing pggb on chr16 of the HPRC samples, but it still has not finished after more than 10 days.
I installed pggb with bioconda, and the tool versions are as follows: wfmash v0.9.1-3-gc5882a1, seqwish v0.7.6, smoothxg v0.6.7, odgi v0.7.3 "Fissaggio", gfaffix 0.1.3, vg v1.40.0 "Suardi".
The gfa.gz file of chr16 for all HPRC samples is shared at this link (930 MB): https://drive.google.com/file/d/1VFJrN20PcKBY7LbwlNfcCKZi4Xtbxggy/view?usp=sharing
If you cannot access the file, please let me know. Thank you very much!
@zhqduan, thank you for the file!
The fact that you are using bioconda can explain the high execution times, as you are not fully exploiting your system's hardware with that installation. Can you build pggb's tools from source or build the docker image locally? See here for more details about this performance problem and how to solve it.
In any case, I am smoothing your graph using the latest smoothxg version, built from source (in this way), to check whether it is just a compilation problem or whether the graph is triggering blocking bugs. We'll know more in several hours (or days).
@AndreaGuarracino I sincerely appreciate your kind reply. I will build the pggb tools from source and rerun the scripts. Thank you very much!
@zhqduan, were you able to obtain a graph by building smoothxg from source?
I can confirm that it is possible to get a graph in that way, but a few blocks need a massive amount of time to be aligned with SPOA. So, there is no 'infinite-bug', but those hard blocks are surely a problem that we need to tackle differently.
Hi @AndreaGuarracino, I have obtained a graph for chromosome Y. However, chr1 and chr16 are still running at the smoothxg step:
[smoothxg::(1-3)::smooth_and_lace] applying local SPOA to 674363 blocks: 100.00% @ 5.99e+00/s elapsed: 01:07:17:21 remain: 00:00:00
Thank you very much!
Update
The chr1 and chr16 jobs were both interrupted with the following error:
chr1:
[smoothxg::(1-3)::smooth_and_lace] applying local SPOA to 1833212 blocks: 100.00% @ 2.33e+01/s elapsed: 00:21:52:59 remain: 00:00:00:00
Command terminated by signal 9
smoothxg -t 32 -T 16 -g chr1.pan/chr1.pan.fa.cb6f6f4.04f1c29.seqwish.gfa -r 96 --base chr1.pan --chop-to 100 -I .9800 -R 0 -j 0 -e 0 -l 700,900,1100 -P 1,19,39,3,81,1 -O 0.03 -Y 9600 -d 0 -D 0 -S -Q Consensus_ -V -o chr1.pan/chr1.pan.fa.cb6f6f4.04f1c29.6a71824.smooth.gfa
chr16:
[smoothxg::(1-3)::smooth_and_lace] applying local SPOA to 674363 blocks: 100.00% @ 1.68e+00/s elapsed: 04:15:26:41 remain: 00:00:00:01
Command terminated by signal 9
smoothxg -t 32 -T 16 -g chr16.pan/chr16.pan.fa.cb6f6f4.04f1c29.seqwish.gfa -r 96 --base chr16.pan --chop-to 100 -I .9800 -R 0 -j 0 -e 0 -l 700,900,1100 -P 1,19,39,3,81,1 -O 0.03 -Y 9600 -d 0 -D 0 -S -Q Consensus_ -V -o chr16.pan/chr16.pan.fa.cb6f6f4.04f1c29.6a71824.smooth.gfa
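The progress lines above report elapsed time in what appears to be a days:hours:minutes:seconds layout. This is an assumption inferred from the logs, not taken from smoothxg documentation. A minimal sketch for converting that field to seconds, so the reported rate can be cross-checked against the block count; `extract_elapsed` and `elapsed_to_seconds` are hypothetical helper names:

```shell
# Read log lines on stdin and print the value that follows "elapsed: ".
extract_elapsed() {
    sed -n 's/.*elapsed: \([0-9:]*\).*/\1/p'
}

# Convert an assumed dd:hh:mm:ss string to seconds.
# awk is used so that fields with leading zeros (e.g. "08") are read as decimal.
elapsed_to_seconds() {
    printf '%s\n' "$1" | awk -F: '{ print $1 * 86400 + $2 * 3600 + $3 * 60 + $4 }'
}

# Example with the chr16 value from the log:
#   elapsed_to_seconds 04:15:26:41    # prints 401201
```

For chr16, 674363 blocks / 401201 s is about 1.68 blocks/s, which is consistent with the 1.68e+00/s rate printed in the log and supports the assumed field layout.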
The command and parameters are as follows:
Command: ~/softwares/pggb_v0.5.1_d20221201/bin/pggb -i ~/PGGB/wfmash/hprc/parts/chr1.pan.fa -o chr1.pan -t 32 -p 98 -s 100000 -n 96 -k 311 -O 0.03 -T 16 -v -S -V CHM13:#,GRCh38:# -Z
PARAMETERS
general:
  input-fasta: ~/PGGB/wfmash/hprc/parts/chr1.pan.fa
  output-dir: chr1.pan
  temp-dir: chr1.pan
  resume: false
  compress: true
  threads: 32
  poa_threads: 16
wfmash:
  version: v0.10.0-9-gcb0ce95
  segment-length: 100000
  block-length: 25000
  map-pct-id: 98
  n-mappings: 96
  no-splits: false
  sparse-map: false
  mash-kmer: 19
  mash-kmer-thres: 0.001
  exclude-delim: false
  no-merge-segments: false
seqwish:
  version: v0.7.7-2-gf362f6f
  min-match-len: 311
  sparse-factor: 0
  transclose-batch: 10000000
smoothxg:
  version: v0.6.7-30-g3b3c2c3
  skip-normalization: false
  n-haps: 96
  path-jump-max: 0
  edge-jump-max: 0
  poa-length-target: 700,900,1100
  poa-params: 1,19,39,3,81,1
  poa_padding: 0.03
  run_abpoa: false
  run_global_poa: false
  pad-max-depth: 100
  write-maf: false
  consensus-spec: false
  consensus-prefix: Consensus_
  block-id-min: .9800
  block-ratio-min: 0
odgi:
  version: v0.7.3
  viz: false
  layout: false
  stats: true
gfaffix:
  version: v0.1.4
  reduce-redundancy: true
vg:
  version: v1.44.0
  deconstruct: CHM13:#,GRCh38:#
reporting:
  version: v1.13
  multiqc: false
@zhqduan how much RAM do you have? It might be an Out-Of-Memory problem.
(Do not use vg 1.44.0 because of https://github.com/vgteam/vg/issues/3807; keep using vg 1.40.0.)
Hi @AndreaGuarracino,
I submitted my job with Slurm using "#SBATCH --mem=240g". The maximum memory I can use is 3 TB.
OK, I will use vg 1.40.0.
Thank you very much!
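One way to test the out-of-memory hypothesis raised above is to confirm that the log records a kill by signal 9 and then compare the job's peak memory with its request. A rough sketch, assuming Slurm job accounting is enabled on the cluster; the function names and the log file name and job ID in the usage comments are hypothetical, while the `sacct` fields are standard ones:

```shell
# Succeed if the log file records a kill by signal 9 (SIGKILL),
# the signal typically sent when a job exceeds its memory limit.
killed_by_signal_9() {
    grep -q -E 'terminated by signal 9([^0-9]|$)' "$1"
}

# Print the requested memory (ReqMem) and peak resident memory (MaxRSS)
# for a finished Slurm job, together with its final state.
show_job_memory() {
    sacct -j "$1" --format=JobID,State,ExitCode,ReqMem,MaxRSS,Elapsed
}

# Usage (hypothetical log file name and job ID):
#   killed_by_signal_9 chr16.pan.log && show_job_memory 1234567
```

If MaxRSS is close to ReqMem, or the job state is OUT_OF_MEMORY, raising the `--mem` request is a reasonable next step.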
@zhqduan, this issue has been addressed in https://github.com/pangenome/smoothxg/pull/183 and https://github.com/pangenome/pggb/pull/280. If you update smoothxg by building it from the current GitHub master and update pggb's script, you will see a strong performance improvement.
Feel free to reopen the issue if the execution time is still too long.
Great! I will rerun my script after updating smoothxg. Thank you very much!
Hello,
I am constructing a human pangenome with pggb from more than 50 haplotype-resolved genomes, following the HPRC pipeline. My command is:
pggb -i chr"$i".pan.fa -o chr"$i".pan -t 64 -p 98 -s 100000 -n 36 -k 311 -O 0.03 -T 16 -v -S -V CHM13:#,GRCh38:# -Z
The other chromosomes worked well, except for three (chr1, chr16 and chrY). About one month has passed, but pggb is still running. Do you have any suggestions about this issue? Thank you very much!