GeorgeBGM opened this issue 1 year ago
Hi @George-du,
there are several possibilities:
I would recommend the first option, though I am aware of the computational overhead.
Thank you for your reply. I will take your suggestion, and I think adding a new PGGB feature for incorporating new samples would be very helpful and useful.
A future option would be to only generate alignments that would be induced by the addition of the new samples. This would be helpful because most of the runtime is dependent upon the quadratic, all2all alignment.
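To make the cost difference concrete, here is a rough back-of-the-envelope sketch in shell; the sample counts are illustrative values (236 existing haplotypes as used later in this thread, two hypothetical new samples), not measurements:

```shell
# Illustrative only: compare pairwise-alignment counts for a full all2all
# rebuild versus mapping only newly added samples against the existing set.
n=236   # existing samples (example value from this thread's -n parameter)
k=2     # hypothetical number of new samples to add
all_pairs=$(( n * (n - 1) / 2 ))   # all-vs-all pairs for a full rebuild
new_pairs=$(( k * n ))             # pairs if only the new samples are aligned
echo "full rebuild: $all_pairs pairs; incremental: $new_pairs pairs"
```

This is why an incremental mode would help: the alignment work drops from quadratic in the total sample count to linear in the number of added samples.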
Sounds great, thanks so much for your help.
Hi, I am using the PGGB pipeline on per-chromosome partitions to build a pan-genome, and smoothxg is producing errors on some of the chromosomes. Do you have any suggestions about these reported errors? Thanks!
Software: smoothxg (v0.6.8-0-ga8a0e9e)

Error 1:
smoothxg -t 30 -T 30 -g ./graphs/chrY.pan/chrY.pan.new.fa.gz.bf8016f.04f1c29.seqwish.gfa -r 114 --base ./graphs/chrY.pan --chop-to 100 -I .9800 -R 0 -j 0 -e 0 -l 700,900,1100 -P 1,19,39,3,81,1 -O 0.03 -Y 11400 -d 0 -D 0 -S -Q Consensus_ -V -o ./graphs/chrY.pan/chrY.pan.new.fa.gz.bf8016f.04f1c29.5ef21f9.smooth.gfa
259730.80s user 26577.66s system 82% cpu 345179.20s total 53544752Kb max memory

Error 2:
smoothxg -t 30 -T 30 -g ./graphs/chr9.pan/chr9.pan.new.fa.gz.2ca993e.04f1c29.seqwish.gfa -r 236 --base ./graphs/chr9.pan --chop-to 100 -I .9800 -R 0 -j 0 -e 0 -l 700,900,1100 -P 1,19,39,3,81,1 -O 0.03 -Y 23600 -d 0 -D 0 -S -Q Consensus_ -V -o ./graphs/chr9.pan/chr9.pan.new.fa.gz.2ca993e.04f1c29.03ca4fb.smooth.gfa
Hi, I'm wondering whether I described the problem clearly, and whether there are any suggestions for solving it?
Hi, developers! What should I do to avoid the error reported above? Should I re-run smoothxg without the -Q Consensus_ parameter and with -O 0, or do I need to reduce my mash segment length from 50 kb to 10 kb (reference: https://github.com/pangenome/pggb/issues/182)? Are there any other suggestions? Also, do these two strategies have a significant impact on the final result?
Hi, @subwaystation @ekg ,
I tried the above strategies on human chromosome 13, but the smoothxg step is still failing. Are there any suggestions for this problem, or can I just use the results from before the smoothxg step?
1) Re-running smoothxg without the -Q Consensus_ parameter and with -O 0:
the command: smoothxg -t 30 -T 30 -g ./graphs/chr13.pan/chr13*seqwish.gfa -r 236 --base ./graphs/chr13.pan --chop-to 100 -I .9800 -R 0 -j 0 -e 0 -l 700,900,1100 -P 1,19,39,3,81,1 -O 0 -Y 23 -D 0 -o ./g13.pan/chr13.pan/chr13.pan.fa.gz.2ca993e.04f1c29.03ca4fb.smooth.gfa
the error message:
[smoothxg::(1-3)::smooth_and_lace] embedding 79826 path fragments: 0.01% @ 2.81e+04/s elapsed: 00:00:00:00 remain: 00:00:00:02
smoothxg: /opt/conda/conda-bld/smoothxg_1671059618733/work/src/smooth.cpp:2117: odgi::graph_t smoothxg::smooth_and_lace(const xg::XG&, smoothxg::blockset_t&, int, int, int, int, int, int, const bool&, const uint64_t&, float, uint64_t, bool, int, int, const string&, std::string&, bool, bool, double, bool, const string&, std::vector<std::__cxx11::basic_string
2) Reducing the mash segment length from 50 kb to 10 kb:
the command: $RUN_PGGB -r -i /home/u20111010010/Project/Pan-genome/002.Merge_Pan_V2/Merge-V1/001.Sequence_partitioning/parts/chr$i.pan.new.fa.gz -o ./graphs/new_chr$i.pan -t 30 -p 98 -s 10000 -n 236 -k 311 -O 0.03 -T 30
the error message:
[smoothxg::(1-3)::break_and_split_blocks] cutting and splitting 869849 blocks: 100.00% @ 4.33e+04/s elapsed: 00:00:00:20 remain: 00:00:00:00
smoothxg: /opt/conda/conda-bld/smoothxg_1671059618733/work/build/sdsl-lite-prefix/src/sdsl-lite-build/include/sdsl/enc_vector.hpp:193: sdsl::enc_vector<t_coder, t_dens, t_width>::value_type sdsl::enc_vector<t_coder, t_dens, t_width>::operator[](sdsl::enc_vector<t_coder, t_dens, t_width>::size_type) const [with t_coder = sdsl::coder::elias_delta; unsigned int t_dens = 128; unsigned char t_width = 0; sdsl::enc_vector<t_coder, t_dens, t_width>::value_type = long unsigned int; sdsl::enc_vector<t_coder, t_dens, t_width>::size_type = long unsigned int]: Assertion `i < m_size' failed.
Command terminated by signal 6
I'm looking forward to your reply. Best, Du
Can you try the same command lines, but installing PGGB via Docker/Singularity?
Hi, developers!
I will try to install PGGB via Docker/Singularity. Do I need to install a specific version?
The latest version available, thanks!
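For reference, pulling the latest image would typically look like the sketch below. The ghcr.io image path follows the pggb README; verify it against the project's current documentation before use, since the network-dependent commands are shown but commented out here:

```shell
# Hypothetical pull-and-check sequence for the latest pggb container image.
# Image reference taken from the pggb README; confirm before pulling.
IMG=docker://ghcr.io/pangenome/pggb:latest
# Network-dependent steps (run these on your own system):
#   singularity pull pggb_latest.sif "$IMG"
#   singularity exec pggb_latest.sif pggb --version
echo "${IMG#docker://}"   # the bare image reference, without the scheme
```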
Got that. I'll try it again.
Hi, @subwaystation @ekg @AndreaGuarracino,
I installed the latest PGGB (pggb 8eaf354) using Singularity with non-root privileges, but still get a similar error. The details of the reported error are as follows:
1. Re-running smoothxg without the -Q Consensus_ parameter and with -O 0 (mash segment length: 50 kb / 10 kb):
10 kb:
RUN_PGGB="singularity exec /home/Software/pggb/pggb.simg pggb"
$RUN_PGGB -r -i chr13.pan.new.fa.gz -o new_chr13.pan -t 45 -p 98 -s 10000 -n 236 -k 311 -O 0.03 -T 45

50 kb:
singularity exec /home/Software/pggb/pggb.simg smoothxg -t 30 -T 30 -g chr13*seqwish.gfa -r 236 --base ./graphs/chr13.pan --chop-to 100 -I .9800 -R 0 -j 0 -e 0 -l 700,900,1100 -P 1,19,39,3,81,1 -O 0 -Y 23 -D 0 -o ./graphs/chr13.pan/chr13.pan.fa.gz.2ca993e.04f1c29.03ca4fb.smooth.gfa
10 kb error message:
[smoothxg::(2-3)::smooth_and_lace] embedding 395114099 path fragments: 0.00% @ 2.25e+04 bp/s elapsed: 00:00:00:00 remain: 00:04:52:01
smoothxg: /smoothxg/src/smooth.cpp:2551: odgi::graph_t smoothxg::smooth_and_lace(const xg::XG&, smoothxg::blockset_t&, int, int, int, int, int, int, const bool&, const uint64_t&, float, uint64_t, bool, int, int, const string&, std::string&, bool, bool, double, bool, const string&, std::vector<std::__cxx11::basic_string
50 kb error message:
[smoothxg::(1-3)::smooth_and_lace] adding edges from 992731 graphs: 100.00% @ 3.97e+05 bp/s elapsed: 00:00:00:02 remain: 00:00:00:00
[smoothxg::(1-3)::smooth_and_lace] embedding 76537735 path fragments: 0.00% @ 1.97e+04 bp/s elapsed: 00:00:00:00 remain: 00:01:04:52
smoothxg: /smoothxg/src/smooth.cpp:2551: odgi::graph_t smoothxg::smooth_and_lace(const xg::XG&, smoothxg::blockset_t&, int, int, int, int, int, int, const bool&, const uint64_t&, float, uint64_t, bool, int, int, const string&, std::string&, bool, bool, double, bool, const string&, std::vector<std::__cxx11::basic_string
2. Re-running the PGGB pipeline via Singularity:
RUN_PGGB="singularity exec /home/Software/pggb/pggb.simg pggb"
$RUN_PGGB -r -i chr13.pan.new.fa.gz -o ./graphs/rerun-new_chr13.pan -t 45 -p 98 -s 10000 -n 236 -k 311 -O 0.03 -T 45
[wfmash::skch::Map::mapQuery] count of mapped reads = 13369, reads qualified for mapping = 13641, total input reads = 13641, total input bp = 24623027601
[wfmash::map] time spent mapping the query: 3.71e+03 sec
[wfmash::map] mapping results saved in: /dev/stdout
wfmash -s 10000 -l 50000 -p 98 -n 235 -k 19 -H 0.001 -X -t 45 --tmp-base ./graphs/rerun-new_chr13.pan chr13.pan.new.fa.gz --approx-map
126560.51s user 7903.55s system 3462% cpu 3883.39s total 20414144Kb max memory
/usr/local/bin/pggb: line 497: /dev/fd/63: No such file or directory
INFO: Cleaning up image...
Do you have any suggestions for these reported errors? Thanks!
I'm looking forward to your reply. Best, Du
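As an aside, the "/dev/fd/63: No such file or directory" message in the log above is the characteristic failure mode of bash process substitution when /dev/fd (normally a link into /proc/self/fd) is not available inside the container; this is one plausible reading of the log, not a confirmed diagnosis. A minimal demonstration of the construct:

```shell
#!/bin/bash
# bash implements <(cmd) by exposing cmd's output as a /dev/fd/N path.
# If /dev/fd is missing inside a container, any script that uses process
# substitution fails with an error like "/dev/fd/63: No such file or directory".
msg=$(cat <(echo "read via /dev/fd"))
echo "$msg"
```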
It looks like two different issues.
If you re-run, do you ever get exactly the same error in smooth_and_lace?
@ekg @subwaystation @AndreaGuarracino
Hi, developers!
The second attempt ran the PGGB pipeline completely from scratch using the Singularity image (non-root install); it produced an error right after the wfmash step, so it never reached the smoothxg step.
The first attempt was based on the output of the native Linux installation (where the smoothxg step failed), and that step was then re-executed using the smoothxg binary from the Singularity image.
So they really are two different issues. Thanks in advance!
Hi @George-du, would it be possible to share your input data or a tiny subset of it, which produces the issues? Thanks!
Dear @subwaystation @AndreaGuarracino,
Here is the raw data I used for the above pipeline, please help me check the exact errors. Thanks! (https://sandbox.zenodo.org/record/1234413)
Dear @subwaystation @AndreaGuarracino,
Were you able to download and use the data properly?
@George-du, thank you for the data. I am running pggb with it on our cluster, installed by building each tool from GitHub source (so no Docker/Singularity).
pggb -i chr13.pan.new.fa.gz -p 98 -s 50000 -n 236 -k 311 -t 48 -o xxx -D /scratch
It is taking a while. At the moment it is at the 2nd round of SPOA, without issues.
Dear @AndreaGuarracino @subwaystation,
Wow, that sounds good. The software versions I'm using in the PGGB pipeline are listed below. Additionally, I found that the smoothxg step on some chromosomes took an extraordinarily long time and ultimately failed (chr15; ~1 month; Command terminated by signal 7). The details are as follows:
The software versions of the PGGB pipeline:
wfmash: v0.10.3-3-g8ba3c53
seqwish: v0.7.9-0-gd9e7ab5
smoothxg: v0.6.8-0-ga8a0e9e
odgi: v0.8.2-0-g8715c55
The command and result (chr15; ~1 month; Command terminated by signal 7):
RUN_PGGB="/home/Software/Anaconda/mambaforge-pypy3/envs/pggb/bin/pggb"
sbatch -p tissue --job-name=chr15 --mem=300G -c 30 -o ./log/001.test-pggb-graph-chr15.out --wrap "$RUN_PGGB -r -i /home/Project/Pan-genome/002.Merge_Pan_V2/Merge-V1/001.Sequence_partitioning/parts/chr15.pan.new.fa.gz -o ./graphs/chr15.pan -t 30 -p 98 -s 50000 -n 236 -k 311 -O 0.03 -T 30"
Looking forward to the resolution of this issue. Thanks in advance.
Is there a possibility for you @George-du to run our latest Docker image? You have quite a lot of data as input ^^ Maybe you ran out of disk space?
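On the disk-space point: signal 7 is SIGBUS, which on Linux can be raised when a memory-mapped file's backing filesystem runs out of space, so checking free disk and memory before a month-long run is cheap insurance. A quick sketch, with placeholder paths:

```shell
# Placeholder paths: point df at your pggb output and temp (-D) directories.
df -h /tmp                     # long smoothxg runs write large temporary files
free -m 2>/dev/null || true    # available memory (free(1) exists on Linux)
```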
Dear @subwaystation,
I will contact the administrator and try to run the latest Docker image. Thanks.
@George-du, I was able to finish PGGB. It seems the problem is specific to your installation and/or cluster.
I've used
general:
  input-fasta: /lizardfs/guarracino/bug_smoothxg/chr13.pan.new.fa.gz
  output-dir: /lizardfs/guarracino/bug_smoothxg/xxx
  temp-dir: /scratch
  resume: false
  compress: false
  threads: 48
  poa_threads: 48
wfmash:
  version: v0.10.4-7-g0981b92
  segment-length: 50000
  block-length: 250000
  map-pct-id: 98
  n-mappings: 236
  no-splits: false
  sparse-map: false
  mash-kmer: 19
  mash-kmer-thres: 0.001
  exclude-delim: false
  no-merge-segments: false
seqwish:
  version: v0.7.9-2-gf44b402
  min-match-len: 311
  sparse-factor: 0
  transclose-batch: 10000000
smoothxg:
  version: v0.7.0-18-g4ff4cf2
  skip-normalization: false
  n-haps: 236
  path-jump-max: 0
  edge-jump-max: 0
  poa-length-target: 700,900,1100
  poa-params: 1,19,39,3,81,1
  poa_padding: 0.001
  run_abpoa: false
  run_global_poa: false
  pad-max-depth: 100
  write-maf: false
  consensus-spec: false
  consensus-prefix: Consensus_
  block-id-min: .9800
  block-ratio-min: 0
odgi:
  version: v0.8.3-26-gbc7742ed
  viz: true
  layout: true
  stats: false
gfaffix:
  version: v0.1.5
  reduce-redundancy: true
vg:
  version: v1.50.1
  deconstruct: false
Wow, I'll reinstall the latest version of PGGB and test it out.
Hi, how can I add new sample genomes and contigs to an existing pan-genome produced by PGGB? Can this be done directly using the Minigraph or GraphAligner tools? Any suggestions on how to do this would be appreciated.