pangenome / pggb

the pangenome graph builder
https://doi.org/10.1038/s41592-024-02430-3
MIT License

Sample command from README.md fails; is removing -o the fix? #219

Closed: colindaven closed this issue 2 years ago

colindaven commented 2 years ago
./pggb -i data/HLA/DRB1-3123.fa.gz -p 70 -s 3000 -G 2000 -n 10 -t 16 -v -V 'gi|568815561:#' -o out -M
[wfmash::map] Reference = [data/HLA/DRB1-3123.fa.gz]
[wfmash::map] Query = [data/HLA/DRB1-3123.fa.gz]
[wfmash::map] Kmer size = 19
[wfmash::map] Window size = 19
[wfmash::map] Segment length = 3000 (read split allowed)
[wfmash::map] Block length min = 15000
[wfmash::map] Chaining gap max = 300000
[wfmash::map] Percentage identity threshold = 70%
[wfmash::map] Skip self mappings
[wfmash::map] Mapping output file = out/wfmash-ZWwfMe
[wfmash::map] Filter mode = 1 (1 = map, 2 = one-to-one, 3 = none)
[wfmash::map] Execution threads  = 16
[wfmash::skch::Sketch::build] minimizers picked from reference = 17592
[wfmash::skch::Sketch::index] unique minimizers = 4432
[wfmash::skch::Sketch::computeFreqHist] Frequency histogram of minimizers = (1, 34) ... (22, 2)
[wfmash::skch::Sketch::computeFreqHist] With threshold 0.5%, ignore minimizers occurring >= 13 times during lookup.
[wfmash::map] time spent computing the reference index: 0.0216735 sec
[wfmash::skch::Map::mapQuery] mapped 100.00% @ 3.26e+05 bp/s elapsed: 00:00:00:00 remain: 00:00:00:00
[wfmash::skch::Map::mapQuery] count of mapped reads = 11, reads qualified for mapping = 12, total input reads = 12, total input bp = 163416
[wfmash::map] time spent mapping the query: 5.30e-01 sec
[wfmash::map] mapping results saved in: out/wfmash-ZWwfMe
[wfmash::align] Reference = [data/HLA/DRB1-3123.fa.gz]
[wfmash::align] Query = [data/HLA/DRB1-3123.fa.gz]
[wfmash::align] Mapping file = out/wfmash-ZWwfMe
[wfmash::align] Alignment identity cutoff = 5.60e-01%
[wfmash::align] Alignment output file = /dev/stdout
[wfmash::align] time spent loading the reference index: 6.12e-02 sec
[wfmash::align::computeAlignments] aligned 100.00% @ 3.21e+05 bp/s elapsed: 00:00:00:02 remain: 00:00:00:00
[wfmash::align::computeAlignments] count of mapped reads = 12, total aligned bp = 641375   
[wfmash::align] time spent computing the alignment: 2.06e+00 sec
[wfmash::align] alignment results saved in: /dev/stdout
wfmash -X -s 3000 -p 70 -n 9 -B out -t 16 data/HLA/DRB1-3123.fa.gz data/HLA/DRB1-3123.fa.gz
23.15s user 1.12s system 919% cpu 2.64s total 165072Kb max memory
Flag could not be matched: temp-dir

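Removing -o out gets past the temp-dir error, as far as I can tell (is that correct?), although the run below still ends with smoothxg being terminated by signal 4: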

./pggb -i data/HLA/DRB1-3123.fa.gz -p 70 -s 3000 -G 2000 -n 10 -t 16 -v -V 'gi|568815561:#' -M
[wfmash::map] Reference = [data/HLA/DRB1-3123.fa.gz]
[wfmash::map] Query = [data/HLA/DRB1-3123.fa.gz]
[wfmash::map] Kmer size = 19
[wfmash::map] Window size = 19
[wfmash::map] Segment length = 3000 (read split allowed)
[wfmash::map] Block length min = 15000
[wfmash::map] Chaining gap max = 300000
[wfmash::map] Percentage identity threshold = 70%
[wfmash::map] Skip self mappings
[wfmash::map] Mapping output file = xxxxxxxxxxxxxxxxxxxdev/pggb_github/pggb/wfmash-vo6erM
[wfmash::map] Filter mode = 1 (1 = map, 2 = one-to-one, 3 = none)
[wfmash::map] Execution threads  = 16
[wfmash::skch::Sketch::build] minimizers picked from reference = 17592
[wfmash::skch::Sketch::index] unique minimizers = 4432
[wfmash::skch::Sketch::computeFreqHist] Frequency histogram of minimizers = (1, 34) ... (22, 2)
[wfmash::skch::Sketch::computeFreqHist] With threshold 0.5%, ignore minimizers occurring >= 13 times during lookup.
[wfmash::map] time spent computing the reference index: 0.0126743 sec
[wfmash::skch::Map::mapQuery] mapped 100.00% @ 3.26e+05 bp/s elapsed: 00:00:00:00 remain: 00:00:00:00
[wfmash::skch::Map::mapQuery] count of mapped reads = 11, reads qualified for mapping = 12, total input reads = 12, total input bp = 163416
[wfmash::map] time spent mapping the query: 5.05e-01 sec
[wfmash::map] mapping results saved in: /mnt/beegfs/scratch/bioinformatics/colin/dev/pggb_github/pggb/wfmash-vo6erM
[wfmash::align] Reference = [data/HLA/DRB1-3123.fa.gz]
[wfmash::align] Query = [data/HLA/DRB1-3123.fa.gz]
[wfmash::align] Mapping file = /mnt/beegfs/scratch/bioinformatics/colin/dev/pggb_github/pggb/wfmash-vo6erM
[wfmash::align] Alignment identity cutoff = 5.60e-01%
[wfmash::align] Alignment output file = /dev/stdout
[wfmash::align] time spent loading the reference index: 4.94e-02 sec
[wfmash::align::computeAlignments] aligned 100.00% @ 3.21e+05 bp/s elapsed: 00:00:00:02 remain: 00:00:00:00
[wfmash::align::computeAlignments] count of mapped reads = 12, total aligned bp = 641375
[wfmash::align] time spent computing the alignment: 2.05e+00 sec
[wfmash::align] alignment results saved in: /dev/stdout
wfmash -X -s 3000 -p 70 -n 9 -t 16 data/HLA/DRB1-3123.fa.gz data/HLA/DRB1-3123.fa.gz
24.07s user 1.15s system 972% cpu 2.59s total 163460Kb max memory
[seqwish::seqidx] 0.001 indexing sequences
[seqwish::seqidx] 0.061 index built
[seqwish::alignments] 0.061 processing alignments
[seqwish::alignments] 0.071 indexing
[seqwish::alignments] 0.102 index built
[seqwish::transclosure] 0.108 computing transitive closures
[seqwish::transclosure] 0.115 0.00% 0-163416 overlap_collect
[seqwish::transclosure] 0.183 0.00% 0-163416 rank_build
[seqwish::transclosure] 0.186 0.00% 0-163416 parallel_union_find
[seqwish::transclosure] 0.207 0.00% 0-163416 dset_write
[seqwish::transclosure] 0.209 0.00% 0-163416 dset_compression
[seqwish::transclosure] 0.217 0.00% 0-163416 dset_sort
[seqwish::transclosure] 0.223 0.00% 0-163416 dset_invert
[seqwish::transclosure] 0.231 0.00% 0-163416 graph_emission
[seqwish::transclosure] 0.274 100.00% building node_iitree and path_iitree indexes
[seqwish::transclosure] 0.295 100.00% done
[seqwish::transclosure] 0.295 done with transitive closures
[seqwish::compact] 0.295 compacting nodes
[seqwish::compact] 0.298 done compacting
[seqwish::compact] 0.298 built node index
[seqwish::links] 0.298 finding graph links
[seqwish::links] 0.317 links derived
[seqwish::gfa] 0.317 writing graph
[seqwish::gfa] 0.336 done
seqwish -t 16 -s data/HLA/DRB1-3123.fa.gz -p data/HLA/DRB1-3123.fa.gz.e4d7759.wfmash.paf -k 19 -f 0 -g data/HLA/DRB1-3123.fa.gz.e4d7759.417fcdf.seqwish.gfa -B 10000000 -P
0.26s user 0.25s system 146% cpu 0.35s total 38668Kb max memory
[smoothxg::main] loading graph
[smoothxg::main] prepping graph for smoothing
[odgi::gfa_to_handle] building nodes: 100.00% @ 7.65e+03/s elapsed: 00:00:00:00 remain: 00:00:00:00
[odgi::gfa_to_handle] building edges: 100.00% @ 1.43e+04/s elapsed: 00:00:00:00 remain: 00:00:00:00
Command terminated by signal 4
smoothxg -t 16 -T 16 -g data/HLA/DRB1-3123.fa.gz.e4d7759.417fcdf.seqwish.gfa -w 20000 -X 100 -I .7000 -R 0 -j 0 -e 0 -l 2000 -P 1,19,39,3,81,1 -O 0.001 -Y 1000 -d 0 -D 0 -S -m data/HLA/DRB1-3123.fa.gz.e4d7759.417fcdf.20e3026.smooth.maf -Q Consensus_ -V -o data/HLA/DRB1-3123.fa.gz.e4d7759.417fcdf.20e3026.smooth.gfa
0.04s user 0.01s system 9% cpu 0.68s total 9868Kb max memory

AndreaGuarracino commented 2 years ago

@colindaven, could you please check the versions of all pggb's tools? Be sure everything is updated to the latest version.
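
As a starting point for that check, here is a minimal sketch that reports which binary each tool name resolves to on PATH. The tool list (pggb, wfmash, seqwish, smoothxg, odgi) is taken from the log prefixes in this thread, and the helper name locate_tools is made up for illustration. It deliberately does not print version numbers, because the thread does not show how each tool reports its version; check each tool's own help output for that.

```shell
#!/bin/sh
# Hypothetical helper: report where each tool resolves on PATH,
# or MISSING if the shell cannot find it.
locate_tools() {
    for tool in "$@"; do
        if path=$(command -v "$tool" 2>/dev/null); then
            printf '%s\t%s\n' "$tool" "$path"
        else
            printf '%s\tMISSING\n' "$tool"
        fi
    done
}

# Tool names taken from the log prefixes above (an assumption about
# what "all pggb's tools" covers).
locate_tools pggb wfmash seqwish smoothxg odgi
```

Running this both on the host and inside the container shows whether the two environments are picking up the same binaries before comparing versions.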

colindaven commented 2 years ago

Yes, this was the docker container. I've narrowed down the issue and will report separately.
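
For readers who hit the same "Command terminated by signal 4" line from smoothxg: on Linux, signal 4 is SIGILL (illegal instruction). One common cause is a binary built with CPU instructions that the host processor does not support, but the thread does not confirm that this was the cause here. A minimal sketch for decoding the signal number and listing a few SIMD flags the host CPU advertises (the helper name signal_name and the choice of flags are illustrative assumptions):

```shell
#!/bin/sh
# Translate a signal number into its name using the shell's kill builtin.
signal_name() {
    kill -l "$1"
}

signal_name 4    # the signal number reported in the log; prints ILL on Linux

# Linux only: list a few SIMD flags the host CPU advertises.
# Which of these the container's binaries actually require is an
# assumption; this only shows what the host offers.
if [ -r /proc/cpuinfo ]; then
    grep -o -w -E 'sse4_2|avx|avx2|avx512f' /proc/cpuinfo | sort -u
fi
```

If a flag the binaries were compiled for is absent from this list, rebuilding the tools on the target machine is one thing to try.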