pangenome / pggb

the pangenome graph builder
https://doi.org/10.1101/2023.04.05.535718
MIT License

Example command issue? #140

Closed ekimb closed 2 years ago

ekimb commented 2 years ago

Hey guys,

I'm running into an issue trying to execute the example command:

git clone --recursive https://github.com/pangenome/pggb
cd pggb
./pggb -i data/HLA/DRB1-3123.fa.gz -N -s 5000 -I 0 -p 80 -n 10 -k 8 -t 16 -v -L -o out

which gives the usage info:

Mandatory arguments -i, -s, -n, -p
usage: ./pggb -i <input-fasta> -s <segment-length> -n <n-mappings>
              -p <map-pct-id> [options]
options:
   [wfmash]
    -i, --input-fasta FILE      input FASTA/FASTQ file
    -s, --segment-length N      segment length for mapping
    -l, --block-length N        minimum block length filter for mapping [default: 3*segment-length]
    -N, --no-split              disable splitting of input sequences during mapping [enabled by default]
    -M, --no-merge-segments     do not merge successive mappings
    -p, --map-pct-id PCT        percent identity in the wfmash step
    -n, --n-mappings N          number of mappings to retain for each segment
    -K, --mash-kmer N           kmer size for mashmap [default: 16]
    -Y, --exclude-delim C       skip mappings between sequences with the same name prefix before
                                the given delimiter character [default: all-vs-all and !self]
   [seqwish]
    -k, --min-match-len N       ignore exact matches below this length [default: 19]
    -B, --transclose-batch      number of bp to use for transitive closure batch [default: 10000000]
   [smoothxg]
    -H, --n-haps N              number of haplotypes, if different than that set with -n [default: n-mappings]
    -d, --split-min-depth N     minimum POA block depth to trigger splitting [default: 2000]
    -I, --block-id-min N        split blocks into groups connected by this identity threshold [default: 0.0]
    -R, --block-ratio-min N     minimum small / large length ratio to cluster in a block [default: 0.0]
    -j, --path-jump-max         maximum path jump to include in block [default: 100]
    -e, --edge-jump-max N       maximum edge jump before breaking [default: 0 / off]
    -G, --poa-length-target N,M target sequence length for POA, first pass = N, second pass = M [default: 13117,13219]
    -P, --poa-params PARAMS     score parameters for POA in the form of match,mismatch,gap1,ext1,gap2,ext2
                                [default: 1,19,39,3,81,1]
    -O, --poa-padding N         pad each end of each sequence in POA with N*(longest_poa_seq) bp [default: 0.03]
    -F, --write-maf             write MAF output representing merged POA blocks [default: off]
    -Q, --consensus-prefix P    use this prefix for consensus path names [default: Consensus_]
    -C, --consensus-spec SPEC   consensus graph specification: write consensus graphs to
                                BASENAME.cons_[spec].gfa; where each spec contains at least a min_len parameter
                                (which defines the length of divergences from consensus paths to preserve in the
                                output), optionally a file containing reference paths to preserve in the output,
                                a flag (y/n) indicating whether we should also use the POA consensus paths, a
                                minimum coverage of consensus paths to retain (min_cov), and a maximum allele
                                length (max_len, defaults to 1e6); implies -a; example:
                                cons,100,1000:refs1.txt:n,1000:refs2.txt:y:2.3:1000000,10000
                                [default: off]
   [odgi]
    -v, --viz                   render a visualization of the graph in 1D [default: off]
    -L, --layout                render a 2D layout of the graph [default: off]
    -S, --stats                 generate statistics of the seqwish and smoothxg graph [default: off]
   [gfaffix]
    -U, --normalize             normalize and re-sort the output graph [default: off]
   [vg]
    -V, --vcf-spec SPEC         specify a set of VCFs to produce with SPEC = [REF:SAMPLE_LIST_FILE,]*
                                the paths matching ^REF are used as a reference, while the samples are taken
                                one per line from the given SAMPLE_LIST_FILE (e.g. -V chm13:sample.list,grch38:sample.list)
   [multiqc]
    -m, --multiqc               generate MultiQC report of graphs' statistics and visualizations,
                                automatically runs odgi stats [default: off]
   [general]
    -o, --output-dir PATH       output directory
    -r, --resume PATH           do not overwrite existing output from wfmash, seqwish, smoothxg in given directory
                                [default: start pipeline from scratch in a new directory]
    -t, --threads N             number of compute threads to use in parallel steps
    -T, --poa-threads N         number of compute threads to use during POA (set lower if you OOM during smoothing)
    -Z, --pigz-compress         compress alignment (.paf), graph (.gfa, .og), and MSA (.maf) outputs with pigz
    -h, --help                  this text

Use wfmash, seqwish, smoothxg, and odgi to build and display a pangenome graph.

What might I be doing wrong here? Thanks in advance!

ekg commented 2 years ago

To run pggb, you need all the tools it calls (wfmash, seqwish, smoothxg, odgi, and so on) on your PATH; cloning the repository alone is not enough. The Dockerfile describes the installation process. There is also a Guix package, which is the preferred option if you can use it.
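A quick way to see which tools are not yet on your PATH is a small shell check like the one below. It is only a sketch: `check_tools` is a hypothetical helper name, and the four tools listed are the ones named in the closing line of the usage text. Optional steps (gfaffix, vg, multiqc, pigz) may need their own tools, and the thread does not confirm that a missing tool is what triggers the usage message in this case.

```shell
#!/bin/sh
# check_tools is a hypothetical helper, not part of pggb.
# It reports which of the given commands are not on PATH and
# returns 0 if all are found, 1 if any is missing.
check_tools() {
    status=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool" >&2
            status=1
        fi
    done
    return "$status"
}

# Tools named in the usage text; assumed here to be the minimum pggb needs.
check_tools wfmash seqwish smoothxg odgi \
    || echo "Some tools are missing; the Dockerfile in the pggb repository describes how to build them."
```

If nothing is reported as missing, the tools are at least discoverable, and the example command can be retried from the same shell.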

On Tue, Oct 19, 2021, 16:22 Barış Ekim wrote:

> I'm running into an issue trying to execute the example command: [...]

ekimb commented 2 years ago

Thanks!