Closed by VLoegler 2 years ago
There is a currently hidden option to make a consensus graph. It's enabled with -C.
It's hidden because it sometimes breaks and we haven't had time to debug it.
You can use it by following the hidden help text in the pggb script.
Something like pggb -C cons,100
should give you what you're interested in. This will be an additional graph written to the output directory.
#echo " -C, --consensus-spec SPEC consensus graph specification: write consensus graphs to"
#echo " BASENAME.cons_[spec].gfa; where each spec contains at least a min_len parameter"
#echo " (which defines the length of divergences from consensus paths to preserve in the"
#echo " output), optionally a file containing reference paths to preserve in the output,"
#echo " a flag (y/n) indicating whether we should also use the POA consensus paths, a"
#echo " minimum coverage of consensus paths to retain (min_cov), and a maximum allele"
#echo " length (max_len, defaults to 1e6); implies -a; example:"
#echo " cons,100,1000:refs1.txt:n,1000:refs2.txt:y:2.3:1000000,10000"
#echo " [default: off]"
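To make the spec syntax in the help text easier to read, here is a minimal sketch that splits a spec string into its fields. It is not pggb's own parser: the function name describe_consensus_spec is hypothetical, and it assumes the first comma-separated field (cons in the example) is a leading label and that the colon-separated fields come in the order the help text lists them (min_len, reference file, POA flag, min_cov, max_len).

```shell
#!/usr/bin/env bash
# Illustrative only: NOT pggb's parser. Field order is assumed from the help text.
describe_consensus_spec() {
    local spec_string=$1
    local IFS=','                  # specs are separated by commas
    local -a parts
    read -r -a parts <<< "$spec_string"
    # Assumption: the first field ("cons" in the example) is a leading label.
    echo "label=${parts[0]}"
    local spec min_len refs use_poa min_cov max_len
    for spec in "${parts[@]:1}"; do
        # Within one spec, fields are separated by colons; only min_len is required.
        IFS=':' read -r min_len refs use_poa min_cov max_len <<< "$spec"
        # max_len falls back to 1e6, the default stated in the help text.
        echo "min_len=$min_len refs=${refs:-none} use_poa=${use_poa:-unset} min_cov=${min_cov:-unset} max_len=${max_len:-1e6}"
    done
}

describe_consensus_spec "cons,100,1000:refs1.txt:n,1000:refs2.txt:y:2.3:1000000,10000"
```

Each printed line after the label corresponds to one consensus graph that the spec requests, so the example from the help text asks for four of them.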
Sorry, this is hard-disabled. You'll need to apply this diff to use it.
diff --git a/pggb b/pggb
index a872aef..40c0050 100755
--- a/pggb
+++ b/pggb
@@ -52,7 +52,7 @@ fi
# read the options
cmd=$0" "$@
-TEMP=`getopt -o i:o:D:a:p:n:s:l:K:F:k:x:f:B:H:j:P:O:Me:t:T:vhASY:G:Q:d:I:R:NbrmZzV: --long input-fasta:,output-dir:,temp-dir:,input-paf:,map-pct-id:,n-mappings:,segment-length:,block-length-min:,mash-kmer:,mash-kmer-thres:,min-match-length:,sparse-map:,sparse-factor:,transclose-batch:,n-haps:,path-jump-max:,subpath-min:,edge-jump-max:,threads:,poa-threads:,skip-viz,do-layout,help,no-merge-segments,do-stats,exclude-delim:,poa-length-target:,poa-params:,poa-padding:,run-abpoa,global-poa,write-maf,consensus-spec:,consensus-prefix:,pad-max-depth:,block-id-min:,block-ratio-min:,no-splits,resume,keep-temp-files,multiqc,compress,vcf-spec: -n 'pggb' -- "$@"`
+TEMP=`getopt -o i:o:D:a:p:n:s:l:K:F:k:x:f:B:H:j:P:O:Me:t:T:vhASY:G:Q:C:d:I:R:NbrmZzV: --long input-fasta:,output-dir:,temp-dir:,input-paf:,map-pct-id:,n-mappings:,segment-length:,block-length-min:,mash-kmer:,mash-kmer-thres:,min-match-length:,sparse-map:,sparse-factor:,transclose-batch:,n-haps:,path-jump-max:,subpath-min:,edge-jump-max:,threads:,poa-threads:,skip-viz,do-layout,help,no-merge-segments,do-stats,exclude-delim:,poa-length-target:,poa-params:,poa-padding:,run-abpoa,global-poa,write-maf,consensus-spec:,consensus-prefix:,pad-max-depth:,block-id-min:,block-ratio-min:,no-splits,resume,keep-temp-files,multiqc,compress,vcf-spec: -n 'pggb' -- "$@"`
eval set -- "$TEMP"
# extract options and their arguments into variables.
@@ -84,7 +84,7 @@ while true ; do
-b|--run-abpoa) run_abpoa=true ; shift ;;
-z|--global-poa) run_global_poa=true ; shift ;;
-M|--write-maf) write_maf=true ; shift ;;
- #-C|--consensus-spec) consensus_spec=$2 ; shift 2 ;;
+ -C|--consensus-spec) consensus_spec=$2 ; shift 2 ;;
-Q|--consensus-prefix) consensus_prefix=$2 ; shift 2 ;;
-t|--threads) threads=$2 ; shift 2 ;;
-T|--poa-threads) poa_threads=$2 ; shift 2 ;;
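Both hunks of the diff are needed: the first adds C: to the getopt short-option string so that -C is accepted with an argument, and the second restores the case branch that stores it. A stripped-down sketch of the same pattern, assuming the util-linux getopt that the pggb script already relies on (parse_opts and the two-option set are hypothetical, just for illustration):

```shell
#!/usr/bin/env bash
# Illustrative only: a two-option miniature of the option loop, not the pggb script.
parse_opts() {
    local consensus_spec="off" threads=1 TEMP
    # "C:" in the short-option string means -C takes an argument;
    # without it, getopt rejects -C as an invalid option.
    TEMP=$(getopt -o C:t: --long consensus-spec:,threads: -n 'demo' -- "$@") || return 1
    eval set -- "$TEMP"
    while true; do
        case "$1" in
            -C|--consensus-spec) consensus_spec=$2 ; shift 2 ;;
            -t|--threads) threads=$2 ; shift 2 ;;
            --) shift ; break ;;
            *) echo "unexpected option: $1" >&2 ; return 1 ;;
        esac
    done
    echo "consensus_spec=$consensus_spec threads=$threads"
}

parse_opts -C cons,100 -t 4
```

If the case branch stays commented out while C: is present in the getopt string, getopt accepts the option but the loop has no branch that consumes it, which is why the diff changes both places.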
I caution that this will still leave nodes that are as long as the POA target length, and there can be SNPs between them. A method that works directly on the PGGB graph output (generic GFA) would seem to be better. It would be amazing if someone developed a generic method to do this.
Thanks for the answer!
Is there a way to leave SNPs and small INDELs out of the final graph? I am looking for a way to get a pangenome graph with only structural variants and translocations. Thanks!