vgteam / vg

tools for working with genome variation graphs
https://biostars.org/tag/vg/

Missing '--alt-prefix' option in recent versions of vg deconstruct. #4218

Status: Open. Opened by markcharder 5 months ago.


1. What were you trying to do? Use the '--alt-prefix' option of vg deconstruct to produce a VCF with only calls between the reference and a subset of the samples.

2. What did you want to happen? To be able to use the option.

3. What actually happened? In recent versions the option no longer exists, yet the vg deconstruct help message suggests that it should. This is the help message:

usage: ./vg deconstruct [options] [-p|-P] <PATH> <GRAPH>
Outputs VCF records for Snarls present in a graph (relative to a chosen reference path).
options: 
    -p, --path NAME          A reference path to deconstruct against (multiple allowed).
    -P, --path-prefix NAME   All paths [excluding GBWT threads / non-reference GBZ paths] beginning with NAME used as reference (multiple allowed).
                             Other non-ref paths not considered as samples. 
    -r, --snarls FILE        Snarls file (from vg snarls) to avoid recomputing.
    -g, --gbwt FILE          only consider alt traversals that correspond to GBWT threads FILE (not needed for GBZ graph input).
    -T, --translation FILE   Node ID translation (as created by vg gbwt --translation) to apply to snarl names and AT fields in output
    -O, --gbz-translation    Use the ID translation from the input gbz to apply snarl names to snarl names and AT fields in output
    -e, --path-traversals    Only consider traversals that correspond to paths in the graph.
    -a, --all-snarls         Process all snarls, including nested snarls (by default only top-level snarls reported).
    -d, --ploidy N           Expected ploidy.  If more traversals found, they will be flagged as conflicts (default: 2)
    -c, --context-jaccard N  Set context mapping size used to disambiguate alleles at sites with multiple reference traversals (default: 10000).
    -u, --untangle-travs     Use context mapping to determine the reference-relative positions of each step in allele traversals (AP INFO field).
    -K, --keep-conflicted    Retain conflicted genotypes in output.
    -S, --strict-conflicts   Drop genotypes when we have more than one haplotype for any given phase (set by default when using GBWT input).
    -C, --contig-only-ref    Only use the CONTIG name (and not SAMPLE#CONTIG#HAPLOTYPE etc) for the reference if possible (ie there is only one reference sample).
    -t, --threads N          Use N threads
    -v, --verbose            Print some status messages

The line 'Other non-ref paths not considered as samples.' has no corresponding option. Presumably it belongs to '--alt-prefix' or something similar. Sorry if I am mistaken. If this option no longer exists, I was wondering why it was removed and whether the same result is still possible using a different approach.
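One possible interim workaround, sketched here under stated assumptions, is to run vg deconstruct as usual and then post-filter the VCF down to the samples of interest. This is not equivalent to the missing option: deconstruct still builds alleles from all traversals, and this sketch does not trim ALT alleles carried only by the dropped samples. The function name `subset_vcf` and the sample names are hypothetical. The sketch keeps only the chosen sample columns and drops records in which none of those samples has a non-reference GT allele. A standard tool such as `bcftools view -s` can do the same column subsetting.

```python
import sys
from typing import Iterable, Iterator, Sequence


def subset_vcf(lines: Iterable[str], keep: Sequence[str]) -> Iterator[str]:
    """Yield VCF lines restricted to the samples in `keep` (hypothetical helper).

    Records where none of the kept samples carries a non-reference allele
    (any GT allele index other than 0 or '.') are dropped. ALT alleles are
    NOT trimmed, so alleles seen only in dropped samples remain listed.
    """
    keep_idx = None  # column indices of kept samples, set at the #CHROM header
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("##"):
            yield line  # meta-information lines pass through unchanged
            continue
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]
            missing = [s for s in keep if s not in samples]
            if missing:
                raise ValueError(f"samples not in VCF: {missing}")
            keep_idx = [9 + samples.index(s) for s in keep]
            yield "\t".join(fields[:9] + [fields[i] for i in keep_idx])
            continue
        if keep_idx is None:
            raise ValueError("data line before #CHROM header")
        kept = [fields[i] for i in keep_idx]
        fmt = fields[8].split(":")
        if "GT" in fmt:
            gt_pos = fmt.index("GT")
            has_alt = False
            for call in kept:
                parts = call.split(":")
                gt = parts[gt_pos] if gt_pos < len(parts) else "."
                alleles = gt.replace("|", "/").split("/")
                if any(a not in (".", "0") for a in alleles):
                    has_alt = True
                    break
            if not has_alt:
                continue  # no kept sample differs from the reference here
        yield "\t".join(fields[:9] + kept)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python subset_vcf.py SAMPLE1,SAMPLE2 < in.vcf > out.vcf
    wanted = sys.argv[1].split(",")
    for out in subset_vcf(sys.stdin, wanted):
        print(out)
```

Filtering after the fact keeps the sketch independent of any vg flags. The trade-off is that allele definitions and conflict flags still reflect every path that deconstruct considered.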

4. If you got a line like Stack trace path: /somewhere/on/your/computer/stacktrace.txt, please copy-paste the contents of that file here: NA

5. What data and command can the vg dev team use to make the problem happen? NA

6. What does running vg version say?

version v1.54.0 "Parafada"