vgteam / vg

tools for working with genome variation graphs
https://biostars.org/tag/vg/
Other
1.1k stars, 194 forks

I'm confused about how the vg workflow works for whole-genome sequencing. #2925

Open a00101 opened 4 years ago

a00101 commented 4 years ago

I looked at the wiki but didn't understand it well, because the whole-genome sequencing (WGS) case is different from what the wiki describes. So please advise. I came up with the workflow below, with the aim of calling variants from patient-specific VCF data.

1) Construct

2) Convert

3) Prune

4) Index

5) Map

6) Augment

7) Variation call

8) If I want a graph for a specific patient, construct a graph from final.vcf and then repeat steps 1 to 7.

ekg commented 4 years ago

I think you basically understand it. But it's not a packed graph that you make with vg convert; rather, you make an XG index, which has the positional indexes needed by vg map.
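A minimal sketch of the point above, reusing the thread's file names and assuming a vg release from around this time: the XG index is built from the full, unpruned graph, while pruning only feeds the GCSA k-mer index. The `run` helper and `DRY_RUN` switch are hypothetical conveniences so the commands can be read or checked without vg installed. `vg index -x` is the classic way to write an XG; check `vg index --help` for your version.

```bash
#!/usr/bin/env bash
# Sketch: build the two indexes vg map needs, using this thread's file names.
# run/DRY_RUN are hypothetical helpers; DRY_RUN=1 (default) only prints commands.
set -euo pipefail

DRY_RUN="${DRY_RUN:-1}"

run() {
  # Print the command string in dry-run mode; otherwise eval it so that
  # redirections such as '>' inside the string take effect.
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$1"
  else
    eval "$1"
  fi
}

build_indexes() {
  # XG: positional index of the FULL, unpruned graph (used by vg map -x).
  run "vg index -x first.xg first.vg"
  # Prune complex regions so that GCSA construction stays tractable.
  run "vg prune -r first.vg > second.vg"
  # GCSA: k-mer index built from the pruned graph (used by vg map -g).
  run "vg index -g third.gcsa second.vg"
  # Map the paired reads using both indexes.
  run "vg map -x first.xg -g third.gcsa -f test_trimmed_1.fq.gz -f test_trimmed_2.fq.gz > fourth.gam"
}

build_indexes
```

Setting `DRY_RUN=0` would execute the commands instead, which requires vg and the input files.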

On Sun, Jul 26, 2020, 16:25 a00101 notifications@github.com wrote:

  1. Construct

    • Graph construction

      vg construct -r hg38.fa -v dbsnp.vcf.gz > first.vg

  2. Convert

    • Converting the graph to a packed graph for efficiency

      vg convert -x first.vg > first.xg

  3. Prune

    • Pruning the graph before indexing, because of... what does 'prune' mean, precisely and simply?

      vg prune -r first.vg > second.vg

  4. Index

    • Graph indexing

      vg index -g third.gcsa second.vg

  5. Map

    • Mapping to graph index

      vg map -x first.xg -g third.gcsa -f test_trimmed_1.fq.gz -f test_trimmed_2.fq.gz > fourth.gam

  6. Augment

    • Adding variation information into the graph

      vg augment second.vg fourth.gam -A fifth.gam > sixth.vg

  7. Variation call

    • Filtering out alignments with mapping quality below 5, then calling

      vg pack -x first.xg -g fifth.gam -Q 5 -o seventh.pack
      vg call -x first.xg -k seventh.pack > final.vcf

  8. If I want a graph for a specific patient, construct a graph from final.vcf and then repeat steps 1 to 7.
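For step 8, a minimal sketch assuming htslib's `bgzip` and `tabix` are installed and that, as far as we know, `vg construct` expects a bgzip-compressed, tabix-indexed VCF rather than a plain-text final.vcf. The output name `patient.vg` and the `run`/`DRY_RUN` dry-run helper are hypothetical. The resulting graph would then go through the remaining steps as before.

```bash
#!/usr/bin/env bash
# Sketch for step 8: build a patient-specific graph from final.vcf.
# Assumes bgzip/tabix (htslib) are installed; patient.vg is a hypothetical name.
# run/DRY_RUN are hypothetical helpers; DRY_RUN=1 (default) only prints commands.
set -euo pipefail

DRY_RUN="${DRY_RUN:-1}"

run() {
  # Print the command string in dry-run mode; otherwise eval it so that
  # redirections such as '>' inside the string take effect.
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$1"
  else
    eval "$1"
  fi
}

build_patient_graph() {
  # Compress the VCF with bgzip (block gzip), which tabix requires.
  run "bgzip -c final.vcf > final.vcf.gz"
  # Create the tabix index (final.vcf.gz.tbi) next to the compressed VCF.
  run "tabix -p vcf final.vcf.gz"
  # Build the graph from the reference plus the patient's variants.
  run "vg construct -r hg38.fa -v final.vcf.gz > patient.vg"
}

build_patient_graph
```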

a00101 commented 4 years ago

@ekg Thank you for your answer. But I got an error:

# vg augment -p second.vg fourth.gam > sixth.vg

Reading input graph
terminate called after throwing an instance of 'std::runtime_error'
  what():  Node from GAM "21433" not found in graph.  If you are sure the input graph is a subgraph of that used to create the GAM, you can ignore this error with "vg augment -s"
ERROR: Signal 6 occurred. VG has crashed. Run 'vg bugs --new' to report a bug.
Stack trace path: /tmp/vg_crash_aXYJj3/stacktrace.txt
Please include the stack trace file in your bug report!

a00101 commented 4 years ago
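The error message states the condition directly: every node ID used by the alignments must exist in the graph passed to `vg augment`. Because `vg prune` removes parts of the graph, a GAM mapped against first.xg (built from the unpruned first.vg) can refer to nodes that are missing from the pruned second.vg. The sketch below illustrates that subgraph check on hypothetical, hand-written node-ID lists; extracting real node IDs from .vg and .gam files is not shown.

```bash
#!/usr/bin/env bash
# Illustrates the "Node from GAM ... not found in graph" condition.
# The ID lists below are HYPOTHETICAL toy data, not real vg output.
set -euo pipefail

# Print node IDs present in the first file (GAM nodes) but absent from the
# second (graph nodes). comm needs both inputs sorted in the same collation,
# so both sort and comm run under LC_ALL=C.
missing_nodes() {
  LC_ALL=C comm -23 <(LC_ALL=C sort -u "$1") <(LC_ALL=C sort -u "$2")
}

tmp="$(mktemp -d)"
trap 'rm -rf "$tmp"' EXIT

# Hypothetical: nodes touched by reads mapped to the full graph (first.xg).
printf '%s\n' 21431 21432 21433 21434 > "$tmp/gam_nodes.txt"
# Hypothetical: nodes left in the pruned graph (second.vg); 21433 is gone.
printf '%s\n' 21431 21432 21434 > "$tmp/pruned_graph_nodes.txt"

# Any output here means the graph is not a superset of the GAM's nodes.
missing_nodes "$tmp/gam_nodes.txt" "$tmp/pruned_graph_nodes.txt"
```

An empty result would mean every alignment node is present, which is the situation in which `vg augment` should not raise this error.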

At what stage is the .vg file output by the augment step used?
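A sketch of one way the augmented graph fits in, assuming the augment/index/pack/call pattern shown in the vg README of that period: the augmented graph is indexed and then used, together with the GAM written by `-A`, for `vg pack` and `vg call`. Unlike the commands above, this sketch augments first.vg, the unpruned graph the reads were mapped to, rather than second.vg. `sixth.xg` is a hypothetical file name, `run`/`DRY_RUN` is a hypothetical dry-run helper, and `vg call`'s argument syntax has changed across releases, so verify with `vg call --help`.

```bash
#!/usr/bin/env bash
# Sketch of the post-mapping steps; flags may differ between vg releases.
# sixth.xg is a hypothetical name; run/DRY_RUN are hypothetical helpers.
# DRY_RUN=1 (default) only prints the commands.
set -euo pipefail

DRY_RUN="${DRY_RUN:-1}"

run() {
  # Print the command string in dry-run mode; otherwise eval it so that
  # redirections such as '>' inside the string take effect.
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$1"
  else
    eval "$1"
  fi
}

call_variants() {
  # Augment the graph the reads were mapped to (unpruned first.vg).
  # -A writes the reads re-expressed on the augmented graph.
  run "vg augment first.vg fourth.gam -A fifth.gam > sixth.vg"
  # From here on the augmented graph is used: index it...
  run "vg index -x sixth.xg sixth.vg"
  # ...compute read support from the re-expressed reads (-Q 5 as above)...
  run "vg pack -x sixth.xg -g fifth.gam -Q 5 -o seventh.pack"
  # ...and call variants on the augmented graph from that support.
  run "vg call sixth.xg -k seventh.pack > final.vcf"
}

call_variants
```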