vgteam / vg

tools for working with genome variation graphs
https://biostars.org/tag/vg/

vg construct example #2384

Open dd2019d opened 5 years ago

dd2019d commented 5 years ago

Hi,

This is not an issue. Please bear with me, since I am new to this. The construct example uses a ".vcf.gz" file. I currently have only a ".fna" file. How or where do I get the corresponding ".vcf.gz" file?

Thanks.

ekg commented 5 years ago

The .vcf.gz file is a compressed VCF that records variation relative to a reference sequence. The reference itself is usually in FASTA format (often given a .fa or .fna extension).

What is in your .fna file?
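To make the relationship between the two files concrete, here is a minimal sketch using made-up toy files. The names `ref.fa`, `vars.vcf`, the contig `chr1`, and the sequence are illustrative assumptions, not anything from this thread. Each VCF record points at a contig and position in the FASTA, and its REF allele should match the reference base at that position.

```shell
# Toy example: file names, contig name, and sequence are all made up.
workdir=$(mktemp -d)

# A one-contig FASTA reference (a .fna file has exactly this layout).
printf '>chr1\nACGTACGTACGTACGT\n' > "$workdir/ref.fa"

# A VCF with a single SNP: at position 5 of chr1 the reference A becomes G.
{
  printf '##fileformat=VCFv4.2\n'
  printf '##contig=<ID=chr1,length=16>\n'
  printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
  printf 'chr1\t5\t.\tA\tG\t.\t.\t.\n'
} > "$workdir/vars.vcf"

# Check that the REF allele agrees with the reference base at POS.
pos=$(grep -v '^#' "$workdir/vars.vcf" | cut -f2)
ref_allele=$(grep -v '^#' "$workdir/vars.vcf" | cut -f4)
ref_base=$(grep -v '^>' "$workdir/ref.fa" | tr -d '\n' | cut -c"$pos")
echo "POS=$pos REF=$ref_allele reference_base=$ref_base"
```

A VCF only makes sense together with the exact reference it was called against, which is why it cannot be derived from the .fna file alone.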

dd2019d commented 5 years ago

First line says:

CM008287.1 Fusarium oxysporum f. sp. radicis-cucumerinum strain Forc016 chromosome 1, whole genome shotgun sequence

This is a reference file named "GCA_001702695.2_ASM170269v2_genomic.fna". I thought I could use this file instead of the ".fa" file shown in the example.

Also, do I need the ".vcf" file? Can I just call:

vg construct -r my.fna > graph.vg

ekg commented 5 years ago

Yes, you should be able to. But without variation information or multiple references, there is no benefit to using vg, because the resulting graph is just the linear reference.
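A quick way to confirm that a .fna file is ordinary FASTA before handing it to vg construct is to count its header lines. The following is a minimal sketch: `my.fna` is the placeholder name from the question above, the toy content written into it is made up so the snippet runs on its own, and the vg step only executes if vg is on the PATH.

```shell
workdir=$(mktemp -d)
fasta="$workdir/my.fna"

# Stand-in content so the sketch runs; point $fasta at the real file instead.
printf '>seq1 toy record\nACGTACGT\n>seq2 toy record\nTTGGCCAA\n' > "$fasta"

# FASTA records start with '>' regardless of the file extension.
n_records=$(grep -c '^>' "$fasta")
echo "$fasta contains $n_records FASTA record(s)"

# With only a reference, the graph is an unbranched chain of nodes per sequence.
if command -v vg >/dev/null 2>&1; then
  vg construct -r "$fasta" > "$workdir/graph.vg"
  vg stats -z "$workdir/graph.vg"   # prints node and edge counts
fi
```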

dd2019d commented 5 years ago

How or where do I obtain the .vcf file for my data? If not available, then is it OK to use two .fna files as input, e.g.,

vg construct -r one.fna -r two.fna > graph.vg

glennhickey commented 5 years ago

@ekg has a good overview of the various input types here: https://github.com/ekg/alignment-and-variant-calling-tutorial

maxineliu commented 2 years ago

I have the same question as dd2019d. I only have the reference genome of my research organism as a .fa file and long-read sequences from several samples as .bam files. How do I obtain the VCF file, and how do I construct my pangenome?

My current plan is to first use a tool like samtools to convert the .bam files to .fasta, then construct a flat reference graph with vg construct, then run alignment, augmentation, and variant calling to generate calls.vcf, and finally add these VCF files to the reference genome.

Am I right? (Actually I don't think it's right. Really need help.)

glennhickey commented 2 years ago

To make a pangenome from long reads you can either:

  1. Create a VCF using a variant caller, then use vg construct, or
  2. Use an assembler to make an assembly for each sample, then align the assemblies together with either cactus or pggb.

For 1), you can in theory use vg augment/call as the variant caller, but it's not going to work well for long reads (especially if they are noisy). And since you'll be calling against a linear reference, there's not much point to using vg anyway, so you'd be best served to find another tool more suited to your reads, and use that tool's output VCF as input to vg.
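For route 1, once an external caller has produced a VCF, the remaining steps might look like the sketch below. The file names `ref.fa` and `calls.vcf` are placeholders, `build_graph` is a hypothetical helper, and the choice of variant caller is deliberately left open. The sketch assumes vg construct wants the VCF bgzip-compressed and tabix-indexed, as in the README example that uses a .vcf.gz file.

```shell
ref=ref.fa          # placeholder: your reference FASTA
calls=calls.vcf     # placeholder: VCF produced by the caller of your choice

build_graph() {
  bgzip -c "$calls" > "$calls.gz"      # block-compress the VCF
  tabix -p vcf "$calls.gz"             # index it by position
  vg construct -r "$ref" -v "$calls.gz" > graph.vg
}

# Only run when the tools and inputs are actually present.
missing=""
for tool in bgzip tabix vg; do
  command -v "$tool" >/dev/null 2>&1 || missing="$missing $tool"
done
if [ -z "$missing" ] && [ -f "$ref" ] && [ -f "$calls" ]; then
  build_graph
else
  echo "skipping: missing tools or inputs:$missing"
fi
```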

maxineliu commented 2 years ago

Thank you for your advice!

After generating the VCFs for my samples, should I merge them into one VCF, or just add them to the genome one by one? If a merge is needed, what tool should I use?

Thank you again!
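One common approach, which nobody in this thread confirms, is to merge the per-sample VCFs into a single multi-sample VCF with `bcftools merge` and pass that one file to `vg construct`. The sketch below rests on that assumption. The sample file names are hypothetical, the inputs are assumed to be bgzip-compressed and indexed already, and whether a plain merge is appropriate may depend on the variant types your caller emits.

```shell
ref=ref.fa                                   # placeholder reference
samples="sample1.vcf.gz sample2.vcf.gz"      # hypothetical per-sample VCFs

# Make sure every input exists before running anything.
have_inputs=yes
for f in $ref $samples; do
  [ -f "$f" ] || have_inputs=no
done

if [ "$have_inputs" = yes ] && command -v bcftools >/dev/null 2>&1 \
   && command -v vg >/dev/null 2>&1; then
  # -Oz writes bgzip-compressed output; each sample becomes a column.
  bcftools merge -Oz -o merged.vcf.gz $samples
  bcftools index -t merged.vcf.gz            # tabix (.tbi) index
  vg construct -r "$ref" -v merged.vcf.gz > graph.vg
else
  echo "skipping: inputs or tools not found"
fi
```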