Open dd2019d opened 5 years ago
The vcf.gz file represents variation against a reference, usually in FASTA format (often written as .fna).
What is in your .fna file?
On Sun, Aug 11, 2019, 21:20 dd2019d notifications@github.com wrote:
hi,
this is not an issue. plz bear with me since i am new to this. in the example to construct, you use the ".vcf.gz" file. currently, i have the ".fna" file, but how or where do i get the corresponding ".vcf.gz" file?
thanks.
First line says:
CM008287.1 Fusarium oxysporum f. sp. ridicis-cucumerinum strain Forc016 chromosome 1, whole genome shotgun sequence
This is a reference file named "GCA_001702695.2_ASM170269v2_genomic.fna". I thought I could use this file instead of ".fa" file shown in the example.
Also, do I need the ".vcf" file? Can I just call
vg construct -r my.fna > graph.vg
Yes, you should be able to. But without variation information or multiple references, there is no benefit to using vg.
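To make the difference concrete, here is a minimal sketch of both invocations on a toy input. The file names (toy.fna, toy.vcf, flat.vg, graph.vg) and the single variant are made up for illustration, and the sketch assumes vg, bgzip, and tabix are on your PATH; vg construct expects the VCF to be bgzip-compressed and tabix-indexed.

```shell
# Toy reference and a one-variant VCF (hypothetical data, written here so the sketch is self-contained).
printf '>chr1\nACGTACGTAC\n' > toy.fna
printf '##fileformat=VCFv4.2\n##contig=<ID=chr1,length=10>\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\nchr1\t3\t.\tG\tA\t.\tPASS\t.\n' > toy.vcf

if command -v vg >/dev/null 2>&1 && command -v bgzip >/dev/null 2>&1 && command -v tabix >/dev/null 2>&1; then
    # Reference only: produces a graph with no variation in it.
    vg construct -r toy.fna > flat.vg

    # Compress and index the VCF, then build a graph that includes the variant.
    bgzip -f toy.vcf
    tabix -p vcf toy.vcf.gz
    vg construct -r toy.fna -v toy.vcf.gz > graph.vg
else
    echo "vg, bgzip, or tabix not found; skipping graph construction" >&2
fi
```

The first command is the reference-only case asked about above; the second is the shape of the README example, with the variant at position 3 adding an alternative path to the graph.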
On Mon, Aug 12, 2019, 03:28 dd2019d notifications@github.com wrote:
How or where do I obtain the .vcf file for my data? If not available, then is it OK to use two .fna files as input, e.g.,
vg construct -r one.fna -r two.fna > graph.vg
@ekg has a good overview of the various input types here: https://github.com/ekg/alignment-and-variant-calling-tutorial
I have the same question as dd2019d. I only have the reference genome of my research organism in a .fa file and long-read sequences of several samples in .bam files. How do I obtain the VCF file, and how do I construct my pangenome?
My current plan is to first use a tool like samtools to convert the .bam files to .fasta, then construct a flat reference graph with vg construct, then do alignment, augmentation, and variant calling to generate calls.vcf, and finally add these VCF files to the reference genome.
Am I right? (Actually I don't think it's right. Really need help.)
To make a pangenome from long reads you can either:
1) Create a VCF using a variant caller, then use vg construct
or
2) Use an assembler to make an assembly for each sample, then align the assemblies together with either cactus or pggb
For 1), you can in theory use vg augment/call as the variant caller, but it's not going to work well for long reads (especially if they are noisy). And since you'll be calling against a linear reference, there's not much point in using vg anyway, so you'd be best served by finding another tool more suited to your reads and using that tool's output VCF as input to vg.
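Option 1 can be sketched roughly as follows, assuming you already have one bgzipped, tabix-indexed VCF per sample from whichever caller suits your reads. Using bcftools merge to combine them is an assumption of this sketch, not something prescribed in this thread, and the function name and file names are hypothetical. By default the function only prints the commands; set DRY_RUN=0 to run them.

```shell
# Sketch: merge per-sample VCFs into one multi-sample VCF, then build a graph from it.
# Usage: merge_and_construct REF OUT_PREFIX SAMPLE1.vcf.gz [SAMPLE2.vcf.gz ...]
merge_and_construct() {
    ref=$1; out=$2; shift 2    # remaining arguments are the per-sample VCFs
    if [ "${DRY_RUN:-1}" = 1 ]; then
        # Dry run: show what would be executed.
        echo "bcftools merge -Oz -o $out.merged.vcf.gz $*"
        echo "tabix -p vcf $out.merged.vcf.gz"
        echo "vg construct -r $ref -v $out.merged.vcf.gz > $out.vg"
    else
        # Merge, index the merged VCF, then construct the graph; stop at the first failure.
        bcftools merge -Oz -o "$out.merged.vcf.gz" "$@" &&
        tabix -p vcf "$out.merged.vcf.gz" &&
        vg construct -r "$ref" -v "$out.merged.vcf.gz" > "$out.vg"
    fi
}

# Hypothetical file names; substitute your own reference and per-sample VCFs.
merge_and_construct ref.fa pangenome sampleA.vcf.gz sampleB.vcf.gz
```

Printing first makes it easy to check the file names before anything is overwritten.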
Thank you for your advice!
After generating the VCFs for my several samples, should I merge those VCFs into one VCF or just add them to the genome one by one? If merging is needed, what tool should I use?
Thank you again!