pcingola / SnpEff

Need help in building database for Influenza virus for SnpEff #111

Closed Ramanandan closed 8 years ago

Ramanandan commented 8 years ago

Dear Pablo,

I am currently working on a bioinformatics project and using the SnpEff tool to annotate my variant calls. Thank you for developing one of the most useful tools for the scientific community.

I have variant call files from influenza virus. I have been given only 8 gene accession numbers for the virus, and I don't have the influenza genome file. I performed the following steps: I put the FASTA sequences of all 8 genes into a single file and built an index for alignment. I then used GATK to produce my VCF file.

Now I would like to annotate my VCF records, so I am using SnpEff. I checked this link for building databases manually (http://snpeff.sourceforge.net/SnpEff_manual.html#databases). I have only the 8 gene GenBank files (not the genome GenBank file). How do I build a database for influenza virus without the genome file?

pcingola commented 8 years ago

Hi Ramanandan, You should use the same genome reference throughout the whole analysis (mapping reads, variant calling, and variant annotation), and it would be better to just use the influenza reference genome instead of trying to "stitch together 8 genes". So I would not recommend the path you are choosing. I hope this helps.
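One way to check the "same reference throughout" advice in practice is to compare the sequence names in the reference FASTA with the CHROM column of the VCF. The following is only a minimal sketch using the Python standard library; the function names and the sample data are made up for illustration and are not part of SnpEff or GATK.

```python
def fasta_names(lines):
    """Return the sequence IDs (first token after '>') found in FASTA lines."""
    return {
        line[1:].split()[0]
        for line in lines
        if line.startswith(">") and line[1:].strip()
    }


def vcf_chroms(lines):
    """Return the CHROM values used on the data lines of a VCF."""
    return {
        line.split("\t", 1)[0]
        for line in lines
        if line.strip() and not line.startswith("#")
    }


def unmatched_chroms(fasta_lines, vcf_lines):
    """List CHROM names in the VCF that are missing from the reference FASTA."""
    return sorted(vcf_chroms(vcf_lines) - fasta_names(fasta_lines))


# Illustrative in-memory data (hypothetical, not taken from a real run).
example_fasta = [">NC_026438 made-up description", "ATGC", ">NC_026435", "GGCC"]
example_vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "NC_026438\t10\t.\tA\tG\t50\tPASS\t.",
    "segment2\t25\t.\tC\tT\t50\tPASS\t.",
]
print(unmatched_chroms(example_fasta, example_vcf))
```

With real files, the same functions could be fed `open(path)` handles instead of lists. An empty result would suggest that the VCF and the FASTA at least agree on sequence names, although it does not prove the sequences themselves are identical.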

Ramanandan commented 8 years ago

Dear Pablo, I downloaded the genome of influenza A virus from this link (http://www.ncbi.nlm.nih.gov/genome/10290?genome_assembly_id=253920). Even the downloaded genome GenBank file contains the 8 genes as separate GenBank records, in the format below. I have sent the GenBank file to your email for reference. This is how the contents are stored in the genome GenBank file:

LOCUS NC_026438 2280 bp cRNA linear VRL 23-FEB-2015 …
LOCUS NC_026435 2274 bp cRNA linear VRL 23-FEB-2015 …
LOCUS NC_026437 2151 bp cRNA linear VRL 23-FEB-2015 …
LOCUS NC_026433 1701 bp cRNA linear VRL 23-FEB-2015 …
LOCUS NC_026436 1497 bp cRNA linear VRL 23-FEB-2015 …
LOCUS NC_026434 1410 bp cRNA linear VRL 23-FEB-2015 …
LOCUS NC_026431 982 bp cRNA linear VRL 23-FEB-2015 …
LOCUS NC_026432 863 bp cRNA linear VRL 23-FEB-2015 …

The influenza GenBank file contains only the CDS information.
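The listing above suggests the download is a multi-record GenBank file, with one LOCUS record per entry. A minimal sketch for listing the record names and lengths in such a file follows; `locus_records` is a hypothetical helper name, and the parsing assumes the whitespace-separated LOCUS layout shown in the post (name in the second field, length in the third).

```python
def locus_records(lines):
    """Yield (name, length_bp) for every LOCUS line in GenBank text."""
    for line in lines:
        if line.startswith("LOCUS"):
            fields = line.split()
            # Expected layout: LOCUS <name> <length> bp <molecule> ...
            yield fields[1], int(fields[2])


# LOCUS lines as quoted in the post (record bodies omitted).
example_genbank = [
    "LOCUS NC_026438 2280 bp cRNA linear VRL 23-FEB-2015",
    "LOCUS NC_026435 2274 bp cRNA linear VRL 23-FEB-2015",
    "LOCUS NC_026437 2151 bp cRNA linear VRL 23-FEB-2015",
    "LOCUS NC_026433 1701 bp cRNA linear VRL 23-FEB-2015",
    "LOCUS NC_026436 1497 bp cRNA linear VRL 23-FEB-2015",
    "LOCUS NC_026434 1410 bp cRNA linear VRL 23-FEB-2015",
    "LOCUS NC_026431 982 bp cRNA linear VRL 23-FEB-2015",
    "LOCUS NC_026432 863 bp cRNA linear VRL 23-FEB-2015",
]

for name, length in locus_records(example_genbank):
    print(name, length)
```

Comparing these record names with the sequence names used during alignment and variant calling would show whether the VCF refers to the same set of records.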