idinov / informatics

Automatically exported from code.google.com/p/informatics
0 stars 0 forks source link

ANNOVAR #27

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
Hi, I tested the ANNOVAR pipeline you sent me, but it is failing at the first 
step, independently from the .bam file I am choosing as input, here is the 
error stream if I launch it command line:

 /applications/SAMTOOLS/samtools-0.1.16/samtools pileup -vcf /projects1/Reference_genomes/hg18_ensembl/gatk-canonical/gatk-hg18_ensembl.fa /projects2/USC/canonical-test-data/hg18_ensembl-bwa/full-bam/852603-6191.bam '>'
Floating point exception

Is always the same floating error. :(

Fed

Original issue reported on code.google.com by federica...@gmail.com on 4 Oct 2011 at 7:59

Attachments:

GoogleCodeExporter commented 9 years ago
Hi all-

Just a comment on the floating exceptions we've gotten from Samtools. I think 
in this case, and perhaps in a few of the other instances when the error showed 
up as well, it may be related to the header lines in our reference genome FASTA 
(.fa) file. 

See this thread on the samtools mailing list for a description of the problem:

http://sourceforge.net/mailarchive/forum.php?thread_name=Pine.LNX.4.64.110929082
6400.21023%40defender.gpcc.itd.umich.edu&forum_name=samtools-help

After reading this thread, I made a quick edit to the headers of 
/projects1/Reference_genomes/hg18_ensembl/gatk-canonical/gatk-hg18_ensembl.fa 
to eliminate the redundant text, then tried a few of the problematic samtools 
commands again. I didn't get any floating point exceptions this time. Federica 
and I will continue testing samtools in various places of the pipelines to make 
sure we've truly eradicated the issue, but wanted to share our finding. 

Also, the edited/corrected reference FASTA file I mentioned is here:

/projects1/Reference_genomes/hg18_ensembl/gatk-canonical/gatk-hg18_ensembl.corre
cted.fa

Original comment by apcexcha...@gmail.com on 5 Oct 2011 at 8:18

GoogleCodeExporter commented 9 years ago
Thanks for the info.  I'm still waiting to be able to test this workflow on my 
fgene account. 

Looking at the samtools documentation, it looks like the samtools pileup 
command is deprecated, replaced by mpileup. Also, their documented protocol for 
generating a vcf from a bam is very different:

samtools mpileup -uf ref.fa aln1.bam aln2.bam | bcftools view -bvcg - > 
var.raw.bcf  
bcftools view var.raw.bcf | vcfutils.pl varFilter -D100 > var.flt.vcf

I got that from http://samtools.sourceforge.net/mpileup.shtml

whereas the current workflow just uses:
samtools pileup -vcf ref.fa subject.bam > subject.raw.vcf

Should we switch from pileup to mpileup and change the protocol to the new one 
to avoid using deprecated tools? Assuming i can find and build bcftool and 
vcfutils, i could could together a wrapper script for this new protocol pretty 
quickly.

Any thoughts?
Alex

Original comment by alexge...@gmail.com on 6 Oct 2011 at 6:29

GoogleCodeExporter commented 9 years ago
Yes, I think this may be really helpful...the conversion in vcf with the olp 
pileup command is still running  (1 day and more) and this timing is not 
feasible.

On the annovar website they are not up to date, they are reporting still the 
pileup command..I am writing to the ANNOVAR developer to know if ANNOVAR is 
compatible with the vcf file created by the new features in samtools.

Even if it is not directly compatible there is a feature in ANNOVAR that 
converts vcf files in VCF4 files 
(http://www.openbioinformatics.org/annovar/annovar_input.html)...so, in summary 
i think that using the newer options is a good idea.

Federica

Original comment by federica...@gmail.com on 6 Oct 2011 at 6:46

GoogleCodeExporter commented 9 years ago
Yes, is compatible so I think you can go ahead. I will update the documentation 
accordingly.

federica

Original comment by federica...@gmail.com on 6 Oct 2011 at 6:56

GoogleCodeExporter commented 9 years ago
Sorry it seems like we go through this step with every workflow, but can you 
point me to some relevant data as input for the first samtools mpileup step? it 
takes two alignment files and a reference fasta file, as well as an integer for 
the maximum read depth (I'm using 100).

I've put together the workflow, but it fails, and I want to make sure the 
problem isn't in the data i'm using.

Alex

Original comment by alexge...@gmail.com on 10 Oct 2011 at 9:06

GoogleCodeExporter commented 9 years ago
Hi Alex,

sorry for my late response but I am back now from Montreal. No problem, you are 
right.

It takes only TWO files in total:

-the alignment file: .bam file 

(example:/projects2/USC/canonical-test-data/hg18_ensembl-bwa/mini-bam/s_1.sorted
.bam)

-and the reference genome in fasta format:

( example: 
/projects1/Reference_genomes/hg18_ensembl/gatk-canonical/gatk-hg18_ensembl.fa)

Let me know how it goes and if you have any other questions :)

fed

Original comment by federica...@gmail.com on 17 Oct 2011 at 11:43

GoogleCodeExporter commented 9 years ago
Ok, I've run it with this data and it fails on the convert2annovar.pl step:

NOTICE: the default --format argument is set as 'pileup'
NOTICE: the default --snpqual argument for pileup format is set as 20
NOTICE: Column 6-9 in output are heterozygosity status, SNP quality, total 
reads, reads with mutation
Error: invalid record found in pileupfile 
/projects/pipelineCache/pipeline/2011October18_12h11m34s468ms/SamToolsBamtoVCF_1
.OutputVCF-1.vcf (at least 10 fields expected): <##fileformat=VCFv4.1>

I've attached the workflow if you want to try it. Any ideas? 

Alex

Original comment by alexge...@gmail.com on 18 Oct 2011 at 7:17

Attachments:

GoogleCodeExporter commented 9 years ago
We are looking at that right now. Seems that this mpileup gives some issues...

Original comment by federica...@gmail.com on 21 Oct 2011 at 12:16