tseemann / ekidna

Assembly based core genome SNP alignments for bacteria
GNU General Public License v3.0
25 stars 3 forks source link

Paftools -> Bcftools vcf.gz conversion fails #12

Open harish0201 opened 5 years ago

harish0201 commented 5 years ago

Hi!

Long time fan of your work here!

I have about 16 mitochondrial genome that I needed to find variants with, so I have been trying to use ekidna for it :)

However, I have been stuck in the loop for the conversion of the sample.paf -> sample.vcf.gz either in the ekidna pipeline or checking it manually.

While I'm attaching the log here, I'm also sharing the command that I used (going through the log and ekidna)

paftools call -l 500 -L 500 -s 3 -f ../15.fa 3.paf 2> /dev/null | bcftools view --types snps,mnps

The /dev/null redirection output:

349081 reference bases covered by exactly one contig 5 substitutions; ts/tv = 0.000 0 1bp deletions 1 1bp insertions 0 2bp deletions 0 2bp insertions 0 [3,50) deletions 0 [3,50) insertions 0 >=50 deletions 0 >=50 insertions

hits me up with this error: Failed to open -: unknown file type

Following are the versions of the various tools in case that plays a role:

bcftools: 1.8 seqtk: 1.3-r106 skesa: v.2.2 iqtree: 1.6.7 (multi-core version) bedtools: v2.27.1 v8: 3.16.4 snp-sites: 2.4.1

ekidna, any2fasta were obtained from your repos on March 11th.

I'll have to check the commit of minimap2 though.

tseemann commented 4 years ago

ekidna is not a variant calling pipeline. It is designed for rough + core + SNP finding for building trees. I would not use it for publication level phylogenies - use snippy for that.

The - filename usually refers to stdin ie. use the output of the previous cmd in the pipe.

Is this an older version of ekidna you used?

This is done in 2 steps now:

  run_cmd("paftools call -l $minlen -L $minlen -s $i -f $ref $i.paf > $i.vcf");
  run_cmd("bcftools view --types snps,mnps $i.vcf | bcftools convert -f $ref -Oz -o $i.vcf.gz");