sid-krish / rhometa

metagenome population recombination rate estimation pipeline
MIT License
12 stars 0 forks source link

Error running `theat_est.nf` workflow with multi-sequence reference. #50

Closed cerebis closed 2 years ago

cerebis commented 2 years ago

The following error is returned when using a multi-sequence reference.

Command error:
  [mpileup] 1 samples in 1 input files
  [E::hts_open_format] Failed to open file "3561" : No such file or directory
  Traceback (most recent call last):
    File "/shared/homes/120274/progenomes/rhometa/bin/theta_est.py", line 92, in <module>
      num_variant_positions = len(get_var_pos_from_vcf(vcf))
    File "/shared/homes/120274/progenomes/rhometa/bin/theta_est.py", line 15, in get_var_pos_from_vcf
      f = pysam.VariantFile(vcf_file)
    File "pysam/libcbcf.pyx", line 4054, in pysam.libcbcf.VariantFile.__init__
    File "pysam/libcbcf.pyx", line 4279, in pysam.libcbcf.VariantFile.open
  FileNotFoundError: [Errno 2] could not open variant file `b'3561'`: No such file or directory

This occurs because of the assumption made in rhometa's BASH script when calculating genome size.

Eg. genome_size=$(samtools view -H Aligned_sorted.bam | grep "@SQ" | awk '{ print $3 }' | cut -c 4-)

This will produce multiple lines which immediately after is interpreted as multiple arguments to freebayes.

Instead, try:

genome_size=$(samtools view -H Aligned_sorted.bam |  awk '/^@SQ/ {l+=substr($3,4)}END{print l}')
sid-krish commented 2 years ago

fixed 26ddd69f635a0b2b52d124e4f7f2da27ba03dbcb