wslh-bio / SPNtypeID

SPNtypeID is a Nextflow pipeline used for genome assembly and serotyping of Streptococcus pneumoniae.
GNU General Public License v3.0
2 stars 0 forks source link

Error if non-SPN reads are run #1

Closed garfinjm closed 2 years ago

garfinjm commented 3 years ago

Input: Streptococcus pyogenes reads (a negative control in our re-validation dataset).

Error:

Error executing process > 'results'

Caused by:
  Process `results` terminated with an error exit status (1)

Command executed:

  #!/usr/bin/env python3
  import os, sys
  import glob, csv

  class result_values:
    def __init__(self,id):
        self.id = id
        self.comments = []
        self.coverage = "NotRun"
        self.quality = "NotRun"
        self.pass_cov_quality = False
        self.percent_strep = "NotRun"
        self.percent_spn = "NotRun"
        self.secondgenus = "NotRun"
        self.percent_secondgenus = "NotRun"
        self.pass_kraken = False
        self.pred = "NotRun"

  #get list of result files
  cg_result_list = glob.glob("*_qual.tsv")
  kraken_list = glob.glob("*_kraken.txt")
  seroba_list = glob.glob("*_pred.tsv")

  results = {}

  #collect all cg_pipeline results
  for file in cg_result_list:
    id = file.split("_qual")[0]
    result = result_values(id)
    with open(file,'r') as csvfile:
        dialect = csv.Sniffer().sniff(csvfile.read(1024))
        csvfile.seek(0)
        reader = csv.reader(csvfile,dialect)
        for row in reader:
            if "File" not in row:
                result.coverage = float(row[8])
                result.quality = float(row[5])
                if result.coverage > 70.0 and result.quality > 30.0:
                    result.pass_cov_quality = True
                if result.coverage <= 70.0:
                    result.comments.append("Lower than 70x coverage")
                if result.quality <= 30.0:
                    result.comments.append("Lower than 30.0 average base quality")

    results[id] = result

  #collect all kraken results
  for file in kraken_list:
    id = file.split("_kraken")[0]
    result = results[id]
    with open(file,'r') as csvfile:
        dialect = csv.Sniffer().sniff(csvfile.read(1024))
        csvfile.seek(0)
        reader = csv.reader(csvfile,dialect)
        secondgenus = ""
        percent_secondgenus = 0.0
        for row in reader:
            if row[4] == "1301":
                result.percent_strep = float(row[0])
                continue
            if row[4] == "1313":
                result.percent_spn = float(row[0])
                continue
            if row[3] == "G" and float(row[0]) > percent_secondgenus:
                secondgenus = row[5].strip()
                percent_secondgenus = float(row[0])
        result.secondgenus = secondgenus
        result.percent_secondgenus = percent_secondgenus
        if result.percent_strep >= 82.0 and result.percent_spn >= 62.0 and result.percent_secondgenus < 1.0:
            result.pass_kraken = True
        if result.percent_strep < 82.0:
            result.comments.append("Less than 82.0% of reads are Strep")
        if result.percent_spn < 62.0:
            result.comments.append("Less than 62.0% of reads are SPN")
        if result.percent_secondgenus >= 1.0:
            result.comments.append("More than 1.0% of reads are from "+secondgenus)

    results[id] = result

  #collect all seroba results
  for file in seroba_list:
    id = file.split("_pred")[0]
    result = results[id]
    with open(file,'r') as csvfile:
        dialect = csv.Sniffer().sniff(csvfile.read(1024))
        csvfile.seek(0)
        reader = csv.reader(csvfile,dialect)
        types = []
        for row in reader:
            types.append(row[1])
            try:
                result.comments.append(row[2])
            except IndexError:
                pass
        result.pred = " ".join(types)
    results[id] = result

  #create output file
  with open("SPNtypeID_results.csv",'w') as csvout:
    writer = csv.writer(csvout,delimiter=',')
    writer.writerow(["ID","Coverage","Avg Quality","Pass Cov/Qual","Percent Strep","Percent SPN","SecondGenus","Percent SecondGenus","Pass Kraken","Serotype","Comments"])
    for id in results:
        result = results[id]
        comments = "; ".join(result.comments)
        writer.writerow([result.id,result.coverage,result.quality,result.pass_cov_quality,result.percent_strep,result.percent_spn,result.secondgenus,result.percent_secondgenus,result.pass_kraken,result.pred,comments])

Command exit status:
  1

Command output:
  (empty)

Command error:
  Traceback (most recent call last):
    File ".command.sh", line 69, in <module>
      if result.percent_strep >= 82.0 and result.percent_spn >= 62.0 and result.percent_secondgenus < 1.0:
  TypeError: '>=' not supported between instances of 'str' and 'float'

Work dir:
  /panfs/roc/groups/7/mdh/shared/software_modules/spntypeid/beta_201113/test2/work/e4/b6aeb5716f66b6cc7d9c81d9818c97

Tip: when you have fixed the problem you can continue the execution adding the option `-resume` to the run command line

Cause: For this sample there are no reads assigned to the SPN taxid (1313) so result.percent_spn remains set to NotRun, this causes a type error when it is compared to 62.0.

https://github.com/k-florek/SPNtypeID/blob/ec7d93219a8b11fefeff84eb73ff92cec8dc3a40/SPNtypeID.nf#L192

You can see a possible solution on my fork, basically just setting all the numeric values to 0.0 instead of NotRun, which so far has worked to fix this bug.

https://github.com/garfinjm/SPNtypeID/blob/b4117cef54897bffa4445dddef7778c7e698aba5/SPNtypeID.nf#L132

AbigailShockey commented 2 years ago

Hi Jake, this error has been fixed and will be included in our upcoming release