zjshi / gt-pro

MIT License

`GT_Pro parse` => IndexError: string index out of range #56

Open nick-youngblut opened 1 year ago

nick-youngblut commented 1 year ago

Command:

    # genotype
    GT_Pro genotype \
      -d ${gtpro_db}/sckmer_db \
      -o raw_output \
      $read1 $read2

    # parse output
    GT_Pro parse \
      --in raw_output.tsv \
      --dict ${gtpro_db}/sckmer_db.snp_dict.tsv \
      --out parsed_output.tsv

Error:

gt_pro        gtpro_db_optimized/sckmer_db    4       force_overwrite
  1684353832830:  [Info] Starting to load DB: gtpro_db_optimized/sckmer_db
  1684353832830:  [Info] MMAPPING gtpro_db_optimized/sckmer_db_optimized_db_snps.bin
  1684353832830:  [Info] MMAPPING gtpro_db_optimized/sckmer_db_optimized_db_kmer_index.bin
  1684353832838:  [Info] Using -l 32 -m 36 as optimal for system RAM
  1684353832838:  [Info] MMAPPING gtpro_db_optimized/sckmer_db_optimized_db_mmer_bloom_36.bin
  1684353833141:  [Info] MMAPPING gtpro_db_optimized/sckmer_db_optimized_db_lmer_index_32.bin
  1684353834698:  [Info] Done with init for optimized DB with 175132 kmers.  That took 1 seconds.
  1684353834736:  [Info] Waiting for all readers to quiesce
  1684353840417:  [Progress] 1.02 million reads scanned after 5 seconds, and 0 files output.
  1684353845542:  [Done] searching is completed for the 998803 reads input from SRR13068812_R2.fq
  1684353845542:  [Done] searching is completed for the 998803 reads input from SRR13068812_R1.fq
  1684353845544:  [Stats] 2667 snps, 998803 reads, 1.79 hits/snp, for SRR13068812_R2.fq
  1684353845545:  [Stats] 2754 snps, 998803 reads, 1.79 hits/snp, for SRR13068812_R1.fq
  1684353845548:  1.99 million reads were scanned after 10 seconds
  1684353845548:  Successfully processed 2 input files containing 1997606 reads.
  1684353845558:  Totally done: 10 seconds elapsed processing reads, after DB was loaded.
  Traceback (most recent call last):
    File "/opt/gt-pro/scripts/gtp_parse.py", line 124, in <module>
      main()
    File "/opt/gt-pro/scripts/gtp_parse.py", line 120, in main
      gtp_array = read_gtp_out(gtp_fpath)
    File "/opt/gt-pro/scripts/gtp_parse.py", line 44, in read_gtp_out
      snp_type = int(items[0][6])
  IndexError: string index out of range

nick-youngblut commented 1 year ago

Because the output of `GT_Pro genotype` is stochastic, the `GT_Pro parse` error occurs on some repeated runs but not on others.
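
For anyone trying to narrow this down: the traceback shows the failure at `int(items[0][6])`, which can only raise `IndexError` when the first field of a line has fewer than 7 characters. Below is a minimal diagnostic sketch for finding such lines in the raw genotype output. It is not part of gt-pro. The helper name `find_short_keys` is hypothetical, and it assumes the raw output is plain text with tab-separated fields, which may not match how `gtp_parse.py` actually splits each line.

```python
# Diagnostic sketch, not gt-pro code.
# Assumption: the raw output is plain text with tab-separated fields.

MIN_KEY_LEN = 7  # items[0][6] needs at least 7 characters in the first field


def find_short_keys(lines, min_len=MIN_KEY_LEN):
    """Return (line_number, raw_line) pairs whose first field is too short."""
    bad = []
    for lineno, line in enumerate(lines, start=1):
        raw = line.rstrip("\n")
        first_field = raw.split("\t")[0]
        if len(first_field) < min_len:
            # Empty lines land here too, since their first field is "".
            bad.append((lineno, raw))
    return bad
```

Passing an open handle on `raw_output.tsv` from a failing run to `find_short_keys` should list the offending lines, if the assumptions above hold. Comparing them against a run that parses cleanly may show whether the short field is a truncated record, an empty line, or something else.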