Using --wildcard-file results in "IndexError: string index out of range" error

GoogleCodeExporter commented 9 years ago

What steps will reproduce the problem?

Running cutadapt with the --wildcard-file option:

cutadapt -f fastq \
  -g AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT \
  -a GATCGGAAGAGCACACGTCTGAACTCCAGTCACNNNNNNATCTCGTATGCCGTCTTCTGCTTG \
  -O 5 -m 15 --wildcard-file=wildcard.txt test.fastq

If applicable, please provide a minimal example, such as the sequence of a
single read that was not trimmed correctly.

test.fastq file:

@test
CGGGGCCGAGGGAGCGAGACCCGTCGCCGCGCTCTCCCCAGATCGGAAGAGCACACGTCTGAACTACAGTCCCGGC
+
@@@B?>A@@1CD>G?GE6FFGCHIEHGDB/9>3'5@:>@C<A:3<:<<5((++8?808AC8:A@############

What is the expected output? What do you see instead?

Traceback (most recent call last):
  File "cutadapt-1.1/bin/cutadapt", line 10, in <module>
    cutadapt.main()
  File "cutadapt-1.1/cutadapt/scripts/cutadapt.py", line 817, in main
    read, trimmed = cutter.cut(read)
  File "cutadapt-1.1/cutadapt/scripts/cutadapt.py", line 533, in cut
    read = match.adapter.remove(read, match)
  File "cutadapt-1.1/cutadapt/scripts/cutadapt.py", line 311, in remove_back
    self.write_wildcard_file(read, match)
  File "cutadapt-1.1/cutadapt/scripts/cutadapt.py", line 322, in write_wildcard_file
    if self.sequence[match.astart + i] == 'N' ]
IndexError: string index out of range
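
For readers following the traceback, here is a minimal sketch of how a comprehension of this shape can raise the error. It is not the cutadapt source: the function names, the `length` parameter, and the toy sequences are assumptions made for illustration. The assumed failure mode is that the loop runs over a span longer than what remains of the adapter after `astart`. The guarded variant only shows a bounds check and is not necessarily how the bug was fixed upstream.

```python
def matched_wildcards_unsafe(adapter, read, astart, rstart, length, wildcard="N"):
    # Same shape as the failing line: `length` is trusted to fit inside
    # the adapter, so adapter[astart + i] can run past the end.
    return "".join(
        read[rstart + i]
        for i in range(length)
        if adapter[astart + i] == wildcard
    )


def matched_wildcards_guarded(adapter, read, astart, rstart, length, wildcard="N"):
    # Clamp the span so that neither string is indexed past its end.
    span = min(length, len(adapter) - astart, len(read) - rstart)
    return "".join(
        read[rstart + i]
        for i in range(max(span, 0))
        if adapter[astart + i] == wildcard
    )


if __name__ == "__main__":
    adapter = "ACGNNT"    # hypothetical adapter with two wildcard positions
    read = "TTACGGATC"    # hypothetical read
    # A span of 7 is one longer than the 6-character adapter.
    try:
        matched_wildcards_unsafe(adapter, read, 0, 2, 7)
    except IndexError as exc:
        print("unsafe:", exc)
    print("guarded:", matched_wildcards_guarded(adapter, read, 0, 2, 7))
```

Clamping keeps the lookup in range, but if the alignment contains insertions or deletions the read and adapter positions no longer correspond one to one, so a real fix would need to account for that as well.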

What version of the product are you using? On what operating system?
cutadapt 1.1 on Linux with Python 2.7

Original issue reported on code.google.com by mpasz...@gmail.com on 11 Oct 2012 at 2:50

GoogleCodeExporter commented 9 years ago

Hi, thanks for your report. I think I have already fixed this problem in the 
most recent, but not yet released, cutadapt version. I have just added a 
pre-release version of cutadapt 1.2 to the download page. Could you please 
download it and check whether you still get the error?

Original comment by marcel.m...@tu-dortmund.de on 11 Oct 2012 at 3:17

GoogleCodeExporter commented 9 years ago

Hi,

Thanks for the fast reply. However, the error still persists. Now it
happens with the following pair of sequences:

The read:
CTTTCCGCTGTACCCTGCCACATATTCTTCGTTGGAAGAGCACACGTCTGAACTCCAGTCACGGCC

The adapter:
GATCGGAAGAGCACACGTCTGAACTCCAGTCACNNNNNNATCTCGTATGCCGTCTTCTGCTTG

Traceback (most recent call last):
  File "/home/maciejp/Downloads/cutadapt-1.2rc2/bin/cutadapt", line 10, in <module>
    cutadapt.main()
  File "/home/maciejp/Downloads/cutadapt-1.2rc2/cutadapt/scripts/cutadapt.py", line 850, in main
    read, trimmed = cutter.cut(read)
  File "/home/maciejp/Downloads/cutadapt-1.2rc2/cutadapt/scripts/cutadapt.py", line 592, in cut
    print(matched_wildcards(match), read.name, file=self.wildcard_file)
  File "/home/maciejp/Downloads/cutadapt-1.2rc2/cutadapt/scripts/cutadapt.py", line 364, in matched_wildcards
    if match.adapter.sequence[match.astart + i] == wildcard_char ]
IndexError: string index out of range

I hope it helps.

Original comment by mpasz...@gmail.com on 11 Oct 2012 at 4:01

GoogleCodeExporter commented 9 years ago

I hope I've fixed the issue. You can get the most recent version from GitHub: 
https://github.com/marcelm/cutadapt
If you don't know how to use Git, just download the zip file. Please let me 
know if it works.

Original comment by marcel.m...@tu-dortmund.de on 12 Oct 2012 at 11:15

GoogleCodeExporter commented 9 years ago

Yes, it seems to work fine. However, if I use an input file (uncompressed or 
gzipped FASTQ) with a single read, the script finishes with the following 
message:

No reads were read! Either your input file is empty or you used the wrong 
-f/--format parameter.

The output file with a single read is generated nevertheless.

Original comment by mpasz...@gmail.com on 12 Oct 2012 at 3:04

GoogleCodeExporter commented 9 years ago

Sorry about that. The current development version has a bug in the 
statistics code: it did not count some reads. If you get the most recent 
version, you will not get that particular message, but some of the numbers will 
still be off. The trimming is unaffected and will work as before, though. I 
hope I can repair that in the next few days.

Original comment by marcel.m...@tu-dortmund.de on 18 Oct 2012 at 3:41

GoogleCodeExporter commented 9 years ago

Thanks, it's good to know that this bug did not affect the trimming.

Cheers!

Original comment by mpasz...@gmail.com on 18 Oct 2012 at 3:46

GoogleCodeExporter commented 9 years ago

The current version of cutadapt (1.2.1) shows correct statistics.

Original comment by marcel.m...@tu-dortmund.de on 10 Dec 2012 at 10:40

Changed state: Fixed
