marcelm / cutadapt

Cutadapt removes adapter sequences from sequencing reads
https://cutadapt.readthedocs.io
MIT License

KmerFinder: x of length y is longer than the maximum of 64 #749

Closed marcelm closed 6 months ago

marcelm commented 6 months ago

After updating Cutadapt to the most recent version, I encountered this today in a pipeline I wrote a while ago.

$ echo -e '>r\nACGT' | cutadapt --debug -a 'A{70}' -e 0 -
This is cutadapt 4.5.dev50+g0b9c325.d20231106 with Python 3.10.12
Command line parameters: --debug -a A{70} -e 0 -
DEBUG: Python executable: .../cutadapt/.venv/bin/python
DEBUG: dnaio version: 0.10.0
DEBUG: xopen version: 1.7.0
DEBUG: Command line error. Traceback:
Traceback (most recent call last):
  File ".../cutadapt/src/cutadapt/cli.py", line 934, in adapters_from_args
    adapters = make_adapters_from_specifications(args.adapters, search_parameters)
  File ".../cutadapt/src/cutadapt/parser.py", line 386, in make_adapters_from_specifications
    adapters.extend(
  File ".../cutadapt/src/cutadapt/parser.py", line 425, in make_adapters_from_one_specification
    yield make_adapter(spec, adapter_type, search_parameters)
  File ".../cutadapt/src/cutadapt/parser.py", line 466, in make_adapter
    return _make_not_linked_adapter(spec, name, adapter_type, search_parameters)
  File ".../cutadapt/src/cutadapt/parser.py", line 543, in _make_not_linked_adapter
    return adapter_class(
  File ".../cutadapt/src/cutadapt/adapters.py", line 786, in __init__
    super().__init__(*args, **kwargs)
  File ".../cutadapt/src/cutadapt/adapters.py", line 590, in __init__
    self.kmer_finder = self._kmer_finder()
  File ".../cutadapt/src/cutadapt/adapters.py", line 798, in _kmer_finder
    return self._make_kmer_finder(
  File ".../cutadapt/src/cutadapt/adapters.py", line 624, in _make_kmer_finder
    return KmerFinder(
  File "src/cutadapt/_kmer_finder.pyx", line 139, in cutadapt._kmer_finder.KmerFinder.__cinit__
    raise ValueError(f"{kmer} of length {kmer_length} is longer "
ValueError: AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA of length 70 is longer than the maximum of 64.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File ".../cutadapt/src/cutadapt/cli.py", line 1129, in main
    adapters, adapters2 = adapters_from_args(args)
  File ".../cutadapt/src/cutadapt/cli.py", line 941, in adapters_from_args
    raise CommandLineError(e.args[0])
cutadapt.cli.CommandLineError: AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA of length 70 is longer than the maximum of 64.
ERROR: AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA of length 70 is longer than the maximum of 64.

Adapters that are longer than the k-mer finder's maximum of 64 should skip the k-mer finder optimization instead of failing with an error.
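A minimal sketch of that fallback, assuming the k-mer finder is only a prefilter that decides whether the full alignment needs to run. `ToyKmerFinder`, `make_finder_or_none` and `needs_alignment` are hypothetical names for illustration and are not Cutadapt's actual code; only the limit of 64 is taken from the error message above.

```python
from typing import Optional

MAX_KMER_LENGTH = 64  # limit quoted in the error message above


class ToyKmerFinder:
    """Stand-in for a k-mer prefilter; not Cutadapt's real KmerFinder."""

    def __init__(self, kmer: str):
        # Mimic the failure from the traceback for over-long k-mers.
        if len(kmer) > MAX_KMER_LENGTH:
            raise ValueError(
                f"{kmer} of length {len(kmer)} is longer "
                f"than the maximum of {MAX_KMER_LENGTH}."
            )
        self.kmer = kmer

    def kmers_present(self, sequence: str) -> bool:
        # Toy check: exact substring search instead of a bit-parallel search.
        return self.kmer in sequence


def make_finder_or_none(kmer: str) -> Optional[ToyKmerFinder]:
    """Return a prefilter, or None when the k-mer exceeds the limit."""
    if len(kmer) > MAX_KMER_LENGTH:
        return None  # skip the optimization instead of raising
    return ToyKmerFinder(kmer)


def needs_alignment(sequence: str, finder: Optional[ToyKmerFinder]) -> bool:
    """Without a prefilter, always fall through to the full alignment."""
    if finder is None:
        return True
    return finder.kmers_present(sequence)
```

With this shape, an adapter such as `'A' * 70` yields no prefilter, so every read goes straight to the (slower) alignment step, while shorter adapters keep the fast path.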

rhpvorderman commented 6 months ago

Added a test case and fixed the issue in the linked PR.