peterjc / thapbi-pict

Tree Health and Plant Biosecurity Initiative - Phytophthora ITS1 Classifier Tool
https://thapbi-pict.readthedocs.io/
MIT License

Use cutadapt --no-indels? #361

Open peterjc opened 3 years ago

peterjc commented 3 years ago

Given the recent change in #351 to require full-length primer matches, it may make sense to also run cutadapt ... --no-indels, with the bonus that this should be faster too:

Also, with the --no-indels option, Cutadapt can use a different algorithm and demultiplexing will be many times faster.

https://cutadapt.readthedocs.io/en/stable/guide.html#demultiplexing
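To illustrate why disallowing indels can change which reads match a primer, here is a minimal Python sketch. It is not cutadapt's actual algorithm; the primer and read fragments are made-up examples, and the helper names `hamming_mismatches` and `edit_distance` are hypothetical. It only contrasts substitution-only matching with matching that also allows insertions and deletions.

```python
def hamming_mismatches(primer, window):
    """Count mismatching positions between two equal-length strings.

    This models substitution-only matching (no insertions or deletions).
    """
    if len(primer) != len(window):
        raise ValueError("sequences must be the same length")
    return sum(a != b for a, b in zip(primer, window))


def edit_distance(a, b):
    """Levenshtein distance: substitutions, insertions and deletions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(
                min(
                    prev[j] + 1,  # deletion from a
                    curr[j - 1] + 1,  # insertion into a
                    prev[j - 1] + (ca != cb),  # match or substitution
                )
            )
        prev = curr
    return prev[-1]


primer = "ACGTTGCA"  # hypothetical primer, for illustration only

# One substitution: both approaches see a single error.
print(hamming_mismatches(primer, "ACGATGCA"))  # 1

# One deleted base in the read: without indels the rest of the primer
# is shifted out of register, so many positions mismatch.
print(hamming_mismatches(primer, "ACGTGCAG"))  # 4

# Allowing indels, the same read is just one edit away.
print(edit_distance(primer, "ACGTGCA"))  # 1
```

Under this simplified model, a read whose primer region carries a single-base deletion passes an indel-tolerant match but fails a substitution-only match at the same error threshold, which is consistent with read counts dropping when indels are disallowed.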

peterjc commented 3 years ago

Trying this does drop the counts, so our test suite fails as it stands; the expected counts etc. would need updating.

$ git diff master
diff --git a/thapbi_pict/prepare.py b/thapbi_pict/prepare.py
index e7a084dc..e485b706 100644
--- a/thapbi_pict/prepare.py
+++ b/thapbi_pict/prepare.py
@@ -203,7 +203,7 @@ def run_cutadapt(
     """
     if not left_primer or not right_primer:
         sys.exit("ERROR: Can't run cutadapt without two primers")
-    cmd = ["cutadapt", "--fasta", "--discard-untrimmed"]
+    cmd = ["cutadapt", "--fasta", "--discard-untrimmed", "--no-indels"]
     if cpu:
         cmd += ["-j", str(cpu)]
     if min_len:
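
One way to compare counts with and without the flag before updating the test expectations would be to make it switchable. The following is only a sketch under that assumption: `build_cutadapt_cmd` and its `no_indels` parameter are hypothetical and are not part of `thapbi_pict/prepare.py`; only the cutadapt options already shown in the diff above are used.

```python
def build_cutadapt_cmd(cpu=0, no_indels=False):
    """Return the leading cutadapt arguments as a list.

    Hypothetical helper for illustration; the real run_cutadapt function
    adds further arguments (primers, lengths, filenames) not shown here.
    """
    cmd = ["cutadapt", "--fasta", "--discard-untrimmed"]
    if no_indels:
        # Only allow mismatches, not insertions or deletions, in primers.
        cmd.append("--no-indels")
    if cpu:
        cmd += ["-j", str(cpu)]
    return cmd


# Build both variants so their outputs could be compared side by side.
print(build_cutadapt_cmd(cpu=4))
print(build_cutadapt_cmd(cpu=4, no_indels=True))
```

Keeping the flag behind a parameter would let the existing expected counts stay valid while the new ones are generated and reviewed.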