najoshi / sickle

Windowed Adaptive Trimming for fastq files using quality

problem with length threshold #20

Closed: boryanakis closed this issue 10 years ago

boryanakis commented 11 years ago

Hi,

I am using sickle to trim a SE fastq file. The command I used is:

$ sickle se -f CAH_CaffRNAseq_HC.fastq -t sanger -o CAH_CaffRNAseq_HC_Sickle.fastq -q 30 -l 30 -n

The problem is that the output file contains reads shorter than 30 bp. I have played around with different length and quality thresholds, and the problem persists. I get the same result with the gzipped and the uncompressed versions of the file, and recompiling and re-running did not help either.
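
A quick way to count how many output reads fall below the -l cutoff (a minimal Python sketch; the file name follows the command above):

threshold = 30
short = total = 0
with open("CAH_CaffRNAseq_HC_Sickle.fastq") as fh:
    for i, line in enumerate(fh):
        if i % 4 == 1:  # the sequence line of each 4-line FASTQ record
            total += 1
            if len(line.rstrip("\n")) < threshold:
                short += 1
print(short, "of", total, "reads are shorter than", threshold, "nt")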

Is this a known issue or am I doing something wrong? I am using version 1.210.

Thank you.

jmw86069 commented 11 years ago

For what it's worth, I have the same problem, and it is specific to version 1.210; I don't see it with version 1.200. In a brief scan of the C code I couldn't find an obvious bug, but I couldn't quite follow the logic in sliding.c. Obviously this is a show-stopper for version 1.210, so I'm rolling back to 1.200 for now; I can't use reads with only 3-4 nucleotides in them. Let me know if you want me to post a few fastq entries that demonstrate the problem, in case they are weird edge cases.
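
For anyone trying to follow the discussion, here is a rough Python sketch of how a windowed quality trimmer with a length threshold is commonly structured. It is only an illustration, not sickle's actual sliding.c logic; the relevant point is that the -l check has to be applied to the kept span (3' cut minus 5' cut), not to the original read length.

def window_trim(seq, quals, q=30, min_len=30, window_frac=0.1):
    # Illustrative sliding-window trim: locate 5' and 3' cut points from
    # windowed mean quality, then apply the length threshold to the kept span.
    n = len(seq)
    w = max(1, int(n * window_frac))
    means = [sum(quals[i:i + w]) / w for i in range(n - w + 1)]
    good = [i for i, m in enumerate(means) if m >= q]
    if not good:
        return None                      # whole read is low quality: discard
    five, three = good[0], good[-1] + w  # 5' cut and 3' cut
    if three - five < min_len:
        return None                      # kept span shorter than -l: discard
    return seq[five:three], quals[five:three]

If the length were checked before the 5' cut is applied, a read with a long low-quality 5' run could slip through much shorter than -l, which would match the 3-4 nt reads described above.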

najoshi commented 10 years ago

Can both of you send me fastq files (or just partial files) where you see this happening? It would help me debug it. As far as I remember, I didn't change much between 1.200 and 1.210, so I will need to test it out.

Buttonwood commented 10 years ago

How can I remove duplicate reads? I mean that after PCR, two PE read pairs can be identical.
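
As an aside, duplicate removal is not something sickle (a quality trimmer) does; it is usually handled by separate tools, often after alignment. Purely as a minimal illustration, exact duplicates in paired files can be filtered by keying on the concatenated R1+R2 sequences (the file names below are placeholders):

def records(path):
    # Yield 4-line FASTQ records from a file.
    with open(path) as fh:
        while True:
            block = [fh.readline() for _ in range(4)]
            if not block[0]:
                return
            yield block

seen = set()
with open("R1.dedup.fastq", "w") as o1, open("R2.dedup.fastq", "w") as o2:
    for r1, r2 in zip(records("R1.fastq"), records("R2.fastq")):
        key = r1[1] + r2[1]  # the two sequence lines identify the pair
        if key not in seen:
            seen.add(key)
            o1.writelines(r1)
            o2.writelines(r2)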

najoshi commented 10 years ago

I finally got around to fixing this problem. It looks like it was caused by an excess of low-quality bases at the 5' end of reads. Anyway, any further testing would be much appreciated.
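
A quick way to see the failure mode described here, simplified to a per-base q30 trim rather than sickle's windowed algorithm and using made-up qualities:

# A 50 nt read whose first 30 bases are low quality and last 20 are high quality.
quals = [2] * 30 + [38] * 20
good = [i for i, q in enumerate(quals) if q >= 30]
kept = good[-1] - good[0] + 1 if good else 0
# Only 20 bases survive trimming at both ends, which is below -l 30,
# so the read should be discarded rather than written out short.
print(kept, "discard" if kept < 30 else "keep")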