Open i-strielkov opened 2 years ago
If you have reads with the same name, they might be seen as dups, right? Not sure the algo handles duplicate-read input well (which should never happen anyway).
Hi, I have been working with @i-strielkov on this, using the FASTQ files that were linked. After experimenting with the settings for a while, I found that a non-default buffer size (I tried -b11 and -b12) reproducibly changes which lines get corrupted, and a buffer larger than the file size (I tried -b500) removes the corruption entirely. To me, this suggests the issue is related to buffering rather than to the file itself.
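To pin down exactly which records change between buffer sizes, it may help to diff the original file against a compress/decompress round trip record by record. Below is a minimal sketch, assuming standard 4-line FASTQ records; `diff_fastq` is a hypothetical helper, not part of DSRC, and the round-trip file is whatever DSRC decompresses to. Because skipped reads shift everything after them, the first reported index is the interesting one.

```python
import gzip
from itertools import zip_longest
from typing import Iterator, List, Tuple


def open_text(path: str):
    # Transparently handle gzip-compressed (.gz) and plain FASTQ files.
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def fastq_records(path: str) -> Iterator[Tuple[str, str, str, str]]:
    # Assumes the standard 4-line layout: header, sequence, '+', qualities.
    with open_text(path) as fh:
        while True:
            header = fh.readline()
            if not header:
                return
            seq, plus, qual = fh.readline(), fh.readline(), fh.readline()
            yield (header.rstrip("\n"), seq.rstrip("\n"),
                   plus.rstrip("\n"), qual.rstrip("\n"))


def diff_fastq(original: str, roundtrip: str, limit: int = 10) -> List[int]:
    """Return 0-based indices of the first `limit` records that differ."""
    mismatches: List[int] = []
    pairs = zip_longest(fastq_records(original), fastq_records(roundtrip))
    for i, (a, b) in enumerate(pairs):
        # A missing record on either side (None) also counts as a mismatch.
        if a != b:
            mismatches.append(i)
            if len(mismatches) >= limit:
                break
    return mismatches
```

For example, `diff_fastq("AML_low_input_AAAACT_r2.fq.gz", "roundtrip_b11.fq")` (second file name hypothetical) would list where the -b11 output first diverges.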
I have also verified that all read names are indeed unique, since duplicate names were proposed as a potential cause.
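For anyone who wants to repeat the uniqueness check, here is a minimal sketch, assuming 4-line FASTQ records and taking the read name as the header up to the first whitespace. `duplicate_read_names` is a hypothetical helper. It picks headers by line position rather than by a leading `@`, because quality strings can also start with `@`. Note that it keeps every name in memory.

```python
import gzip
from collections import Counter
from typing import Dict


def duplicate_read_names(path: str) -> Dict[str, int]:
    """Return {read_name: count} for every name that occurs more than once."""
    opener = gzip.open if path.endswith(".gz") else open
    counts: Counter = Counter()
    with opener(path, "rt") as fh:
        for line_no, line in enumerate(fh):
            # In a 4-line record the header is every 4th line (0, 4, 8, ...).
            if line_no % 4 == 0:
                # Read name: header without '@', up to the first whitespace.
                tokens = line[1:].split(maxsplit=1)
                name = tokens[0] if tokens else ""
                counts[name] += 1
    return {name: n for name, n in counts.items() if n > 1}
```

An empty result means every read name in the file is unique.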
Hi, we have been using your great tool for several years and it has saved us a lot of disk space! However, we recently encountered an error during DSRC encoding: the algorithm occasionally skips a number of reads at a seemingly random position and then continues. The resulting file contains artifacts like this:
Renaming the reads solves the problem. Do you happen to know what may cause such issues?
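As a stopgap, the renaming workaround can be scripted. This is a minimal sketch, assuming 4-line FASTQ records; `rename_reads` and the `read_<n>` naming scheme are hypothetical choices, not anything DSRC requires. It replaces each header with a sequential name and strips any name repeated on the `+` line. The original names are lost, so for paired-end data both mate files must be renamed in the same record order to keep pairs matched.

```python
import gzip


def _open(path: str, mode: str):
    # Use gzip for .gz paths, plain text files otherwise.
    return gzip.open(path, mode) if path.endswith(".gz") else open(path, mode)


def rename_reads(src: str, dst: str, prefix: str = "read") -> int:
    """Rewrite each FASTQ header as '@{prefix}_{n}'; return the record count."""
    n = 0
    with _open(src, "rt") as fin, _open(dst, "wt") as fout:
        for line_no, line in enumerate(fin):
            if line_no % 4 == 0:
                # Header line: replace the original name with a sequential one.
                n += 1
                fout.write(f"@{prefix}_{n}\n")
            elif line_no % 4 == 2:
                # Separator line: drop any name repeated after '+'.
                fout.write("+\n")
            else:
                # Sequence and quality lines are copied unchanged.
                fout.write(line)
    return n
```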
The problems were encountered with this public dataset: https://www.ebi.ac.uk/arrayexpress/experiments/E-MTAB-10175/ In particular, the problem can be reproduced with this file: http://ftp.sra.ebi.ac.uk/vol1/run/ERR539/ERR5396174/AML_low_input_AAAACT_r2.fq.gz
Many thanks in advance for any information. Best, Ievgen