vsbuffalo / qrqc

Quick Read Quality Control
bioconductor.org/packages/release/bioc/html/qrqc.html

Number of unique sequences not correct in report #2

Closed najoshi closed 12 years ago

najoshi commented 12 years ago

Hey Vince,

I'm running qrqc on a small FASTQ file, and the number of unique sequences it reports is lower than the actual number. I hashed the sequences in Perl and counted the unique ones, and got many more than the number in the qrqc report. Just thought you should know.
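For reference, the independent check described above can be sketched as follows. This is a minimal Python illustration, not the original Perl script; the helper name `count_unique_sequences` is hypothetical, and it assumes plain four-line FASTQ records with no wrapped sequence lines.

```python
import io

def count_unique_sequences(handle):
    """Count distinct sequence lines in a four-line-per-record FASTQ stream."""
    unique = set()
    for i, line in enumerate(handle):
        if i % 4 == 1:  # the sequence is the second line of each record
            unique.add(line.strip())
    return len(unique)

# Tiny in-memory example: three reads, two distinct sequences.
example = io.StringIO(
    "@r1\nACGT\n+\nIIII\n"
    "@r2\nACGT\n+\nIIII\n"
    "@r3\nTTGA\n+\nIIII\n"
)
n_unique = count_unique_sequences(example)
print(n_unique)
```

For a real file, the same function can be called on an open file handle instead of the in-memory example.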

vsbuffalo commented 12 years ago

Was hash.prop in readSeqFile() a value less than 1? If so, that is expected: to conserve memory for very large sequence files, only a random proportion of reads are hashed.
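To illustrate why a hash.prop below 1 gives a lower count, here is a rough Python simulation. It is only a sketch of the general idea, not qrqc's actual implementation: it assumes each read is independently kept for hashing with probability `hash_prop`, and the function name `unique_in_sample` and the synthetic reads are made up for the example.

```python
import random

def unique_in_sample(reads, hash_prop, seed=0):
    """Count distinct reads among a random subset of the input.

    Assumption: each read is kept independently with probability hash_prop.
    """
    rng = random.Random(seed)
    sampled = {r for r in reads if rng.random() < hash_prop}
    return len(sampled)

# Synthetic data: 5000 random 20-base reads (hypothetical, for illustration).
gen = random.Random(42)
reads = ["".join(gen.choice("ACGT") for _ in range(20)) for _ in range(5000)]

true_unique = len(set(reads))
full_count = unique_in_sample(reads, hash_prop=1.0)     # every read hashed
partial_count = unique_in_sample(reads, hash_prop=0.1)  # about a tenth hashed

print(true_unique, full_count, partial_count)
```

When every read is hashed, the count matches the true number of unique sequences; with a smaller proportion, only sequences that happen to be sampled can be counted, so the reported number can only be equal or lower.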

najoshi commented 12 years ago

Yes, I believe it was. However, the file I was using was pretty small, about 20 MB. For files of that size, memory really shouldn't be a problem.

On Thu, Jul 12, 2012 at 7:35 PM, Vince Buffalo wrote:

> Was hash.prop in readSeqFile() a value less than 1? If so, that is expected: to conserve memory for very large sequence files, only a random proportion of reads are hashed.



Nikhil Joshi Bioinformatics Analyst/Programmer UC Davis Bioinformatics Core http://bioinformatics.ucdavis.edu/ najoshi -at- ucdavis -dot- edu 530.752.2698 (w)