lh3 / seqtk

Toolkit for processing sequences in FASTA/Q formats
MIT License

seqtk sample gives empty output #145

Closed Gil-marquez closed 4 years ago

Gil-marquez commented 5 years ago

I've been using seqtk sample to randomly subsample FASTQ files with about 75M reads. I tried subsampling 5M, 10M, 15M, ..., 30M, 35M, ..., 70M reads, but from the 35M subsample onward the outputs are empty. Does anyone know what the problem could be?

seqtk sample -s12 H3K27-1.fastq 10000000 > subsampling/H3K27-1.10M.fastq
seqtk sample -s14 H3K27-1.fastq 15000000 > subsampling/H3K27-1.15M.fastq
seqtk sample -s16 H3K27-1.fastq 20000000 > subsampling/H3K27-1.20M.fastq
seqtk sample -s18 H3K27-1.fastq 25000000 > subsampling/H3K27-1.25M.fastq
seqtk sample -s20 H3K27-1.fastq 30000000 > subsampling/H3K27-1.30M.fastq
seqtk sample -s22 H3K27-1.fastq 35000000 > subsampling/H3K27-1.35M.fastq
seqtk sample -s24 H3K27-1.fastq 40000000 > subsampling/H3K27-1.40M.fastq
seqtk sample -s26 H3K27-1.fastq 45000000 > subsampling/H3K27-1.45M.fastq
seqtk sample -s28 H3K27-1.fastq 50000000 > subsampling/H3K27-1.50M.fastq
seqtk sample -s30 H3K27-1.fastq 55000000 > subsampling/H3K27-1.55M.fastq
seqtk sample -s32 H3K27-1.fastq 60000000 > subsampling/H3K27-1.60M.fastq
seqtk sample -s32 H3K27-1.fastq 65000000 > subsampling/H3K27-1.65M.fastq
seqtk sample -s32 H3K27-1.fastq 70000000 > subsampling/H3K27-1.70M.fastq

I also tried changing the seed value, but nothing changes in the output.

tseemann commented 4 years ago

@Gil-marquez I had a look at the code, and it seems it needs to allocate RAM for N sequence structs when you want to sample N records. I suspect you might be running out of RAM.

Use the -2 option to do 2-pass mode instead.

If this works, please close this issue. Thanks!

Usage:   seqtk sample [-2] [-s seed=11] <in.fa> <frac>|<number>

Options: -s INT       RNG seed [11]
         -2           2-pass mode: twice as slow but with much reduced memory
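
To illustrate why a 2-pass mode can need much less memory, here is a minimal Python sketch of the general idea. It is not seqtk's C code: the function names (sample_one_pass, sample_two_pass, open_records) are hypothetical, and reservoir sampling is assumed as the selection method. The one-pass version holds N full records in memory; the two-pass version holds only N integer indices and re-reads the input to emit the chosen records.

```python
import random
from typing import Callable, Iterable, Iterator, List


def sample_one_pass(records: Iterable[str], k: int, seed: int = 11) -> List[str]:
    """Reservoir sampling that keeps k full records in memory."""
    rng = random.Random(seed)
    reservoir: List[str] = []
    for i, rec in enumerate(records):
        if i < k:
            reservoir.append(rec)        # fill the reservoir first
        else:
            j = rng.randint(0, i)        # inclusive on both ends
            if j < k:
                reservoir[j] = rec       # replace a stored record
    return reservoir


def sample_two_pass(open_records: Callable[[], Iterable[str]],
                    k: int, seed: int = 11) -> Iterator[str]:
    """Pass 1 keeps only k record indices; pass 2 re-reads and emits them."""
    rng = random.Random(seed)
    chosen: List[int] = []
    for i, _ in enumerate(open_records()):   # pass 1: pick indices only
        if i < k:
            chosen.append(i)
        else:
            j = rng.randint(0, i)
            if j < k:
                chosen[j] = i
    chosen.sort()
    pos = 0
    for i, rec in enumerate(open_records()):  # pass 2: stream the input again
        if pos == len(chosen):
            break
        if i == chosen[pos]:
            yield rec
            pos += 1


if __name__ == "__main__":
    reads = [f"read{i}" for i in range(1000)]  # stand-in for FASTQ records
    print(len(sample_one_pass(reads, 100)))
    print(len(list(sample_two_pass(lambda: iter(reads), 100))))
```

The two-pass variant needs an input that can be read twice (a file on disk rather than a pipe), which is the trade-off the "twice as slow" note in the usage text points at.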

Acribbs commented 4 years ago

FYI I had a similar issue, and @tseemann's solution fixed it.

tseemann commented 4 years ago

@Gil-marquez please close this issue now.