refresh-bio / KMC

Fast and frugal disk based k-mer counter

KMC reads the sequences from standard input #199

Closed: shokrof closed this issue 1 year ago

shokrof commented 1 year ago

Hi, I am trying to avoid downloading the FASTQ files to disk by counting k-mers on the fly. I want to configure fasterq-dump to write the sequences to stdout and pipe them into KMC. I tried this with a named pipe (mkfifo), but KMC failed to open the pipe. My guess is that KMC reads the file at multiple positions.

Can you advise whether it is possible to accomplish this with KMC?

Thanks, Moustafa

marekkokot commented 1 year ago

Hi,

Indeed, KMC opens the input files twice. The first open only grabs some statistics from a small fraction of the input files. I'm afraid this is currently not possible (or at least I don't know of a workaround; I have no experience with mkfifo, so maybe one exists). We plan to rework the way KMC handles input files, and I will try to keep this issue in mind so that reading from pipes becomes possible, but I'm afraid we have a couple of higher-priority things to do first :(

Best, Marek
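Why a two-pass reader breaks on a pipe, and how a single-pass design would avoid it, can be sketched in a few lines of Python. This is a hypothetical illustration, not KMC's code. It assumes the statistics sample is taken from the first lines of the input, and the helper names `make_pipe` and `sample_then_stream` are made up for this example. A pipe delivers its data only once, so a second pass sees nothing. Buffering the sample prefix and replaying it before the unread remainder lets a single pass serve both purposes.

```python
import itertools
import os


def make_pipe(data: bytes):
    """Hypothetical helper: create an anonymous OS pipe pre-filled with
    `data` and return its read end. Small payloads only, because the
    write must fit in the pipe buffer before anyone reads."""
    r, w = os.pipe()
    os.write(w, data)
    os.close(w)  # closing the write end signals EOF to the reader
    return os.fdopen(r, "rb")


def sample_then_stream(stream, n_sample):
    """Hypothetical single-pass reader (not KMC's implementation): buffer
    the first n_sample lines for statistics, then return an iterator that
    replays that prefix followed by the unread remainder, so the source
    is read exactly once and never reopened or rewound."""
    prefix = list(itertools.islice(stream, n_sample))
    return prefix, itertools.chain(prefix, stream)


FASTQ = b"@r1\nACGTACGT\n+\nIIIIIIII\n@r2\nTTGCA\n+\nIIIII\n"

# Two passes over a pipe: the second pass gets no data.
with make_pipe(FASTQ) as pipe:
    first_pass = pipe.read()
    second_pass = pipe.read()
print(len(first_pass), len(second_pass))  # prints: 42 0

# Single pass with a replayed prefix: every record is still seen.
with make_pipe(FASTQ) as pipe:
    sample, everything = sample_then_stream(pipe, 4)
    all_lines = list(everything)
print(len(sample), len(all_lines))  # prints: 4 8
```

The cost of this design is holding the sampled prefix in memory, which stays small as long as the statistics are taken from only a small fraction of the input.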

shokrof commented 1 year ago

Thanks, Marek, for your reply. I don't think there will be a workaround with mkfifo, since KMC reads the file multiple times. I guess I just have to keep the files on disk.