refresh-bio / KMC

Fast and frugal disk based k-mer counter

request: specify several input files (paired-end reads) #208

Closed rderelle closed 1 year ago

rderelle commented 1 year ago

Hi,

I'm using KMC for my project and it works beautifully. Thanks.

My only concern is that we cannot specify several input files for KMC to count k-mers, which would be extremely useful for the analysis of paired-end reads. Instead, from what I understood, we can either merge the forward and reverse reads into a single input file, or count k-mers independently for the forward and reverse reads and then merge the KMC databases.

Would it be possible to add an option that takes a comma-separated list of files to be combined into the same database?

many thanks Romain

marekkokot commented 1 year ago

Hi!

Thanks for using KMC! Actually, there is an option to use multiple input files. Let's say you have these files:

A_1.fastq,A_2.fastq,B_1.fastq,B_2.fastq

You can create a text file with the paths to these files, one per line, let's say input.txt with the following content:

A_1.fastq
A_2.fastq
B_1.fastq
B_2.fastq

Then run KMC as follows (the parameter values are only examples):

kmc -k27 -ci2 -t16 @input.txt 27mers .

KMC will count all the canonical k-mers from these reads. Is that what you want?

You mentioned paired-end reads and a comma-separated list. In general, KMC is not aware of paired-end data: it treats all input files the same way and counts k-mers (canonical by default; if you need non-canonical mode, there is the -b flag). Let me know if this helps, and if not, please describe what you would like to have.
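
The list-file approach described above can be scripted roughly as follows. This is only a sketch: the file names and the kmc command line are the ones from the example, while the existence checks and the skip message are additions for illustration and not part of KMC itself.

```shell
# Write one input path per line (file names taken from the example above).
printf '%s\n' A_1.fastq A_2.fastq B_1.fastq B_2.fastq > input.txt

# Illustrative guard (an addition, not a KMC feature):
# check that every listed file exists before counting.
missing=0
while IFS= read -r f; do
    [ -f "$f" ] || { echo "missing input: $f" >&2; missing=1; }
done < input.txt

# Run the command from the example only if kmc is installed and all inputs exist.
# "@input.txt" points KMC at the list file, "27mers" is the output database
# name, and "." is the working directory.
if [ "$missing" -eq 0 ] && command -v kmc >/dev/null 2>&1; then
    kmc -k27 -ci2 -t16 @input.txt 27mers .
else
    echo "skipping k-mer counting: kmc or some input files were not found"
fi
```

Since KMC treats every file in the list the same way, forward and reverse read files simply go on separate lines of input.txt.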

Best Marek

rderelle commented 1 year ago

Thanks a lot for your reply, Marek! The idea was to analyse several files at once, and sorry, I completely missed the @file.txt option. That will work.

thanks again Romain

marekkokot commented 1 year ago

Great :) So I am closing this.