Open Aannaw opened 2 years ago
Hi,
there is also the -fm switch for multi-FASTA (FASTA where a sequence may span multiple lines). Let me know if it helps.
I only have an assembled genome. I want to assess my assembly after running purge_dups, and I also want to compare the k-mer counts of the assembly with those of the Illumina short reads. I ran -fm with only one FASTA file, and it does not seem to help.
I don't know the purge_dups tool. You may count k-mers in multiple files. Assume you have a bunch of multi-FASTA files.
Create a file files.txt where each line stores the path to one of the multi-FASTA files. For example:
file1.fa
file2.fa
You may run kmc as follows:
kmc -k21 -ci1 -t40 -fm @files.txt 21mers .
Does it help?
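The steps above can be sketched as a small script. This is only an illustration: the two tiny FASTA files are made-up stand-ins, the working directory name kmc_tmp is an assumption (the example above uses the current directory), and kmc is run only if it is found on the PATH.

```shell
#!/bin/sh
# Sketch: build a KMC input list and count 21-mers across all listed files.
# file1.fa and file2.fa are the placeholder names from the example above.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Tiny stand-in inputs so the sketch is self-contained (made-up sequences).
printf '>seq1\nACGTACGTACGTACGTACGTACGTACGT\n' > file1.fa
printf '>seq2\nTTGACCATGGTTGACCATGGTTGACCATGG\n' > file2.fa

# One input path per line.
printf '%s\n' file1.fa file2.fa > files.txt

# Assumption: create the working directory up front rather than rely on kmc to do it.
mkdir -p kmc_tmp

if command -v kmc >/dev/null 2>&1; then
    # Positional arguments: input list, output name, working directory.
    kmc -k21 -ci1 -t40 -fm @files.txt 21mers kmc_tmp
else
    echo "kmc not found; would run: kmc -k21 -ci1 -t40 -fm @files.txt 21mers kmc_tmp"
fi
```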
I created a file a.txt that lists only one FASTA file:
a.fasta
Then I ran kmc -k21 -ci1 -t40 -fm @a.txt tmp
The standard output is:
K-Mer Counter (KMC) ver. 3.1.0 (2018-05-10)
Usage:
kmc [options]
No file is created and no error message is shown.
You got a message:
Usage:
kmc [options] <input_file_name> <output_file_name> <working_directory>
kmc [options] <@input_file_names> <output_file_name> <working_directory>
You are missing the output_file_name in your command line. Use:
kmc -k21 -ci1 -t40 -fm @a.txt output tmp
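A quick way to catch this mistake before running kmc is to count the positional arguments. The helpers below are hypothetical and not part of KMC; they assume every option starts with "-" and carries its value attached (as in -k21), so anything else is taken to be the input, the output name, or the working directory.

```shell
#!/bin/sh
# Hypothetical helper (not part of KMC): count arguments that do not start with "-".
count_positionals() {
    n=0
    for arg in "$@"; do
        case "$arg" in
            -*) ;;                  # option such as -k21 or -fm
            *)  n=$((n + 1)) ;;     # input, output name, or working directory
        esac
    done
    echo "$n"
}

# Hypothetical helper: succeed only when exactly three positionals are present.
check_kmc_args() {
    if [ "$(count_positionals "$@")" -eq 3 ]; then
        echo "ok"
    else
        echo "expected <input> <output_file_name> <working_directory>"
        return 1
    fi
}

check_kmc_args -k21 -ci1 -t40 -fm @a.txt tmp          # output name missing
check_kmc_args -k21 -ci1 -t40 -fm @a.txt output tmp   # complete
```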
It works! Thanks very much. Can I ask another question? For Illumina paired-end short reads (a.1.fq, a.2.fq), should I count k-mers by creating a file a.fq.txt that lists a.1.fq and a.2.fq, and then run "kmc -k21 -ci1 -t40 -fq @a.fq.txt out tmp"? Does it output the k-mers common to the two paired read files?
It will count each k-mer present in at least one of the input files. For sequencing reads, one should probably set a reasonable cutoff (-ci) to remove erroneous k-mers.
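To make the "at least one of the input files" behaviour concrete, here is a toy sketch in plain awk. It is not KMC: the reads are made up, k is 3, the function count_kmers is hypothetical, and unlike KMC's default it does not merge a k-mer with its reverse complement. The last step mimics what a cutoff such as -ci2 does, dropping k-mers seen fewer than two times.

```shell
#!/bin/sh
# Toy illustration only (plain awk, not KMC).
workdir=$(mktemp -d)

# Two made-up single-read FASTQ files standing in for a.1.fq and a.2.fq.
printf '@r1\nACGTA\n+\nIIIII\n' > "$workdir/a.1.fq"
printf '@r1\nCGTAC\n+\nIIIII\n' > "$workdir/a.2.fq"

# Hypothetical helper: pool k-mers from all given FASTQ files and sum their counts.
count_kmers() {
    k=$1; shift
    awk -v k="$k" '
        FNR % 4 == 2 {                       # sequence line of each FASTQ record
            for (i = 1; i + k - 1 <= length($0); i++)
                count[substr($0, i, k)]++
        }
        END { for (kmer in count) print kmer, count[kmer] }
    ' "$@" | sort
}

# Every k-mer from either file is reported, not only the shared ones.
counts=$(count_kmers 3 "$workdir/a.1.fq" "$workdir/a.2.fq")
echo "$counts"

# Keep only k-mers seen at least twice, similar in spirit to -ci2.
filtered=$(echo "$counts" | awk -v ci=2 '$2 >= ci')
echo "$filtered"
```

In this toy input, k-mers that occur in only one file still appear in the pooled counts, and only the cutoff removes them.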
That is very helpful! Thanks very much.
No problem. I'm closing this issue. You may reopen if needed.
Hi @marekkokot, I have the very same issue. No matter what combination of parameters I use, I always get a segfault. For example:
./kmc -v -fm -k31 -ci0 -m2 -t1 -sm ecoli1.fasta ecoli1.kmc kmc_tmp_dir
Why?
Hi,
I don't think it is the very same issue. It looks much worse. Do you use kmc downloaded from the release page, or maybe from bioconda or maybe you have compiled it on your own? Let me know. Also, could you please send me your input file, i.e. ecoli1.fasta ?
Hi, I cloned the repo from here (GitHub) and then compiled it on my machine. Compilation works fine. Here is the file attached (it is a tiny file).
These are my commands:
./kmc -v -fm -k31 -ci0 -t1 ecoli1.fasta ecoli1.kmc kmc_tmp_dir
./kmc -v -fm -k31 -ci0 -t1 @list.txt ecoli1.kmc kmc_tmp_dir/
where currently list.txt contains the filepath of just that ecoli1.fasta file.
It works on my machine.
What is your operating system and compiler?
And maybe what is your hardware?
Just to be sure, do you have kmc_tmp_dir created?
I'm running gcc on Ubuntu: gcc version 11.2.0 (Ubuntu 11.2.0-7ubuntu2). I've also tried the release commit (b7de846829f7d8cfd18a3d1285deba6ee8ceffc2) but nothing changes. Of course, I have the tmp directory created.
Ok, this is weird :( Could you please try the precompiled release? I may also try to remove the -static flag from the makefile, and also the -Wl,--whole-archive and -Wl,--no-whole-archive flags.
I tried another machine of mine (Ubuntu again with gcc) and actually it worked. Very strange indeed. Everything else works correctly on the previous machine.
It may be hard for me to track down the cause when I am not able to reproduce the error. If you have some time, maybe try to run kmc under gdb (some changes in the makefile may be needed) to see where it crashes. Maybe, for some strange reason, kmc cannot allocate memory? How much memory does your machine have?
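One possible way to do the gdb run is sketched below. It assumes ./kmc was rebuilt with debug symbols (for example by adding -g to the compiler flags in the makefile, which is an assumption about the needed change), and it reuses the command from earlier in this thread. gdb is started only if both gdb and ./kmc are available.

```shell
#!/bin/sh
# Sketch: run kmc under gdb in batch mode and print a backtrace after a crash.
# Assumption: ./kmc was built with debug symbols so the backtrace is readable.
kmc_bin=./kmc
set -- "$kmc_bin" -v -fm -k31 -ci0 -t1 ecoli1.fasta ecoli1.kmc kmc_tmp_dir

if command -v gdb >/dev/null 2>&1 && [ -x "$kmc_bin" ]; then
    # -batch exits when done; "run" starts the program, "bt" prints the stack.
    gdb -batch -ex run -ex bt --args "$@"
else
    echo "gdb or $kmc_bin not available; would run: gdb -batch -ex run -ex bt --args $*"
fi
```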
My machines have 128GB of RAM :) Also, why not include some examples in the readme? I see a lot of people get confused or have no idea how to run this tool. For example, I now have these two files:
ecoli1.kmc.kmc_pre
ecoli1.kmc.kmc_suf
Which one should I use?
Ok, so this is not out of memory :) Strange :( Thanks for the suggestion. We indeed need to improve the readme. Some examples are given in the command line help. I didn't realize a lot of people got confused. This is bad. I thought the opposite is true.
Regarding the kmc_pre and kmc_suf files: you should use both, because kmc output is split into two files. Alternatively, you could set the output format to KFF, which would be a single file, but probably a larger one.
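In practice, downstream tools are given the common name (here ecoli1.kmc) rather than either file. A minimal sketch, assuming kmc_tools is installed and using a hypothetical helper kmc_db_complete to check that both halves of the database are present before dumping the k-mers as text:

```shell
#!/bin/sh
# Hypothetical helper: a KMC database named by prefix needs both files.
kmc_db_complete() {
    [ -f "$1.kmc_pre" ] && [ -f "$1.kmc_suf" ]
}

# The output name passed to kmc, without the .kmc_pre/.kmc_suf extension.
db=ecoli1.kmc

if kmc_db_complete "$db" && command -v kmc_tools >/dev/null 2>&1; then
    # Write each k-mer and its count as text.
    kmc_tools transform "$db" dump "$db.txt"
else
    echo "database $db is incomplete or kmc_tools was not found"
fi
```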
Ok thanks!
Hello, I am confused about the kmc command for counting k-mers in an assembled genome (FASTA). I could not find an example. My command is kmc -k21 -ci0 -t40 -m20 -fa a.fasta ./tmp. No error is shown, but the program collapses. Looking forward to your reply. Thanks very much.