KFC (K-mer Fast Counter) is a fast and space-efficient k-mer counter based on hyper-k-mers.
It is particularly well-suited for counting large k-mers (with k ≥ 63) from long reads with a low error rate.
It can also filter k-mers by count, retrieving only those above a given threshold.
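To make the notion of "counting k-mers and keeping those above a threshold" concrete, here is a minimal naive baseline in Rust. It is NOT KFC's hyper-k-mer algorithm, only a hash-map sketch of the same input/output behaviour. Assumptions: k-mers are taken as-is (no canonical/reverse-complement form), reads are plain ASCII strings, and the threshold is treated as inclusive (`count >= threshold`); whether KFC's `-t` is inclusive is not stated here.

```rust
use std::collections::HashMap;

/// Naive baseline (NOT KFC's hyper-k-mer method): count every k-mer of
/// every read in a hash map. K-mers are not canonicalized.
fn count_kmers(reads: &[&str], k: usize) -> HashMap<String, u64> {
    let mut counts = HashMap::new();
    if k == 0 {
        return counts; // `windows(0)` would panic, so reject k = 0 up front
    }
    for read in reads {
        // Slide a window of length k over the read's bytes.
        for window in read.as_bytes().windows(k) {
            let kmer = String::from_utf8_lossy(window).into_owned();
            *counts.entry(kmer).or_insert(0) += 1;
        }
    }
    counts
}

/// Keep only k-mers whose count reaches `threshold` (inclusive: an
/// assumption of this sketch), sorted for deterministic output.
fn filter_by_count(counts: &HashMap<String, u64>, threshold: u64) -> Vec<(String, u64)> {
    let mut kept: Vec<(String, u64)> = counts
        .iter()
        .filter(|&(_, &c)| c >= threshold)
        .map(|(kmer, &c)| (kmer.clone(), c))
        .collect();
    kept.sort();
    kept
}

fn main() {
    let reads = ["ACGTACGT"];
    let counts = count_kmers(&reads, 3);
    for (kmer, count) in filter_by_count(&counts, 2) {
        println!("{kmer}\t{count}");
    }
}
```

A hash map holding every distinct k-mer is exactly the memory cost that dedicated counters such as KFC aim to reduce, especially for large k.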
If you have not installed Rust yet, please visit rustup.rs to install it.
Then clone this repository and build KFC using:
git clone https://github.com/lrobidou/KFC
cd KFC
RUSTFLAGS="-C target-cpu=native" cargo +nightly build --release -F nightly
Make sure to set RUSTFLAGS="-C target-cpu=native" to use the fastest instructions available on your architecture.
If you cannot use Rust nightly, you can also build KFC with the stable toolchain (the resulting binary may be slightly slower):
RUSTFLAGS="-C target-cpu=native" cargo build --release
This creates the binary target/release/kfc.
The KFC binary provides two main subcommands: build (to count k-mers from a FASTA/Q file) and dump (to extract the k-mers contained in a KFC index).
You can view the detailed usage of each subcommand using:
./kfc <subcommand> -h
Any KFC workflow starts by building a KFC index:
./kfc build -k <k> -i <FASTA/Q> -o <index>.kfc
Once the index is built, it can be dumped to a text file. Note that the k-mers are written in no particular order.
./kfc dump -t <threshold> -i <index>.kfc --output-text <kmers>.txt
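Because the k-mers in a text dump are unordered, two dumps of the same data cannot be compared with a plain line-by-line diff. A minimal Rust sketch of an order-insensitive comparison is shown below. It treats each line as an opaque record, so it does not depend on the exact line layout of KFC's text output (which is not described here); the tab-separated `kmer<TAB>count` lines in the demo are a hypothetical example only.

```rust
/// Returns the non-empty lines of a text dump in sorted order, so that
/// two dumps listing the same records in different orders compare equal.
fn sorted_lines(dump: &str) -> Vec<&str> {
    let mut lines: Vec<&str> = dump.lines().filter(|l| !l.trim().is_empty()).collect();
    lines.sort_unstable();
    lines
}

/// True if both dumps contain exactly the same lines (with multiplicity),
/// ignoring their order.
fn same_dump(a: &str, b: &str) -> bool {
    sorted_lines(a) == sorted_lines(b)
}

fn main() {
    // Hypothetical dump contents; the real KFC line format may differ.
    let first = "ACG\t2\nCGT\t2\n";
    let second = "CGT\t2\nACG\t2\n";
    println!("identical: {}", same_dump(first, second));
}
```

In practice, the two strings would be read from the <kmers>.txt files with std::fs::read_to_string. Sorting in memory is fine for small dumps; for very large ones, an external tool such as sort followed by diff avoids loading everything at once.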
KFC supports the k-mer file format (KFF; see Dufresne et al., The K-mer File Format: a standardized and compact disk representation of sets of k-mers). It can therefore dump a KFC index into a KFF file, in which the count of each k-mer is stored.
./kfc dump --input-index <index>.kfc --output-kff <index>.kff
Warning: KFC can only read KFF files that were produced by KFC.
Any implementation supporting KFF should be able to read a KFF file produced by KFC, but we recommend using KFC for this task: KFF files written by KFC satisfy some assumptions about k-mer counts, which KFC exploits to dump them with lower memory consumption. Conversely, a KFF file that does not satisfy these assumptions would yield invalid counts if dumped by KFC.
./kfc kff-dump --input-kff <index>.kff --output-text <index>.txt