[ ] figure out a way for 3 people to work simultaneously
[ ] get parsing performance close to std::getline
figure out why we aren't close
std::getline is 0.46 ns per byte
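A minimal baseline microbenchmark sketch (the file comes from argv; nothing project-specific is assumed) for measuring std::getline cost per byte, to compare against the 0.46 ns/byte figure above:

```cpp
// Sketch: time std::getline over a whole file and report ns per byte.
#include <chrono>
#include <cstdint>
#include <fstream>
#include <iostream>
#include <string>

int main(int, char** argv) {
  std::ifstream in(argv[1]);
  std::string line;
  uint64_t bytes = 0;
  auto t0 = std::chrono::steady_clock::now();
  while (std::getline(in, line)) bytes += line.size() + 1;  // +1 for the stripped '\n'
  auto t1 = std::chrono::steady_clock::now();
  double ns = std::chrono::duration<double, std::nano>(t1 - t0).count();
  std::cout << (ns / bytes) << " ns/byte over " << bytes << " bytes\n";
  return 0;
}
```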
[X] make canonicalization + encoding work
forward + reverse strand + encode.
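A sketch of one common canonicalization scheme (not necessarily the exact encoding used here; K = 32 is assumed so a k-mer fits one 64-bit word): 2-bit-encode the forward strand, build the reverse complement on the fly, and keep the numerically smaller word as the canonical k-mer.

```cpp
// Sketch: 2-bit encoding plus min(forward, reverse-complement) canonicalization.
#include <cstdint>

constexpr int K = 32;  // assumed k; 2 bits/base -> exactly 64 bits

inline uint64_t encode_base(char c) {
  switch (c) {
    case 'A': return 0;
    case 'C': return 1;
    case 'G': return 2;
    default:  return 3;  // 'T' (anything else is not handled in this sketch)
  }
}

inline uint64_t canonical_kmer(const char* s) {
  uint64_t fwd = 0, rev = 0;
  for (int i = 0; i < K; i++) {
    uint64_t b = encode_base(s[i]);
    fwd = (fwd << 2) | b;                            // forward strand
    rev = (rev >> 2) | ((3 - b) << (2 * (K - 1)));   // reverse complement
  }
  return fwd < rev ? fwd : rev;  // canonical = min of the two encodings
}
```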
[ ] bring f_start and f_end assignment outside the parser
f_end for every shard: x chunks of 4096, plus one extra sequence in the (x+1)-th chunk
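A rough sketch of what computing the ranges outside the parser could look like (the field names, the 4 KiB chunk size, and the even split are all assumptions):

```cpp
// Sketch: assign per-shard [f_start, f_end) byte ranges aligned to 4 KiB chunks.
// The parser for shard i may read past f_end to finish one extra sequence that
// straddles the boundary into the next chunk.
#include <cstdint>
#include <vector>

struct ShardRange { uint64_t f_start; uint64_t f_end; };

std::vector<ShardRange> assign_ranges(uint64_t file_size, uint32_t num_shards) {
  constexpr uint64_t kChunk = 4096;
  uint64_t chunks = (file_size + kChunk - 1) / kChunk;
  uint64_t per_shard = chunks / num_shards;  // full chunks per shard
  std::vector<ShardRange> ranges(num_shards);
  for (uint32_t i = 0; i < num_shards; i++) {
    ranges[i].f_start = i * per_shard * kChunk;
    ranges[i].f_end = (i + 1 == num_shards) ? file_size
                                            : (i + 1) * per_shard * kChunk;
  }
  return ranges;
}
```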
[ ] build test suite for parser
basic num_inserts comparison against gerbil's number of inserts.
datasets: fvesca, dmelanogaster, hsapiens, the "split" datasets, the test file for handling a split exactly at "@"
[ ] build test suite for hashtable
have a python script that compares k-mers inserted into std::map with those inserted into simple_kht; integrate this, or work on a C++ version
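A possible C++ version of that comparison (the (kmer, count) dump format from simple_kht is an assumption; only the std::map reference side is shown concretely):

```cpp
// Sketch: count every canonical k-mer with std::map as the reference, then diff
// against whatever (kmer, count) pairs the custom hashtable reports.
#include <cstdint>
#include <iostream>
#include <map>
#include <utility>
#include <vector>

using KmerCounts = std::map<uint64_t, uint64_t>;

void reference_insert(KmerCounts& ref, uint64_t canonical_kmer) {
  ref[canonical_kmer]++;
}

bool compare(const KmerCounts& ref,
             const std::vector<std::pair<uint64_t, uint64_t>>& kht_dump) {
  if (ref.size() != kht_dump.size()) {
    std::cerr << "size mismatch: " << ref.size() << " vs " << kht_dump.size() << "\n";
    return false;
  }
  for (const auto& [kmer, count] : kht_dump) {
    auto it = ref.find(kmer);
    if (it == ref.end() || it->second != count) {
      std::cerr << "mismatch for kmer " << kmer << "\n";
      return false;
    }
  }
  return true;
}
```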
[ ] support longer k-mers (currently only up to 32-mers)
decide on max k to support (gerbil does 200)
a std::pair of two unsigned 64-bit integers can work for 64-mers; an array for higher k-mers
structure packing in hashtable
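One way the representation could look (names are placeholders): a std::pair of two 64-bit words covers up to 64-mers; a fixed array of words generalizes to whatever max k is chosen, e.g. gerbil's 200.

```cpp
// Sketch: packed k-mer types for k > 32 at 2 bits per base.
#include <array>
#include <cstdint>
#include <utility>

// up to 64-mers: two packed 64-bit words
using Kmer64 = std::pair<uint64_t, uint64_t>;

// general case: ceil(2*K_MAX / 64) words, kept POD so the hashtable can pack it
template <int K_MAX>
struct LongKmer {
  static constexpr int kWords = (2 * K_MAX + 63) / 64;
  std::array<uint64_t, kWords> w{};
};

static_assert(LongKmer<200>::kWords == 7, "200-mers need 400 bits -> 7 words");
```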
[ ] write parser again from ground up?
parser currently uses three buffers
first buffer: from fread
second buffer: for parsed sequence, to slide window over.
third buffer: to output kmers
use only one/two/three buffers?
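As a starting point for that question, a two-buffer sketch (names and buffer sizes are made up; FASTA/FASTQ record handling is deliberately elided): parse sequence bytes out of the fread buffer into a sequence buffer and emit k-mers straight from the sliding window, dropping the separate k-mer output buffer.

```cpp
// Sketch: buffer 1 holds raw fread bytes, buffer 2 holds contiguous sequence;
// k-mers are handed to a callback directly from the window, so there is no
// third output buffer. Any non-ACGT byte (newlines, headers, quality lines in
// this simplified sketch) just resets the window.
#include <cstdio>
#include <cstring>
#include <functional>

constexpr size_t K = 32;

void parse_two_buffers(FILE* f, const std::function<void(const char*)>& emit_kmer) {
  char in[1 << 16];   // buffer 1: raw bytes from fread
  char seq[1 << 16];  // buffer 2: parsed sequence to slide the window over
  size_t seq_len = 0;
  size_t n;
  while ((n = fread(in, 1, sizeof(in), f)) > 0) {
    for (size_t i = 0; i < n; i++) {
      char c = in[i];
      if (c == 'A' || c == 'C' || c == 'G' || c == 'T') {
        seq[seq_len++] = c;
        if (seq_len >= K) emit_kmer(&seq[seq_len - K]);  // emit in place
        if (seq_len == sizeof(seq)) {                    // keep a K-1 byte overlap
          std::memmove(seq, seq + seq_len - (K - 1), K - 1);
          seq_len = K - 1;
        }
      } else {
        seq_len = 0;  // non-ACGT byte breaks the k-mer window
      }
    }
  }
}
```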
[X] support uneven splits with numa
[ ] always use node.cpu_list and n->get_num_nodes()
[ ] support explicit splits in spawn_shard_threads (x threads on node 0, y threads on node 1, and so on)?
[X] support numa in bqueues spawning (currently only node 0 is allowed)
[X] make bqueues spawning respect numa-split? how?
[ ] support explicit splits in bqueues spawning (x/y producers/consumers on node 0, a/b producers/consumers on node 1, and so on)?
[X] use NumaPolicy to assign cpus.
[ ] unblock last core
it currently just spins, waiting for other cores to finish doing work.
[ ] patch all tests to respect ht-fill
[X] ht_tests done
[X] bq_tests done
[ ] bqueues: calc xxhash on kmer at producer
calculate xxhash once at the producer, use it to decide which cpu the k-mer goes to, and pass the same hash along in the queue data so we don't need to hash again at the ht.
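A sketch of what carrying the hash in the queue element might look like (the struct and function names are made up; XXH64 is the real xxHash call):

```cpp
// Sketch: hash the k-mer once at the producer with XXH64, pick the consumer cpu
// from that hash, and carry the hash alongside the k-mer so the hashtable never
// has to re-hash it.
#include <cstdint>
#include "xxhash.h"

struct KmerMsg {
  uint64_t packed_kmer;  // 2-bit-encoded k-mer
  uint64_t hash;         // XXH64 computed once at the producer
};

inline KmerMsg make_msg(uint64_t packed_kmer) {
  KmerMsg m;
  m.packed_kmer = packed_kmer;
  m.hash = XXH64(&m.packed_kmer, sizeof(m.packed_kmer), /*seed=*/0);
  return m;
}

inline uint32_t route(const KmerMsg& m, uint32_t num_consumers) {
  return static_cast<uint32_t>(m.hash % num_consumers);  // derive target consumer from the hash
}
```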
[ ] bqueues: add to_cpu hashing with trivial modulo (contradictory to previous)
[ ] this applies to the two tasks above:
respect the ht-fill command-line arg.
hashtables used in bqueues should be filled according to the ht-fill command-line arg
[X] fix bqueue_tests failing when xorwow is enabled
[ ] estimate the ht size based on the input file size, the number of threads, and ht-fill
how EXACTLY to estimate ht size?
what about ht size for synth tests?
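One back-of-the-envelope answer, as a sketch only (all constants and names are assumptions, e.g. treating roughly half of a FASTQ file as sequence bytes and one k-mer per sequence byte):

```cpp
// Sketch: estimate per-thread hashtable slots from file size, thread count, and
// the ht-fill percentage, rounded up to a power of two so masking replaces modulo.
#include <cstdint>

uint64_t estimate_ht_size(uint64_t file_size_bytes, uint32_t num_threads,
                          uint32_t ht_fill_pct) {
  uint64_t est_seq_bytes = file_size_bytes / 2;              // headers + quals ~ half
  uint64_t est_kmers_per_thread = est_seq_bytes / num_threads;
  uint64_t slots = est_kmers_per_thread * 100 / ht_fill_pct; // honor ht-fill
  uint64_t p = 1;
  while (p < slots) p <<= 1;                                 // next power of two
  return p;
}
```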
[ ] other tasks
several TODOs in code
refactor variable names
move code into its own namespace
fix error handling everywhere
drop caches outside the executable
run dos2unix before parsing, outside the executable
[ ] remove usage of lseek/lseek64 for determining whether the parser/reader has reached the end of its assignment.
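A minimal sketch of the alternative (the function and variable names are assumed): track the consumed byte count explicitly instead of asking the kernel where the file offset currently is.

```cpp
// Sketch: stop once the shard's assigned [f_start, f_end) range is consumed,
// without ever calling lseek()/lseek64().
#include <algorithm>
#include <cstdint>
#include <cstdio>

void read_assignment(FILE* f, uint64_t f_start, uint64_t f_end,
                     char* buf, size_t buf_sz) {
  std::fseek(f, static_cast<long>(f_start), SEEK_SET);
  uint64_t consumed = f_start;
  while (consumed < f_end) {
    size_t want = static_cast<size_t>(std::min<uint64_t>(buf_sz, f_end - consumed));
    size_t got = std::fread(buf, 1, want, f);
    if (got == 0) break;  // EOF or read error
    consumed += got;      // position is known without any lseek call
    // ... parse buf[0..got) here ...
  }
}
```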
[ ] figure out why parsing large files takes more cycles/k-mer (fv vs. dm) in --mode=5 with -D__MMAP_FILE
[ ] use the debug function dbg from dbg.hpp, together with a -DDEBUG macro, to print debug messages across the application
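One possible wiring (assuming dbg.hpp is the single-header dbg(...) macro; the DBG wrapper name and the DEBUG guard are assumptions, not existing code):

```cpp
// Sketch: route debug prints through dbg() only when compiled with -DDEBUG;
// in release builds DBG(x) is a plain passthrough of its argument.
#ifdef DEBUG
#include "dbg.hpp"
#define DBG(expr) dbg(expr)
#else
#define DBG(expr) (expr)
#endif

// usage: DBG(num_inserts);  // prints file:line and the value in debug builds
```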
[ ] verify the logic of the bqueue fixes
[ ] port all optimizations from SimpleKmerHashTable to CASKmerHashTable
[ ] refactor to use clang-format with the LLVM style
[ ] debug 1p1c bqueues performance and compare it against running without bqueues