Rerunning the above with new tip (12ab776c2452aacd6c8a2248c834907b62b93f66) - will replace all the above numbers
Benchmarking now from 8b46a86a
Data in (yoda cluster):
VCF: /nfs/leia/research/iqbal/bletcher/Pf_benchmark/ref_data/pf3k_and_DBPMSPS1and2.vcf
Ref: /nfs/leia/research/iqbal/bletcher/Pf_benchmark/ref_data/Pfalciparum.genome.fasta
kmer | num_kmers | CPUs | encode PRG (s) | gen. FM index (s) | masks (s) | kmer index (s) | max RAM |
---|---|---|---|---|---|---|---|
11 | 4194304 | 1 | 2 | 131 | 182 | 5927 (1h 38 mins) | 101.4 Gb |
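
As a quick sanity check on the row above, here is a small sketch. It assumes that `num_kmers` is the count of all possible DNA kmers of length 11 (that is, 4^11), and that the four build stages run one after another so their times can simply be summed; neither assumption is stated in the table itself. The helper name `fmt_duration` is made up for illustration.

```python
# Sanity-check the figures in the build benchmark row.
# Assumption: num_kmers counts every possible DNA kmer of length K.
K = 11
num_kmers = 4 ** K  # four bases: A, C, G, T


def fmt_duration(seconds):
    """Format a whole number of seconds as 'Xh Ym Zs' (illustrative helper)."""
    hours, rem = divmod(seconds, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours}h {minutes}m {secs}s"


# Stage timings in seconds, copied from the table row.
stages = {
    "encode PRG": 2,
    "gen. FM index": 131,
    "masks": 182,
    "kmer index": 5927,
}

# Assumption: stages run sequentially, so the total is a plain sum.
total_seconds = sum(stages.values())

print(num_kmers)                          # matches the num_kmers column
print(fmt_duration(stages["kmer index"]))
print(fmt_duration(total_seconds))
```

Under these assumptions the kmer index stage accounts for nearly all of the build time, so that is the stage to watch when comparing commits.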
I will now close this issue and open a fresh one.
The reason is that we used to build the PRG string using Zam's Perl module, which skipped any overlapping variants in the VCF.
Now we use cluster_vcf_records, which does deal with them.
The consequence is that the constructed PRG string is much bigger.
In 8b46a86 the PRG string for this benchmarking dataset contains 79,687,317 distinct integers (DNA + variant markers), whereas in a6e9094, which introduces the change, there are 457,513,813. This is 5.7 times bigger.
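
The 5.7x figure can be reproduced directly from the two counts quoted above. This is only an arithmetic sketch; the variable names are illustrative and not taken from gramtools.

```python
# PRG sizes quoted in the issue (integers: DNA + variant markers).
old_prg_size = 79_687_317   # commit 8b46a86, overlapping variants skipped
new_prg_size = 457_513_813  # commit a6e9094, overlapping variants handled

ratio = new_prg_size / old_prg_size
print(f"new PRG is {ratio:.1f}x the size of the old one")
```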
I'll now run benchmarks on this
Go Brice!
Benchmarking build+quasimap of the new plasmodium PRG built from 2.4 million variants + DBLMSP 1+2. Calling this dataset pf_wg_and_dblmsps_v1. PRG here: /nfs/research1/zi/projects/gramtools/standard_datasets/pfalciparum/pf3k_release3_cortex_plus_dblmsps/
I'm raising this now, will fill in the table over the next day or two, and then we can close or follow up. Updated with the new commit, and now using a dedicated, non-shared server.
Build benchmarks
example command line
Quasimapping benchmark
example command
We are mapping 33,557,648 reads (quality trimmed, length <= 76 bp) sequenced from the GB4 strain.
Reads:
On old commit, d8a3082a921579e65081fa1932c42c4f2fb7953a
On current commit 12ab776c2452aacd6c8a2248c834907b62b93f66