The human genome is about 3 billion bases long and the MHC is about 5 Mb long.
This PRG contains only the 8 reference MHC haplotypes and no other variation, so 99.8% of the genome has no variation.
The final number in the PRG alphabet is 23690.
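The 99.8% figure follows from the two sizes quoted above. A minimal sketch of the arithmetic, assuming the rounded sizes of 3 billion bases and 5 Mb (the function and constant names are made up for illustration):

```python
# Approximate sizes taken from the text above.
GENOME_BASES = 3_000_000_000  # human genome, ~3 billion bases
MHC_BASES = 5_000_000         # MHC region, ~5 Mb


def invariant_fraction(genome_bases: int, variable_bases: int) -> float:
    """Return the fraction of the genome lying outside the variable region."""
    return 1.0 - variable_bases / genome_bases


fraction = invariant_fraction(GENOME_BASES, MHC_BASES)
print(f"{fraction:.1%} of the genome carries no variation in this PRG")
```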
gramtools version
{
    "version_number": "0.5.0",
    "last_git_commit_hash": "d8a3082a921579e65081fa1932c42c4f2fb7953a",
    "truncated_git_commits": [
        "d8a3082 - Robyn Ffrancon, 1527688551 : enhancement: build command optionally skips building PRG",
        "2dac562 - Robyn Ffrancon, 1527601335 : enhancement: quasimap commands ensures that build command executed successfully",
        "760b759 - Robyn Ffrancon, 1527599820 : enhancement: build stops and returns non-zero if no variants sites found in prg",
        "f3b8cff - Robyn Ffrancon, 1527597315 : enhancment: removed unused skip optimisation code",
        "e22cd4f - Robyn Ffrancon, 1527590325 : fix: SA indexes associated with correct site-allele paths for allele encapsulated mappings"
    ]
}
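The commit entries above carry Unix timestamps, which are hard to read at a glance. A small sketch that splits one entry into its parts and converts the timestamp to a UTC date; the entry layout ("hash - author, timestamp : message") is inferred from the output shown here rather than from any gramtools documentation, and `parse_commit` is a hypothetical helper:

```python
import re
from datetime import datetime, timezone

# Layout inferred from the entries above (an assumption, not a documented format):
# "<short hash> - <author>, <unix timestamp> : <message>"
COMMIT_PATTERN = re.compile(
    r"^(?P<hash>[0-9a-f]+) - (?P<author>.+?), (?P<timestamp>\d+) : (?P<message>.*)$"
)


def parse_commit(entry: str) -> dict:
    """Split one truncated_git_commits entry into named fields plus a UTC date."""
    match = COMMIT_PATTERN.match(entry)
    if match is None:
        raise ValueError(f"unrecognised commit entry: {entry!r}")
    fields = match.groupdict()
    fields["date"] = datetime.fromtimestamp(int(fields["timestamp"]), tz=timezone.utc)
    return fields


example = (
    "d8a3082 - Robyn Ffrancon, 1527688551 : "
    "enhancement: build command optionally skips building PRG"
)
commit = parse_commit(example)
print(commit["hash"], commit["author"], commit["date"].date().isoformat())
```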
Build benchmarks
I'll start on the cluster, which involves using shared machines, but my benchmarking machine is totally blocked benchmarking P. falciparum.
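| kmer | CPUs | encode PRG (sec) | generate FM index (sec) | masks (sec) | kmer index (sec) | Total human experienced time | max RAM |
|------|------|------------------|-------------------------|-------------|------------------|------------------------------|---------|
| 5    | 1    | 1.4              | 105.5                   | 74          | 20               | 3 mins                       | 350 MB  |
| 7    | 1    | 4                | 144                     | 85          | 350              | 10 mins                      | 374 MB  |
| 9    | 1    | 1                | 109                     | 71          | 45               | 4 mins                       | 400 MB  |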
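As a sanity check on the build timings above, the per-stage times can be summed and compared with the reported totals. This sketch assumes that, in each row, the value just before the total is the kmer index time and that the total is roughly the sum of the four stages; the dictionary keys are labels chosen for this example:

```python
# Stage timings in seconds, copied from the build benchmarks above.
# Outer keys are kmer sizes; inner keys are labels made up for this sketch.
BUILD_TIMES = {
    5: {"encode_prg": 1.4, "fm_index": 105.5, "masks": 74, "kmer_index": 20},
    7: {"encode_prg": 4, "fm_index": 144, "masks": 85, "kmer_index": 350},
    9: {"encode_prg": 1, "fm_index": 109, "masks": 71, "kmer_index": 45},
}


def total_seconds(stages: dict) -> float:
    """Sum the per-stage timings for one benchmark run."""
    return sum(stages.values())


for kmer, stages in BUILD_TIMES.items():
    total = total_seconds(stages)
    print(f"k={kmer}: {total:.1f} s = {total / 60:.1f} min")
```

Under these assumptions the sums come out close to the reported 3, 10 and 4 minutes, with the kmer index step dominating the k=7 run.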
Quasimap benchmarks
The vast majority of reads (99.8%) are irrelevant, and will be discarded immediately because they don't hit the kmer index.
Mapping a huge FASTQ of NA12878 reads: roughly 747.5 million reads.
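To get a feel for how many reads might survive the kmer index filter, the two figures above can be combined. This is only a rough estimate, assuming reads are spread uniformly over the genome so that the 99.8% applies directly to the read count; the names are hypothetical:

```python
TOTAL_READS = 747_500_000    # approximate read count quoted above
IRRELEVANT_FRACTION = 0.998  # share of reads expected to miss the kmer index


def expected_relevant_reads(total_reads: int, irrelevant_fraction: float) -> int:
    """Estimate how many reads are not discarded, under a uniform-coverage assumption."""
    return round(total_reads * (1.0 - irrelevant_fraction))


relevant = expected_relevant_reads(TOTAL_READS, IRRELEVANT_FRACTION)
print(f"about {relevant / 1e6:.1f} million reads expected to hit the kmer index")
```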