While performance on large genotype-only VCFs is excellent (better than bcftools in terms of throughput in my tests), it is quite poor on complicated VCFs. For chromosome 2 of the recent 1000 Genomes data I'm getting less than 1 MB/s, which is a factor of 50 below what we need (bcftools view runs at around 60 MB/s).
My sense is that it's probably not worth chasing performance here with numba, and JAX doesn't seem like a good fit either. I'm actually inclined to write a C extension that follows the logic of the current buffer-based numba approach. I think it would be less work in the long run, and it would get rid of the nasty latency involved in JIT compiling. For something like this, a well-written C extension is less maintenance work than fancy Python-based approaches: once it's written, it really doesn't need much maintenance.
We should do some profiling first to see where the bottlenecks are, though.
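As a starting point for that profiling, here is a minimal sketch of how one might measure throughput in MB/s and get a cProfile breakdown of a parsing step, using only the standard library. `parse_chunk` is a hypothetical stand-in (it just splits tab-separated fields) and is not the actual numba-based parser; the synthetic record is also made up, so real decompressed VCF bytes would need to be substituted to get a meaningful number.

```python
import cProfile
import io
import pstats
import time


def parse_chunk(data: bytes) -> int:
    """Hypothetical stand-in for the VCF parsing step being measured.

    Skips header lines and returns the total number of tab-separated fields.
    """
    n_fields = 0
    for line in data.splitlines():
        if line.startswith(b"#"):
            continue
        n_fields += len(line.split(b"\t"))
    return n_fields


def throughput_mb_per_s(func, data: bytes) -> float:
    """Time one call of func(data); return throughput in MB/s (1 MB = 1e6 bytes)."""
    start = time.perf_counter()
    func(data)
    elapsed = max(time.perf_counter() - start, 1e-9)  # guard against a zero timer delta
    return len(data) / 1e6 / elapsed


def profile_call(func, data: bytes, top: int = 10) -> str:
    """Run func(data) under cProfile; return the top entries by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    func(data)
    profiler.disable()
    out = io.StringIO()
    pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(top)
    return out.getvalue()


if __name__ == "__main__":
    # Synthetic data only: one made-up record repeated, with a single header line.
    record = b"2\t10000\t.\tA\tG\t100\tPASS\tAC=1;AN=2\tGT\t0|1\t1|0\n"
    data = b"##fileformat=VCFv4.3\n" + record * 100_000
    print(f"{throughput_mb_per_s(parse_chunk, data):.1f} MB/s")
    print(profile_call(parse_chunk, data))
```

Profiling the real parser the same way (swapping it in for `parse_chunk`) would show whether time goes into field splitting, INFO/FORMAT decoding, or buffer management before committing to a C rewrite.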