Open Will-Tyler opened 2 months ago
We already have a C extension module, so it wouldn't be that hard to update it to include computing AC and AN.
See https://github.com/sgkit-dev/vcztools/pull/77#issuecomment-2334553173 for details on slowdown
Putting this in the initial release milestone for now, can triage out later if it's not critical.
Description
When the user specifies a sample selection in `vcztools view`, vcztools recalculates the AC and AN INFO fields. This is consistent with bcftools' behavior. vcztools calculates these INFO fields using all of the samples in a variant-wise chunk of genotype data. The current implementation is pure Python using NumPy, which may be slow and may add significant memory overhead. This issue tracks improving the computation and memory efficiency of that calculation. The solution may require calculating AC and AN in a C extension module.
The original code was added in #77.
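For reference, the calculation described above can be sketched in NumPy roughly as follows. This is an illustrative sketch only, not the code from #77: the function name `compute_ac_an`, its parameters, and the assumption that negative genotype values mark missing or padded alleles are all hypothetical choices made for this example.

```python
import numpy as np


def compute_ac_an(genotypes, num_alt_alleles):
    """Hypothetical helper: compute AC and AN for one chunk of genotypes.

    genotypes: integer array of shape (variants, samples, ploidy).
        Assumption: 0 is the reference allele, 1..n are alternate
        alleles, and negative values mean missing or padding.
    num_alt_alleles: number of alternate-allele columns to report.
    """
    num_variants = genotypes.shape[0]
    # Flatten the sample and ploidy axes so each row holds every
    # allele call for one variant.
    flat = genotypes.reshape(num_variants, -1)

    # AN: total number of called (non-missing) alleles per variant.
    an = (flat >= 0).sum(axis=1)

    # AC: number of calls of each alternate allele per variant.
    ac = np.zeros((num_variants, num_alt_alleles), dtype=np.int64)
    for allele in range(1, num_alt_alleles + 1):
        # Each comparison allocates a temporary boolean array the
        # size of the flattened chunk.
        ac[:, allele - 1] = (flat == allele).sum(axis=1)
    return ac, an


# Two variants, three diploid samples, -1 marks a missing allele.
gt = np.array(
    [
        [[0, 1], [1, 1], [-1, -1]],
        [[0, 2], [1, -1], [0, 0]],
    ]
)
ac, an = compute_ac_an(gt, num_alt_alleles=2)
```

Each comparison in this sketch creates a temporary array as large as the chunk, which is the kind of overhead a single pass over the data in a C extension could avoid.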
References