A major performance bottleneck in generating reference data is parsing a large GTF (with all of our wacky heuristics for handling errors across Ensembl versions).
It may be faster to do the core TSV parsing in Polars and then transform the data from there.