This PR adds native BigWig support to Genomedata.
To use a BigWig file through the Genomedata interface, create a Genome object from a BigWig file instead of a Genomedata file, e.g. with Genome("myfile.bigWig") as genome.
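As a minimal sketch of that usage (not taken from the PR itself): the helper name read_region and its arguments are hypothetical, and it assumes the documented Genome indexing (genome[chrom][start:end]) behaves the same for a BigWig file as for a Genomedata file.

```python
def read_region(path, chrom, start, end):
    """Return the values for chrom[start:end] from a Genomedata or BigWig file.

    Hypothetical helper for illustration; only Genome and its indexing
    come from the Genomedata interface.
    """
    # Imported lazily so this sketch can be loaded without genomedata installed.
    from genomedata import Genome

    with Genome(path) as genome:
        # Slicing a chromosome requests only the given region,
        # not the whole dataset.
        return genome[chrom][start:end]
```

A call such as read_region("myfile.bigWig", "chr1", 0, 1000) would then work the same way regardless of which file type the path points to, assuming the chromosome name exists in the file.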
A few important notes:
Tests (should) cover all documented interfaces, specifically those exposed through the Genome object.
There are a number of public interfaces that are not documented (for example, those exposed through Chromosome objects). These were covered to the best of my ability but don't necessarily translate well to the BigWig format.
Summary statistics are available through the BigWig header, much as they are in a Genomedata file; however, they have lower precision because they are stored as integer types. Calculating them manually was attempted but was very slow.
There is an implicit single trackname per Genome object (the filename). This keeps interfaces consistent between file types. For example, an interface that would normally return a list indexed per track now returns a single-element list.
Each Chromosome is represented by one underlying supercontig. Using 'read' on the supercontig is consequently very slow. Unless the read method of the supercontig is used, the entire dataset is never brought into memory.
Supercontigs and Chromosomes return a numpy array. Its interface may differ slightly from that of tables.EArray, the type used by the Genomedata file type.
Writing and erasing, while technically possible, were not implemented in this PR.
Addressing BigWig files by URL, while also (I believe) technically possible, was not implemented in this PR.
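To illustrate the notes on summary statistics and the single implicit track, here is a rough sketch of how a mean and variance can be derived from summed statistics held as one-element lists. The function name and the sample numbers are made up for illustration, and it assumes the header supplies a sum, a sum of squares, and a count of data points; it is not the PR's implementation.

```python
def mean_and_variance(sums, sums_squares, num_datapoints):
    """Derive per-track mean and population variance from summed statistics."""
    means = [s / n for s, n in zip(sums, num_datapoints)]
    variances = [
        sq / n - m * m
        for sq, n, m in zip(sums_squares, num_datapoints, means)
    ]
    return means, variances


# A BigWig-backed Genome has one implicit track, so each per-track
# statistic is a list with a single element (illustrative values only).
means, variances = mean_and_variance([272], [500], [154])
```

Because the inputs are integers in this scenario, any fractional part of the true sums is already lost before the division, which is where the reduced precision comes from.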