Open EricR86 opened 6 years ago
Original comment by Michael Hoffman (Bitbucket: hoffman, GitHub: michaelmhoffman).
What do you mean, it "does not index" this data?
Original comment by Eric Roberts (Bitbucket: ericr86, GitHub: ericr86).
The chunk_starts and chunk_ends genomedata/hdf5 attributes are not updated. The attributes get updated when gaps greater than MIN_GAP_LEN are found. No "gaps" are detected at the beginning or end of a supercontig since Genomedata looks between already existing datapoints.
Original comment by Eric Roberts (Bitbucket: ericr86, GitHub: ericr86).
After a discussion, the following solution was proposed:
Original report (archived issue) by Eric Roberts (Bitbucket: ericr86, GitHub: ericr86).
Currently, Genomedata does not index runs of missing data longer than MIN_GAP_LEN.
However, if the end of a supercontig consists entirely of NaNs, this data is indexed regardless of its length. In the extreme case, a supercontig could start with a single datapoint followed only by NaNs, and the chunk start and end would span the entire region even if the region were far longer than MIN_GAP_LEN.
As a result, Genomedata reports large empty regions at the beginning and end of supercontigs when the "chunk_starts/ends" attributes are used.
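The behaviour described above can be reproduced with a small stand-alone sketch. This is not Genomedata's actual code: the function name naive_chunks, the half-open chunk ends, and the MIN_GAP_LEN value of 3 are all assumptions made for illustration. It only mimics the idea that gaps are searched for between existing datapoints.

```python
import math

MIN_GAP_LEN = 3  # illustrative value only, not Genomedata's actual constant


def naive_chunks(values, min_gap_len=MIN_GAP_LEN):
    """Return (chunk_starts, chunk_ends) for one supercontig's values.

    Hypothetical helper: gaps are only looked for between pairs of
    existing (non-NaN) datapoints, so NaN runs at either edge of the
    supercontig never split or shrink a chunk.
    """
    # Positions that actually hold data.
    present = [i for i, v in enumerate(values) if not math.isnan(v)]

    starts, ends = [0], []
    for prev, cur in zip(present, present[1:]):
        gap_len = cur - prev - 1  # number of NaNs between two datapoints
        if gap_len > min_gap_len:
            ends.append(prev + 1)  # half-open end of the current chunk
            starts.append(cur)     # next chunk starts at the next datapoint
    ends.append(len(values))       # last chunk always runs to the very end
    return starts, ends


nan = float("nan")

# An interior gap longer than MIN_GAP_LEN splits the region into two chunks.
print(naive_chunks([1.0] + [nan] * 5 + [2.0]))   # ([0, 6], [1, 7])

# Extreme case from the report: one datapoint, then only NaNs.
# No pair of datapoints exists, so the single chunk covers all 1001 positions.
print(naive_chunks([1.0] + [nan] * 1000))        # ([0], [1001])
```

In the second call the trailing run of 1000 NaNs is far longer than MIN_GAP_LEN, yet it stays inside the only chunk, which is the large empty region described in the report.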