weberlab-hhu / Helixer

Using Deep Learning to predict gene annotations
GNU General Public License v3.0

Data read efficiency #10

Closed (alisandra closed 1 year ago)

alisandra commented 3 years ago

Because we read from many different h5 datasets (not just X and y, but also transitions, coverage, spliced coverage, coverage score, someday phase...), the data read-in can become the bottleneck, particularly for smaller networks.

Check whether the H5 files can be restructured to reduce random reads, e.g. by keeping data that lives in different datasets but shares an index together, so that all relevant data is easy to fetch by index. Then weigh the pros and cons and decide whether the change is worth the effort and repercussions...
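
One possible restructuring, sketched here only to make the idea concrete: store a single compound-typed dataset with one record per sample index, so that one slice read returns every field for that index. This is a minimal sketch, not Helixer's actual layout. The field names, shapes, dtypes, and the chunk size are all assumptions chosen for illustration, and the file is kept in memory so nothing is written to disk.

```python
import h5py
import numpy as np

# Hypothetical sizes, picked only for illustration.
N_SAMPLES, SEQ_LEN = 8, 16

# One compound record per sample index. Field names and dtypes are
# assumptions, not the real Helixer dataset names.
record = np.dtype([
    ("X", "f4", (SEQ_LEN, 4)),
    ("y", "i1", (SEQ_LEN, 4)),
    ("coverage", "i4", (SEQ_LEN,)),
    ("spliced_coverage", "i4", (SEQ_LEN,)),
])


def write_combined(h5, n_samples=N_SAMPLES):
    """Create the combined dataset filled with dummy values."""
    # chunks=(2,) means each HDF5 chunk holds two complete records.
    ds = h5.create_dataset("combined", shape=(n_samples,), dtype=record, chunks=(2,))
    data = np.zeros(n_samples, dtype=record)
    # Mark every position of sample i with the value i so reads can be checked.
    data["coverage"] = np.arange(n_samples)[:, None]
    ds[...] = data
    return ds


def read_batch(ds, start, stop):
    """Fetch all fields for samples [start, stop) with a single slice read."""
    batch = ds[start:stop]
    return {name: batch[name] for name in batch.dtype.names}


# driver="core" with backing_store=False keeps the file purely in memory.
with h5py.File("sketch.h5", "w", driver="core", backing_store=False) as h5:
    ds = write_combined(h5)
    batch = read_batch(ds, 2, 6)

print({name: arr.shape for name, arr in batch.items()})
```

On the con side, a layout like this reads every field even when only one is needed, and adding a new field later (e.g. phase) means rewriting the dataset. Whether it actually beats aligned chunking across the existing separate datasets would need to be measured.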