howweirdistheweather / weather_app


REST api / backend is slow #11

Closed jamcinnes2 closed 9 months ago

jamcinnes2 commented 9 months ago

Reading the wxdb dataset is very slow for what should be a single seek and a contiguous read of a few kB.

jamcinnes2 commented 9 months ago

I found that the h5py chunking I set up is the problem. The chunking I used was great for speeding up the creation and updating of the wxdb file. But when reading the file, it results in a separate seek-and-read I/O operation for EACH year of data, 80 in total.

I'm adding a final data processing step that will h5repack the file so that it is optimized for reading.
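To illustrate why the chunk shape matters here, the following is a minimal sketch that counts how many chunks a single-location read overlaps. The axis order (lat, lon, year, week, variable), the example indices, and the per-year chunk shape are assumptions made for illustration; only the 80-year extent and the repacked shape 10x10x80x52x29 come from this thread.

```python
from math import prod


def chunks_touched(start, count, chunk_shape):
    """Count the chunks overlapped by a hyperslab read.

    start, count and chunk_shape are per-axis tuples of equal length.
    """
    per_axis = []
    for s, c, k in zip(start, count, chunk_shape):
        first = s // k            # index of the first chunk hit on this axis
        last = (s + c - 1) // k   # index of the last chunk hit on this axis
        per_axis.append(last - first + 1)
    return prod(per_axis)


# Assumed axis order: (lat, lon, year, week, variable).
# Read every year, week and variable for one hypothetical grid cell.
start = (37, 112, 0, 0, 0)
count = (1, 1, 80, 52, 29)

per_year_chunks = (10, 10, 1, 52, 29)   # hypothetical write-friendly layout
repacked_chunks = (10, 10, 80, 52, 29)  # layout used for the repacked file

print(chunks_touched(start, count, per_year_chunks))  # 80
print(chunks_touched(start, count, repacked_chunks))  # 1
```

With a chunk length of 1 along the year axis, every year lands in its own chunk, which matches the 80 separate I/O operations described above. Extending the chunk to cover all 80 years brings the same read down to one chunk.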

mbjones commented 9 months ago

Awesome that you found that. And makes sense. Then I will wait for that before I redeploy -- I haven't put the new stuff up yet but will when I see this ticket get closed.

jamcinnes2 commented 9 months ago

OK, so after the hwitw.wxdb file has been created and all initial data processing is complete, we can repack the file for better read performance. This only needs to be done once, so I won't make it part of the weekly update process. We should probably document this as part of the app installation process.

h5repack --verbose --layout wxdb:CHUNK=10x10x80x52x29 hwitw.wxdb hwitw_repack.wxdb

Then copy hwitw_repack.wxdb over hwitw.wxdb (or symlink). In my testing, accessing location data went from taking 20+ seconds to < 1 second.
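For the installation docs, the repack-and-replace step could be scripted roughly as follows. This is only a sketch: the helper names `build_repack_command` and `repack_in_place` and the `.repack` temporary suffix are made up for illustration, and it assumes `h5repack` is on the PATH and that the temporary file sits on the same filesystem as the original.

```python
import os
import subprocess


def build_repack_command(src, dst, dataset="wxdb", chunk=(10, 10, 80, 52, 29)):
    """Return the h5repack argv that rewrites `dataset` with the given chunk shape."""
    layout = f"{dataset}:CHUNK=" + "x".join(str(n) for n in chunk)
    return ["h5repack", "--verbose", "--layout", layout, src, dst]


def repack_in_place(path, tmp_suffix=".repack"):
    """Repack `path` into a temporary file, then swap it over the original.

    Hypothetical helper: requires the h5repack CLI to be installed.
    """
    tmp = path + tmp_suffix
    subprocess.run(build_repack_command(path, tmp), check=True)
    os.replace(tmp, path)  # atomic when both paths are on the same filesystem


# Show the command that would be run for the file names used in this thread.
print(" ".join(build_repack_command("hwitw.wxdb", "hwitw_repack.wxdb")))
```

Calling `repack_in_place("hwitw.wxdb")` once after the initial data processing would cover both the repack and the copy-over step. It is deliberately not called in the sketch, since it needs the real file and the HDF5 command-line tools.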

jamcinnes2 commented 9 months ago

We will see how it performs on the cluster.