pacificclimate / orca

OPeNDAP Request Compiler Application
GNU General Public License v3.0

Improve performance #13

Open nikola-rados opened 3 years ago

nikola-rados commented 3 years ago

While the script is working as intended thus far, its performance may become a concern for its viability. With this issue we will look for ways to improve the speed.

nikola-rados commented 3 years ago

Examining the snakeviz output for a request of size 571 MB (the size reported by Dataset.nbytes / 2), we get a pretty clear picture of what is holding back the performance:

[image: snakeviz output]

Note: Given the exact same parameters, I've seen the run time vary quite a bit, anywhere from the high 200s to the low 400s (in seconds).

The Dataset.to_netcdf() method takes up basically the entire runtime of the program. If we follow the call stack to the bottom, we see that the method is already using some threading to handle its execution:

[image: snakeviz call stack]
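For anyone wanting to reproduce this kind of profile, here is a minimal sketch of how a profile file for snakeviz can be produced with the standard library's cProfile. The functions `slow_write` and `run` and the file name `orca.prof` are hypothetical stand-ins, not the actual orca code or the `make performance` target.

```python
import cProfile
import pstats
import time


def slow_write():
    # Hypothetical stand-in for the expensive Dataset.to_netcdf() call.
    time.sleep(0.05)


def run():
    # Hypothetical stand-in for the script's entry point.
    slow_write()


def profile_to_file(func, path):
    """Profile func() and dump the raw stats to path (readable by snakeviz)."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        func()
    finally:
        profiler.disable()
    profiler.dump_stats(path)
    return pstats.Stats(path)


if __name__ == "__main__":
    stats = profile_to_file(run, "orca.prof")
    # Show the five most expensive entries by cumulative time.
    stats.sort_stats("cumulative").print_stats(5)
```

The resulting file can then be opened interactively with `snakeviz orca.prof`.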

Despite this, it doesn't seem to do things particularly quickly (at least it feels that way). @cairosanders and I have already tried to incorporate asyncio to load the individual requests simultaneously, but xarray's support for asynchronous tasks is pretty limited. Also, the main bottleneck of to_netcdf unfortunately still exists.
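As an illustration of the general idea (not the code we tried), one generic way to overlap blocking loads under asyncio is to push each one onto a worker thread with `asyncio.to_thread` (Python 3.9+). The `load_chunk` function here is a hypothetical placeholder that just sleeps. With xarray, the data would need to be opened and actually loaded inside that function, since opening a dataset is lazy. This only addresses the download step and would not remove the to_netcdf bottleneck.

```python
import asyncio
import time


def load_chunk(url):
    # Hypothetical blocking loader. In practice this is where a dataset
    # would be opened from the OPeNDAP URL and loaded into memory.
    time.sleep(0.2)
    return f"data for {url}"


async def load_all(urls):
    # Run each blocking load in a worker thread so the waits overlap.
    tasks = [asyncio.to_thread(load_chunk, url) for url in urls]
    # gather() returns results in the same order as the input tasks.
    return await asyncio.gather(*tasks)


if __name__ == "__main__":
    start = time.perf_counter()
    results = asyncio.run(load_all(["url-a", "url-b"]))
    print(results)
    print(f"elapsed: {time.perf_counter() - start:.2f}s")
```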

I don't know what the performance requirements/expectations are for orca, but I get the feeling this may be a little too slow. As such, I was hoping to open up some discussion about how to speed this up.

nikola-rados commented 3 years ago

To add some more details: the results above were achieved by running make performance, which runs a test case that splits a single request into two. Here is a look at the parameters passed into the script (found in the link above):

scripts/process.py -u tasmax_day_BCCAQv2_bcc-csm1-1-m_historical-rcp26_r1i1p1_19500101-21001231_Canada -v tasmax[0:1:15000] -t [0:1:91] -n [0:1:206] -l DEBUG

The original request is split into these two requests:

'https://docker-dev03.pcic.uvic.ca/twitcher/ows/proxy/thredds/dodsC/datasets/storage/data/climate/downscale/BCCAQ2/bccaqv2_with_metadata/tasmax_day_BCCAQv2+ANUSPLIN300_bcc-csm1-1-m_historical+rcp26_r1i1p1_19500101-21001231.nc?tasmax[0:1:7500][0:1:91][0:1:206]'
'https://docker-dev03.pcic.uvic.ca/twitcher/ows/proxy/thredds/dodsC/datasets/storage/data/climate/downscale/BCCAQ2/bccaqv2_with_metadata/tasmax_day_BCCAQv2+ANUSPLIN300_bcc-csm1-1-m_historical+rcp26_r1i1p1_19500101-21001231.nc?tasmax[7501:1:15000][0:1:91][0:1:206]'

These are split in half on the time variable such that both requests are under the threshold.
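The split can be sketched roughly as follows. This is a simplified illustration that reproduces the two index ranges above, not the actual orca implementation. The function names and the example base URL are made up for the example.

```python
def split_time_range(start, stop):
    """Split the inclusive index range [start, stop] into two halves."""
    mid = (start + stop) // 2
    return [(start, mid), (mid + 1, stop)]


def build_urls(base_url, variable, time_ranges, lat_range, lon_range):
    """Build one OPeNDAP constraint URL per time range (stride fixed at 1)."""
    urls = []
    for t_start, t_stop in time_ranges:
        constraint = (
            f"{variable}[{t_start}:1:{t_stop}]"
            f"[{lat_range[0]}:1:{lat_range[1]}]"
            f"[{lon_range[0]}:1:{lon_range[1]}]"
        )
        urls.append(f"{base_url}?{constraint}")
    return urls


if __name__ == "__main__":
    halves = split_time_range(0, 15000)
    # Hypothetical base URL, used only for illustration.
    base = "https://example.org/thredds/dodsC/file.nc"
    for url in build_urls(base, "tasmax", halves, (0, 91), (0, 206)):
        print(url)
```

For the request above, `split_time_range(0, 15000)` gives `(0, 7500)` and `(7501, 15000)`, matching the two URLs.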

Here is the full set of logs from the run:

2021-02-26 13:08:15 INFO: Processing data file request
2021-02-26 13:08:15 DEBUG: Starting db session
2021-02-26 13:08:15 DEBUG: Got filepath: /storage/data/climate/downscale/BCCAQ2/bccaqv2_with_metadata/tasmax_day_BCCAQv2+ANUSPLIN300_bcc-csm1-1-m_historical+rcp26_r1i1p1_19500101-21001231.nc
2021-02-26 13:08:15 DEBUG: Initial url: https://docker-dev03.pcic.uvic.ca/twitcher/ows/proxy/thredds/dodsC/datasets/storage/data/climate/downscale/BCCAQ2/bccaqv2_with_metadata/tasmax_day_BCCAQv2+ANUSPLIN300_bcc-csm1-1-m_historical+rcp26_r1i1p1_19500101-21001231.nc?tasmax[0:1:15000][0:1:91][0:1:206]
2021-02-26 13:08:15 INFO: Downloading data file(s)
2021-02-26 13:08:16 DEBUG: Splitting, request over threshold: 571358088.0
2021-02-26 13:08:16 DEBUG: URL(s) for downloading: ['https://docker-dev03.pcic.uvic.ca/twitcher/ows/proxy/thredds/dodsC/datasets/storage/data/climate/downscale/BCCAQ2/bccaqv2_with_metadata/tasmax_day_BCCAQv2+ANUSPLIN300_bcc-csm1-1-m_historical+rcp26_r1i1p1_19500101-21001231.nc?tasmax[0:1:7500][0:1:91][0:1:206]', 'https://docker-dev03.pcic.uvic.ca/twitcher/ows/proxy/thredds/dodsC/datasets/storage/data/climate/downscale/BCCAQ2/bccaqv2_with_metadata/tasmax_day_BCCAQv2+ANUSPLIN300_bcc-csm1-1-m_historical+rcp26_r1i1p1_19500101-21001231.nc?tasmax[7501:1:15000][0:1:91][0:1:206]'])
2021-02-26 13:08:16 DEBUG: Downloading and merging 2 split files
2021-02-26 13:13:57 DEBUG: File writing complete
2021-02-26 13:13:57 INFO: Complete
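As a rough sanity check on these numbers, the size in the "Splitting" log line can be reproduced from the request's index ranges if each value is counted as 2 bytes (an assumption, chosen because it matches the logged figure). The elapsed time is taken between the "Downloading and merging" and "File writing complete" timestamps. The throughput is only a back-of-the-envelope estimate, since the reported size is not necessarily the number of bytes actually transferred or written.

```python
from datetime import datetime

# Inclusive index ranges from the request: tasmax[0:1:15000][0:1:91][0:1:206]
n_time, n_lat, n_lon = 15001, 92, 207
BYTES_PER_VALUE = 2  # assumption: matches the size reported in the log

request_bytes = n_time * n_lat * n_lon * BYTES_PER_VALUE

# Timestamps copied from the log lines above.
fmt = "%Y-%m-%d %H:%M:%S"
started = datetime.strptime("2021-02-26 13:08:16", fmt)
finished = datetime.strptime("2021-02-26 13:13:57", fmt)
elapsed_seconds = (finished - started).total_seconds()

throughput_mb_s = request_bytes / elapsed_seconds / 1e6

print(f"request size: {request_bytes} bytes")
print(f"elapsed:      {elapsed_seconds:.0f} s")
print(f"throughput:   {throughput_mb_s:.2f} MB/s")
```

This gives 571358088 bytes over 341 seconds, or a bit under 2 MB/s for the download, merge and write combined.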