pytroll / satpy

Python package for earth-observing satellite data processing
http://satpy.readthedocs.org/en/latest/
GNU General Public License v3.0

Too much memory usage for composite processing #2764

Open akasom89 opened 6 months ago

akasom89 commented 6 months ago

Describe the bug
When creating composite products (even with atmospheric correction disabled) from ABI imagery, peak memory usage exceeds 30 GB. I suspect something may be going wrong, as it also takes over 8 minutes. Are there any best practices for increasing the speed? For example, should we tweak parameters such as chunk size to find the optimum? Additionally, can we cache or pre-compute certain data (since the ABI field of view is fixed) to speed up subsequent runs?

To Reproduce

```python
import matplotlib.pyplot as plt
from satpy import Scene
from satpy.writers import get_enhanced_image

# Scene creation and the dst_area_def area definition were omitted in the
# original report; scn is a Scene with ABI imagery.
scn.load(scn.available_dataset_names())
scn_resmp = scn.resample(destination=dst_area_def, radius_of_influence=50000)
composite = 'true_color_raw'
scn_resmp.load([composite])
dataset = scn_resmp[composite]

plt.figure()
img = get_enhanced_image(dataset)
img_data = img.data
img_data.plot.imshow(vmin=0, vmax=1, rgb='bands')
img_data.plot.imshow(rgb='bands')
```

Expected behavior
Since the file sizes are much smaller and Satpy uses dask, I expected this to run much more smoothly (on a typical 8 or 16 GB RAM system) and faster (in under 2-3 minutes).

Actual results
During visualization I encounter many of these warnings; I am unsure how much they are related to the performance issue:

```
lib\site-packages\dask\core.py:119: RuntimeWarning: invalid value encountered in cos
  return func(*(_execute_task(a, cache) for a in args))
```

Environment Info:

pnuu commented 6 months ago

First thing: do all the loading in the first Scene object. So scn.load([composite]). Loading all the available datasets is unnecessary, and you actually end up resampling them all, too.
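A minimal sketch of that change (the reader name and file list here are placeholders, not from the original report):

```python
from satpy import Scene

# Placeholder Scene creation; substitute your own ABI L1b files
scn = Scene(reader='abi_l1b', filenames=abi_files)

# Load only the composite you need; Satpy pulls in the required bands itself
scn.load(['true_color_raw'])

# Only the loaded composite (and its dependencies) is resampled now
scn_resmp = scn.resample(dst_area_def, radius_of_influence=50000)
```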

Things to try:

djhoese commented 6 months ago

More details are in the performance-related frequently asked questions:

https://satpy.readthedocs.io/en/stable/faq.html

I agree with everything Panu said, but additionally want to point out that if your destination/target area definition for resampling is in the satellite's native projection, then there are other options besides nearest-neighbor or gradient-search resampling that would likely be faster.
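One such option (my suggestion, not something specified above) is Satpy's 'native' resampler, which aggregates or replicates pixels within the satellite projection and avoids a neighbor search entirely:

```python
# With no destination given, the 'native' resampler brings all loaded bands
# to a common resolution within the satellite's own geostationary projection
scn.load(['true_color_raw'])
scn_native = scn.resample(resampler='native')
```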

Otherwise, how does your example script compare with what you are actually doing? You have two imshow calls in your code, if I'm seeing things correctly. Why is that? When do you notice the large memory usage? Is 30 GB the peak memory usage, or the memory usage you see once the plot is displayed? My guess is that the majority of your memory usage comes from the plotting and not from Satpy directly. If you saved the data to disk with a dask-friendly writer like "geotiff", my guess is your processing would be much faster and would not take up nearly as much memory, especially after the chunk size and number of workers are tweaked.
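A sketch of that workflow, assuming the same placeholder Scene as above (the chunk size and worker count are illustrative values to tune, not recommendations from this thread):

```python
import os
# Chunk size must be set before Satpy is imported (see the FAQ linked above)
os.environ["PYTROLL_CHUNK_SIZE"] = "2048"

import dask.config
from satpy import Scene

# Limit dask's thread pool; oversubscription can raise peak memory usage
dask.config.set(num_workers=4)

scn = Scene(reader='abi_l1b', filenames=abi_files)  # placeholder file list
scn.load(['true_color_raw'])
scn_resmp = scn.resample(dst_area_def, radius_of_influence=50000)

# The dask-friendly geotiff writer streams chunks to disk instead of
# materializing the full enhanced image in memory for plotting
scn_resmp.save_datasets(writer='geotiff', base_dir='/tmp')
```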