umr-lops / xsar

Synthetic Aperture Radar (SAR) Level-1 GRD python mapper for efficient xarray/dask based processing
https://cyclobs.ifremer.fr/static/sarwing_datarmor/xsar/
MIT License

Memory leak when processing multiple files on a distributed network #23

Closed oarcher closed 1 day ago

oarcher commented 2 years ago

This only occurs when the workers are accessed over the network, or when the protocol='tcp://' keyword is passed to LocalCluster.

Memory increases on both the master and the workers.

Steps to reproduce (with a sequential loop):

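import gc
import psutil
import xsar
from dask.distributed import Client, LocalCluster

process = psutil.Process()   # current (master) process, used to read its RSS
# `safes` is assumed to be a list of paths to SAFE products (not defined in this snippet)
cluster = LocalCluster(protocol='tcp://')   # ok with protocol='inproc://'
client = Client(cluster)

for safe in safes:
    ds = xsar.open_dataset(safe)
    ds.lon.compute()
    gc.collect()
    print(process.memory_info().rss / (1024 ** 2))   # RSS in MiB; increases at each loop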

See also the 'unmanaged memory' increasing in the dask dashboard.

oarcher commented 2 years ago

The problem seems to occur in xsar.Sentinel1Dataset, when the function passed to map_blocks is an xsar.Sentinel1Meta instance method (i.e. self.s1meta.coords2ll).

Using weakref or copy.deepcopy doesn't solve the problem.

One workaround is to instantiate s1meta on the worker, so that the s1meta object is never serialized.
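
The idea behind this workaround can be illustrated without dask. A bound method keeps a reference to its instance, so handing self.s1meta.coords2ll to a remote worker means shipping the whole object; passing only the product path and building the object where the task runs avoids that. The sketch below is a minimal illustration, not xsar's implementation: `Meta`, `get_meta` and `coords2ll_on_worker` are hypothetical names, and the arithmetic inside `coords2ll` is a placeholder.

```python
from functools import lru_cache


class Meta:
    """Hypothetical stand-in for a heavy metadata object (not xsar.Sentinel1Meta)."""

    def __init__(self, path):
        self.path = path
        # Pretend this is a large amount of parsed metadata.
        self.table = list(range(100_000))

    def coords2ll(self, line, sample):
        # Placeholder arithmetic, unrelated to the real computation.
        return (line * 0.5, sample * 0.5)


@lru_cache(maxsize=8)
def get_meta(path):
    """Build the object where the task runs, once per path and per process."""
    return Meta(path)


def coords2ll_on_worker(path, line, sample):
    """Module-level function: only the path string has to be handed over."""
    return get_meta(path).coords2ll(line, sample)


meta = Meta("product.SAFE")
# A bound method carries its instance along with it.
assert meta.coords2ll.__self__ is meta

# Same computation, but the caller only passes a short string.
print(coords2ll_on_worker("product.SAFE", 2, 4))
print(coords2ll_on_worker("product.SAFE", 6, 8))
print(get_meta.cache_info().misses)  # number of times the object was built
```

With dask, a function shaped like `coords2ll_on_worker` would presumably be the one passed to map_blocks, with the path as an extra argument; the cache keeps each worker process from rebuilding the object for every block.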

agrouaze commented 2 years ago

Update on this issue after release v0.9: opening SLC products in a for loop does not trigger any memory leak, because map_blocks is not called by default. Opening GRD products in a for loop does trigger a memory leak of about +4 MB ("Mo" in the log below) per IW product.

start:   0%| 0/61 [00:00<?, ?it/s]

/home/datawork-cersat-public/project/mpc-sentinel1/data/esa/sentinel-1a/L1/IW/S1A_IW_GRDH_1S/2021/001/S1A_IW_GRDH_1SDV_20210101T070300_20210101T070325_035941_0435B2_2324.SAFE
iteration #0, memory=633.608 Mo:   2%| 1/61 [00:02<02:42,  2.70s/it]

/home/datawork-cersat-public/project/mpc-sentinel1/data/esa/sentinel-1a/L1/IW/S1A_IW_GRDH_1S/2021/001/S1A_IW_GRDH_1SDV_20210101T161521_20210101T161549_035946_0435E0_B752.SAFE
iteration #1, memory=639.136 Mo:   3%| 2/61 [00:05<02:42,  2.75s/it]

/home/datawork-cersat-public/project/mpc-sentinel1/data/esa/sentinel-1a/L1/IW/S1A_IW_GRDH_1S/2021/001/S1A_IW_GRDH_1SDV_20210101T022710_20210101T022735_035938_04359C_E276.SAFE
iteration #2, memory=641.072 Mo:   5%| 3/61 [00:08<02:38,  2.74s/it]

/home/datawork-cersat-public/project/mpc-sentinel1/data/esa/sentinel-1a/L1/IW/S1A_IW_GRDH_1S/2021/001/S1A_IW_GRDH_1SDV_20210101T181301_20210101T181325_035948_0435EC_FCA3.SAFE
iteration #3, memory=645.044 Mo:   7%| 4/61 [00:10<02:34,  2.71s/it]

/home/datawork-cersat-public/project/mpc-sentinel1/data/esa/sentinel-1a/L1/IW/S1A_IW_GRDH_1S/2021/001/S1A_IW_GRDH_1SDV_20210101T181521_20210101T181546_035948_0435ED_3859.SAFE
iteration #4, memory=649.328 Mo:   8%| 5/61 [00:13<02:30,  2.68s/it]

/home/datawork-cersat-public/project/mpc-sentinel1/data/esa/sentinel-1a/L1/IW/S1A_IW_GRDH_1S/2021/001/S1A_IW_GRDH_1SDH_20210101T195743_20210101T195808_035949_0435F8_B6F8.SAFE
iteration #5, memory=655.236 Mo:  10%| 6/61 [00:16<02:26,  2.66s/it]
...
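
As a quick check of the "+4 Mo per product" figure, the growth between consecutive iterations can be computed from the six readings shown in the log above. This is only arithmetic on the pasted values, not part of the original test script; the variable names are illustrative.

```python
# Memory readings (Mo) copied from iterations #0 to #5 of the log above.
readings = [633.608, 639.136, 641.072, 645.044, 649.328, 655.236]

# Growth between each pair of consecutive iterations.
deltas = [after - before for before, after in zip(readings, readings[1:])]
mean_growth = sum(deltas) / len(deltas)

print([round(d, 3) for d in deltas])
print(round(mean_growth, 2))  # mean growth in Mo per product
```

Over these six products the mean comes out at roughly 4.3 Mo per iteration, consistent with the estimate quoted above, although individual steps vary between about 2 and 6 Mo.
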
agrouaze commented 1 day ago

new analysis:

...
13/11/2024 11:51:53 INFO utils.py(479) BlockingActorProxy: Transparent proxy for Sentinel1Meta
/opt/conda-envs/microdev/lib/python3.12/site-packages/numpy/_core/numeric.py:452: RuntimeWarning: invalid value encountered in cast
  multiarray.copyto(res, fill_value, casting='unsafe')
13/11/2024 11:54:14 INFO test_possible_memory_leak.py(26) RAM : 19.09 Go
  5%| 1/20 [02:21<44:47, 141.45s/it]
13/11/2024 11:54:14 INFO utils.py(479) BlockingActorProxy: Transparent proxy for Sentinel1Meta
/opt/conda-envs/microdev/lib/python3.12/site-packages/numpy/_core/numeric.py:452: RuntimeWarning: invalid value encountered in cast
  multiarray.copyto(res, fill_value, casting='unsafe')
13/11/2024 11:58:31 INFO test_possible_memory_leak.py(26) RAM : 25.28 Go
 10%| 2/20 [06:38<1:02:51, 209.52s/it]
13/11/2024 11:58:31 INFO utils.py(479) BlockingActorProxy: Transparent proxy for Sentinel1Meta
/opt/conda-envs/microdev/lib/python3.12/site-packages/numpy/_core/numeric.py:452: RuntimeWarning: invalid value encountered in cast
  multiarray.copyto(res, fill_value, casting='unsafe')
13/11/2024 12:00:43 INFO test_possible_memory_leak.py(26) RAM : 25.28 Go
 15%| 3/20 [08:50<49:17, 173.97s/it]
13/11/2024 12:00:43 INFO utils.py(479) BlockingActorProxy: Transparent proxy for Sentinel1Meta
/opt/conda-envs/microdev/lib/python3.12/site-packages/numpy/_core/numeric.py:452: RuntimeWarning: invalid value encountered in cast
  multiarray.copyto(res, fill_value, casting='unsafe')
13/11/2024 12:03:38 INFO test_possible_memory_leak.py(26) RAM : 25.28 Go
 20%| 4/20 [11:45<46:30, 174.43s/it]
13/11/2024 12:03:38 INFO utils.py(479) BlockingActorProxy: Transparent proxy for Sentinel1Meta
/opt/conda-envs/microdev/lib/python3.12/site-packages/numpy/_core/numeric.py:452: RuntimeWarning: invalid value encountered in cast
  multiarray.copyto(res, fill_value, casting='unsafe')
13/11/2024 12:05:35 INFO test_possible_memory_leak.py(26) RAM : 25.28 Go
 25%| 5/20 [13:41<38:22, 153.51s/it]
13/11/2024 12:05:35 INFO utils.py(479) BlockingActorProxy: Transparent proxy for Sentinel1Meta
/opt/conda-envs/microdev/lib/python3.12/site-packages/numpy/_core/numeric.py:452: RuntimeWarning: invalid value encountered in cast
  multiarray.copyto(res, fill_value, casting='unsafe')
13/11/2024 12:08:10 INFO test_possible_memory_leak.py(26) RAM : 25.28 Go
 30%| 6/20 [16:17<35:58, 154.20s/it]
13/11/2024 12:08:10 INFO utils.py(479) BlockingActorProxy: Transparent proxy for Sentinel1Meta
/opt/conda-envs/microdev/lib/python3.12/site-packages/numpy/_core/numeric.py:452: RuntimeWarning: invalid value encountered in cast
  multiarray.copyto(res, fill_value, casting='unsafe')
13/11/2024 12:10:05 INFO test_possible_memory_leak.py(26) RAM : 25.28 Go
 35%| 7/20 [18:12<30:37, 141.32s/it]
13/11/2024 12:10:05 INFO utils.py(479) BlockingActorProxy: Transparent proxy for Sentinel1Meta
/opt/conda-envs/microdev/lib/python3.12/site-packages/numpy/_core/numeric.py:452: RuntimeWarning: invalid value encountered in cast
  multiarray.copyto(res, fill_value, casting='unsafe')
13/11/2024 12:11:40 INFO test_possible_memory_leak.py(26) RAM : 25.28 Go
 40%| 8/20 [19:46<25:17, 126.45s/it]
13/11/2024 12:11:40 INFO utils.py(479) BlockingActorProxy: Transparent proxy for Sentinel1Meta
/opt/conda-envs/microdev/lib/python3.12/site-packages/numpy/_core/numeric.py:452: RuntimeWarning: invalid value encountered in cast
  multiarray.copyto(res, fill_value, casting='unsafe')
13/11/2024 12:12:41 INFO test_possible_memory_leak.py(26) RAM : 25.28 Go
 45%| 9/20 [20:47<19:25, 105.96s/it]
13/11/2024 12:12:41 INFO utils.py(479) BlockingActorProxy: Transparent proxy for Sentinel1Meta
/opt/conda-envs/microdev/lib/python3.12/site-packages/numpy/_core/numeric.py:452: RuntimeWarning: invalid value encountered in cast
  multiarray.copyto(res, fill_value, casting='unsafe')
...

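RAM goes from 19.09 Go to 25.28 Go (Go = GB) between the first two iterations and then stays at 25.28 Go: memory is not growing at each iteration, so there is no visible memory leak. I am closing this issue.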
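
One way to make that reading of the log explicit is to ignore a warm-up iteration and check whether any later reading still rises. The sketch below is an assumption about how such a check could be written, not the content of test_possible_memory_leak.py; `grows_after_warmup`, `warmup` and `tolerance` are hypothetical names, and treating the first jump as a one-time allocation is itself a judgment call.

```python
def grows_after_warmup(readings, warmup=1, tolerance=0.0):
    """Return True if, after skipping `warmup` readings, any reading
    exceeds the previous one by more than `tolerance`."""
    steady = readings[warmup:]
    return any(after - before > tolerance
               for before, after in zip(steady, steady[1:]))


# RAM readings (Go) from the nine iterations shown in the log above.
ram = [19.09, 25.28, 25.28, 25.28, 25.28, 25.28, 25.28, 25.28, 25.28]
print(grows_after_warmup(ram))

# Readings (Mo) from the earlier GRD run reported in this thread.
grd = [633.608, 639.136, 641.072, 645.044, 649.328, 655.236]
print(grows_after_warmup(grd))
```

The plateau at 25.28 Go yields no growth after the warm-up, whereas the earlier GRD readings keep rising at every step. Only nine of the twenty iterations appear in the pasted log, so the check covers that visible portion only.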