umr-lops / xsar

Synthetic Aperture Radar (SAR) Level-1 GRD python mapper for efficient xarray/dask based processing
https://cyclobs.ifremer.fr/static/sarwing_datarmor/xsar/
MIT License

Memory leak when processing multiple files on a distributed network #23

Open oarcher opened 2 years ago

oarcher commented 2 years ago

This only occurs when the workers are accessed over the network, or when using the protocol='tcp://' keyword in LocalCluster.

Memory increases on both the master and the workers.

Steps to reproduce (with a sequential loop):

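import gc
import psutil   # assumed source of `process`; not shown in the original snippet
import xsar
from dask.distributed import Client, LocalCluster

process = psutil.Process()   # current (master) process, used to read its RSS
cluster = LocalCluster(protocol='tcp://')   # ok with protocol='inmem://'
client = Client(cluster)

# safes: iterable of paths to SAFE products, assumed to be defined beforehand
for safe in safes:
    ds = xsar.open_dataset(safe)
    ds.lon.compute()
    gc.collect()
    print(process.memory_info().rss / (1024 ** 2))   # RSS in MiB; increases at each iteration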

See also the 'unmanaged memory' figure increasing in the dask dashboard.

oarcher commented 2 years ago

The problem seems to occur in xsar.Sentinel1Dataset, when the function passed to map_blocks is an xsar.Sentinel1Meta instance method (i.e. self.s1meta.coords2ll).

Using weakref or copy.deepcopy doesn't solve the problem.

One workaround is to instantiate s1meta on the worker, so there is no serialization of the s1meta object.
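The workaround can be sketched as follows. This is only an illustration of the pattern, not the code used in xsar: `FakeMeta`, `get_meta` and `coords2ll_block` are hypothetical names, and `FakeMeta` is a toy stand-in for xsar.Sentinel1Meta. The idea is to pass map_blocks a module-level function that receives the product path (a short string) and builds the metadata object in the process where it runs, instead of passing a bound method that drags the whole instance through serialization.

```python
from functools import lru_cache


class FakeMeta:
    """Hypothetical stand-in for xsar.Sentinel1Meta (not the real class)."""

    instances_built = 0  # counts constructions, for illustration only

    def __init__(self, path):
        FakeMeta.instances_built += 1
        self.path = path
        # Toy state; the real object holds parsed product metadata.
        self.offset = float(len(path))

    def coords2ll(self, lines, samples):
        # Toy transform; the real method maps image coordinates to lon/lat.
        lon = [s * 0.001 + self.offset for s in samples]
        lat = [l * 0.001 for l in lines]
        return lon, lat


@lru_cache(maxsize=8)
def get_meta(path):
    # Executed in whichever process calls it: on a worker, the object is
    # built locally and cached there, so it is never sent over the network.
    return FakeMeta(path)


def coords2ll_block(path, lines, samples):
    # Module-level function: only `path` and the block coordinates
    # have to be shipped with the task.
    return get_meta(path).coords2ll(lines, samples)


lon, lat = coords2ll_block("a.SAFE", [0, 1000], [0, 1000])
coords2ll_block("a.SAFE", [0], [0])   # second call reuses the cached object
print(FakeMeta.instances_built)
```

With dask, a function like `coords2ll_block` would be handed to map_blocks with the path as an extra argument, in place of `self.s1meta.coords2ll`. The bounded `lru_cache` is a design choice of this sketch: it avoids rebuilding the object for every block while capping how many objects a worker keeps alive.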

agrouaze commented 1 year ago

Update on this issue after release v0.9: opening SLC products in a for loop does not trigger any memory leak, because map_blocks is not called by default. Opening GRD products in a for loop does trigger a memory leak of about +4 MB ("Mo" in the log) per IW product.

start:   0% | 0/61 [00:00<?, ?it/s]
/home/datawork-cersat-public/project/mpc-sentinel1/data/esa/sentinel-1a/L1/IW/S1A_IW_GRDH_1S/2021/001/S1A_IW_GRDH_1SDV_20210101T070300_20210101T070325_035941_0435B2_2324.SAFE
iteration #0, memory=633.608 Mo:   2% | 1/61 [00:02<02:42,  2.70s/it]
/home/datawork-cersat-public/project/mpc-sentinel1/data/esa/sentinel-1a/L1/IW/S1A_IW_GRDH_1S/2021/001/S1A_IW_GRDH_1SDV_20210101T161521_20210101T161549_035946_0435E0_B752.SAFE
iteration #1, memory=639.136 Mo:   3% | 2/61 [00:05<02:42,  2.75s/it]
/home/datawork-cersat-public/project/mpc-sentinel1/data/esa/sentinel-1a/L1/IW/S1A_IW_GRDH_1S/2021/001/S1A_IW_GRDH_1SDV_20210101T022710_20210101T022735_035938_04359C_E276.SAFE
iteration #2, memory=641.072 Mo:   5% | 3/61 [00:08<02:38,  2.74s/it]
/home/datawork-cersat-public/project/mpc-sentinel1/data/esa/sentinel-1a/L1/IW/S1A_IW_GRDH_1S/2021/001/S1A_IW_GRDH_1SDV_20210101T181301_20210101T181325_035948_0435EC_FCA3.SAFE
iteration #3, memory=645.044 Mo:   7% | 4/61 [00:10<02:34,  2.71s/it]
/home/datawork-cersat-public/project/mpc-sentinel1/data/esa/sentinel-1a/L1/IW/S1A_IW_GRDH_1S/2021/001/S1A_IW_GRDH_1SDV_20210101T181521_20210101T181546_035948_0435ED_3859.SAFE
iteration #4, memory=649.328 Mo:   8% | 5/61 [00:13<02:30,  2.68s/it]
/home/datawork-cersat-public/project/mpc-sentinel1/data/esa/sentinel-1a/L1/IW/S1A_IW_GRDH_1S/2021/001/S1A_IW_GRDH_1SDH_20210101T195743_20210101T195808_035949_0435F8_B6F8.SAFE
iteration #5, memory=655.236 Mo:  10% | 6/61 [00:16<02:26,  2.66s/it]
...
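As a quick check of the growth rate, the six memory readings shown in the log (iterations #0 to #5) can be differenced. This is only a sketch over the visible part of the log; the remaining iterations are elided, so the average may differ over the full run.

```python
# Memory readings in Mo, copied from iterations #0 to #5 of the log.
readings = [633.608, 639.136, 641.072, 645.044, 649.328, 655.236]

# Growth between consecutive products.
deltas = [after - before for before, after in zip(readings, readings[1:])]
mean_growth = sum(deltas) / len(deltas)

print("per-product growth (Mo):", [round(d, 3) for d in deltas])
print(f"mean growth: {mean_growth:.2f} Mo per product")
```

Over these six products the mean growth comes out at roughly 4.3 Mo per product, with individual steps ranging from about 1.9 to 5.9 Mo.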