gerritholl opened this issue 2 weeks ago
Both readers use map_blocks, which might be one way in which a filehandler reference accidentally ends up in the graph. However, the FCI code does not use the index map (the code in question is never reached). The LI use case for map_blocks does, but an ad-hoc change to replace it with something else does not appear to solve the problem (output and error message remain the same).
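A minimal illustration of how that can happen (FakeFileHandler is a made-up stand-in, not the actual satpy code): if map_blocks is given a bound method, the whole handler object, including any open NetCDF handle it holds, becomes part of the task graph and has to be pickled whenever the graph is shipped to a worker:

```python
import dask.array as da
import numpy as np

class FakeFileHandler:
    """Stand-in for a file handler; imagine an open netCDF4.Dataset stored on it."""
    def __init__(self):
        self.index_map = np.arange(10)

    def _remap(self, block):
        # Uses self, so running this on a worker requires serialising the whole handler.
        return self.index_map[block]

fh = FakeFileHandler()
arr = da.arange(10, chunks=5)
remapped = da.map_blocks(fh._remap, arr, dtype=arr.dtype)  # graph now references fh
print(remapped.compute())  # fine with a local scheduler, which never pickles the graph
```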
If I change v to v[:] in the da.from_array call, thus converting the netCDF4.Variable into a numpy.ndarray before it reaches dask, then the code completes successfully (but no longer reads lazily).
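For reference, the difference between the two variants in isolation (path is a placeholder for the LI test file used in the snippet further down):

```python
import netCDF4
import dask.array as da

nc = netCDF4.Dataset(path)         # path: placeholder for the test file
lazy = da.from_array(nc["x"])      # graph holds the open netCDF4.Variable -> not picklable
eager = da.from_array(nc["x"][:])  # slicing loads a numpy.ndarray first -> picklable, but not lazy
```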
So maybe it's not the filehandler; maybe the dask graphs themselves are unpicklable because they contain open NetCDF variables.
But unpicklable dask objects are still computable:
```python
import netCDF4
import dask.array as da
from dask.distributed import Client, LocalCluster

def main():
    cluster = LocalCluster()
    client = Client(cluster)
    nc = netCDF4.Dataset("/media/nas/x21308/MTG_test_data/LI/x/W_XX-EUMETSAT-Darmstadt,IMG+SAT,MTI1+LI-2-AF--FD--CHK-BODY---NC4E_C_EUMT_20240613045114_L2PF_OPE_20240613045030_20240613045100_N__T_0030_0002.nc")
    nc["x"]
    v = da.from_array(nc["x"])
    print(v.compute())

if __name__ == "__main__":
    main()
```
so the presence of this variable by itself shouldn't prevent the dask graph from being computable.
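To make the distinction explicit: the local schedulers never serialise the graph, whereas distributed has to, and that is the step which trips over the open variable. A quick check along these lines (continuing inside main() from the script above; the exact exception type depends on the netCDF4 version):

```python
import pickle

v = da.from_array(nc["x"])
print(v.compute(scheduler="threads"))       # works: nothing is serialised
try:
    pickle.dumps(dict(v.__dask_graph__()))  # roughly what distributed has to do
except Exception as err:                    # the netCDF4.Variable refuses to be pickled
    print("graph not picklable:", err)
```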
dask distributed is supposed to contain a custom pickler class to handle exactly such cases, and/or use cloudpickle, which can handle objects that are not otherwise picklable. This pull request explicitly refers to the data variable case (from h5netcdf, but probably the same for NetCDF4).
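For what it's worth, cloudpickle on its own does not appear to help here, because it defers to the type's own pickling for extension objects such as netCDF4.Variable; this is easy to check (again continuing from the script above):

```python
import cloudpickle

try:
    cloudpickle.dumps(nc["x"])
except Exception as err:
    print("cloudpickle cannot handle it either:", err)
```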
NB: The NetCDF4FileHandler uses xarray to open datasets when cache_handle=False (the default), but netCDF4 when cache_handle=True. Although we cannot pickle a dask array created from a NetCDF4 variable or an h5netcdf variable, we can pickle an xarray.Dataset created from the same data, due to additional functionality in xarray. This might explain why we can use dask distributed when cache_handle=False but not when it is set to True.
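A small check that illustrates the difference (fn_li is a placeholder for the LI test file used above):

```python
import pickle
import netCDF4
import xarray as xr

fn_li = "..."  # placeholder for the LI test file

# xarray keeps a re-openable, picklable file reference inside the object:
pickle.dumps(xr.open_dataset(fn_li, chunks={}))

# the raw netCDF4 variable does not:
try:
    pickle.dumps(netCDF4.Dataset(fn_li)["x"])
except Exception as err:
    print("not picklable:", err)
```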
Maybe we should use CachingFileManager, but I'm not sure what to replace da.from_array(v) with in this case. Some sort of map_blocks call like in https://github.com/pydata/xarray/issues/4242#issuecomment-2009576810?
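Something along those lines might look like the sketch below. This is only a rough illustration of the pattern from that comment, not a drop-in replacement for the file handler: lazy_variable and _load_block are made-up names, and real code would still need to handle groups, masking, and scaling.

```python
import dask.array as da
import netCDF4
from xarray.backends import CachingFileManager

def _load_block(template_block, manager, varname, block_info=None):
    # Runs on the worker: (re)open the file via the picklable manager and
    # read only the slice corresponding to this block.
    loc = tuple(slice(lo, hi) for lo, hi in block_info[0]["array-location"])
    with manager.acquire_context() as nc:
        return nc[varname][loc]

def lazy_variable(path, varname, chunks="auto"):
    manager = CachingFileManager(netCDF4.Dataset, path, mode="r")
    with manager.acquire_context() as nc:
        shape = nc[varname].shape
        dtype = nc[varname].dtype
    # The template only provides shape/chunk structure; no data are read here.
    template = da.zeros(shape, chunks=chunks, dtype=dtype)
    return da.map_blocks(_load_block, template, manager, varname, dtype=dtype)
```

With this, the graph contains the (picklable) CachingFileManager rather than an open Dataset, so it should survive being sent to distributed workers; each worker process then opens the file itself and the manager caches the handle.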
Describe the bug
When using the dask distributed LocalCluster, computing LI or FCI data gives corrupted data. Attempting to save the datasets to disk fails with several exceptions, with the root cause that a netCDF4.Variable is not picklable. This might affect other readers as well.
To Reproduce
For FCI:
And for LI accumulated products:
Expected behavior
I expect that the print statements result in the correct value, namely the same value as when I comment out the cluster usage. Furthermore, I expect both scripts to produce a simple image.
Actual results
With the distributed scheduler for FCI:
If I comment out the save_datasets call, the code completes, but the printed value is wrong (it shouldn't even be a float). For reference, when I use the regular scheduler, I get the (probably correct) value of 1025 (an integer).
If I load calibrated data, all data are NaN.
I get similar problems with LI, but not with IASI L2 CDR. All three use file handlers deriving from NetCDF4FsspecFileHandler, but only FCI and LI use cache_handle=True, which seems to trigger the problem. I did not try MWS (which also uses cache_handle=True).
Environment Info:
Additional context
I ran into this problem when working on #2686.
It would seem that references to the file handler object end up in the dask graph, which should probably be avoided. They fail to be pickled, because they contain references to NetCDF4 objects.
In #1546, a similar problem was solved for VIIRS Compact.
I don't know yet if the hypothesis is correct and if so, how those references end up in the dask graph.