pytroll / satpy

Python package for earth-observing satellite data processing
http://satpy.readthedocs.org/en/latest/
GNU General Public License v3.0
1.06k stars 292 forks source link

'defusedxml' missing in "msi_safe" extras #2759

Closed fwfichtner closed 6 months ago

fwfichtner commented 6 months ago

Describe the bug

When installing with extra like 'pip install satpy[msi_safe]' the dependency defusedxml is missing here.

To Reproduce

files = find_files_and_readers(base_dir=some_base_dir, reader="msi_safe")

Expected behavior ModuleNotFoundError should not be raised.

Actual results

[DEBUG: 2024-03-13 09:39:52 : satpy.readers.yaml_reader] Reading ('/opt/homebrew/Caskroom/miniforge/base/envs/test_satpy/lib/python3.12/site-packages/satpy/etc/readers/msi_safe.yaml',)
[INFO: 2024-03-13 09:39:52 : satpy.readers] Cannot use ['/opt/homebrew/Caskroom/miniforge/base/envs/test_satpy/lib/python3.12/site-packages/satpy/etc/readers/msi_safe.yaml']
[DEBUG: 2024-03-13 09:39:52 : satpy.readers] while constructing a Python object
cannot find module 'satpy.readers.msi_safe' (No module named 'defusedxml')
  in "/opt/homebrew/Caskroom/miniforge/base/envs/test_satpy/lib/python3.12/site-packages/satpy/etc/readers/msi_safe.yaml", line 14, column 22
Traceback (most recent call last):
  File "/opt/homebrew/Caskroom/miniforge/base/envs/test_satpy/lib/python3.12/site-packages/yaml/constructor.py", line 551, in find_python_name
    __import__(module_name)
  File "/opt/homebrew/Caskroom/miniforge/base/envs/test_satpy/lib/python3.12/site-packages/satpy/readers/msi_safe.py", line 40, in <module>
    import defusedxml.ElementTree as ET
ModuleNotFoundError: No module named 'defusedxml'

...

Environment Info:

>>> from satpy.utils import check_satpy; check_satpy()
Readers
=======
abi_l1b:  ok
abi_l1b_scmi:  ok
abi_l2_nc:  ok
acspo:  ok
agri_fy4a_l1:  ok
agri_fy4b_l1:  ok
ahi_hrit:  ok
ahi_hsd:  ok
ahi_l1b_gridded_bin:  ok
ahi_l2_nc:  ok
ami_l1b:  ok
amsr2_l1b:  ok
amsr2_l2:  ok
amsr2_l2_gaasp:  ok
amsub_l1c_aapp:  ok
ascat_l2_soilmoisture_bufr:  cannot find module 'satpy.readers.ascat_l2_soilmoisture_bufr' (('Missing eccodes-python and/or eccodes C-library installation. Use conda to install eccodes.\n           Error: ', ModuleNotFoundError("No module named 'eccodes'")))
atms_l1b_nc:  ok
atms_sdr_hdf5:  ok
avhrr_l1b_aapp:  ok
avhrr_l1b_eps:  cannot find module 'satpy.readers.eps_l1b' (No module named 'defusedxml')
avhrr_l1b_gaclac:  cannot find module 'satpy.readers.avhrr_l1b_gaclac' (No module named 'pygac')
avhrr_l1b_hrpt:  ok
avhrr_l1c_eum_gac_fdr_nc:  ok
caliop_l2_cloud:  cannot find module 'satpy.readers.caliop_l2_cloud' (No module named 'pyhdf')
clavrx:  cannot find module 'satpy.readers.clavrx' (No module named 'pyhdf')
cmsaf-claas2_l2_nc:  ok
electrol_hrit:  ok
epic_l1b_h5:  ok
fci_l1c_nc:  ok
fci_l2_nc:  ok
generic_image:  ok
geocat:  ok
gerb_l2_hr_h5:  ok
ghi_l1:  ok
ghrsst_l2:  ok
glm_l2:  ok
gms5-vissr_l1b:  cannot find module 'satpy.readers.gms.gms5_vissr_l1b' (No module named 'numba')
goes-imager_hrit:  ok
goes-imager_nc:  ok
gpm_imerg:  ok
grib:  cannot find module 'satpy.readers.grib' (No module named 'pygrib')
hsaf_grib:  cannot find module 'satpy.readers.hsaf_grib' (No module named 'pygrib')
hsaf_h5:  ok
hy2_scat_l2b_h5:  ok
iasi_l2:  ok
iasi_l2_cdr_nc:  ok
iasi_l2_so2_bufr:  cannot find module 'satpy.readers.iasi_l2_so2_bufr' (('Missing eccodes-python and/or eccodes C-library installation. Use conda to install eccodes.\n           Error: ', ModuleNotFoundError("No module named 'eccodes'")))
ici_l1b_nc:  ok
insat3d_img_l1b_h5:  cannot find module 'satpy.readers.insat3d_img_l1b_h5' (No module named 'datatree' It can be installed with the xarray-datatree package.)
jami_hrit:  ok
li_l2_nc:  ok
maia:  ok
meris_nc_sen3:  ok
mersi2_l1b:  ok
mersi_ll_l1b:  ok
mersi_rm_l1b:  ok
mhs_l1c_aapp:  ok
mimicTPW2_comp:  ok
mirs:  ok
modis_l1b:  cannot find module 'satpy.readers.modis_l1b' (No module named 'pyhdf')
modis_l2:  cannot find module 'satpy.readers.modis_l2' (No module named 'pyhdf')
modis_l3:  cannot find module 'satpy.readers.modis_l3' (No module named 'pyhdf')
msi_safe:  cannot find module 'satpy.readers.msi_safe' (No module named 'defusedxml')
msu_gsa_l1b:  ok
mtsat2-imager_hrit:  ok
mviri_l1b_fiduceo_nc:  ok
mwi_l1b_nc:  ok
mws_l1b_nc:  ok
nucaps:  ok
nwcsaf-geo:  ok
nwcsaf-msg2013-hdf5:  ok
nwcsaf-pps_nc:  ok
oceancolorcci_l3_nc:  ok
olci_l1b:  ok
olci_l2:  ok
omps_edr:  ok
osisaf_nc:  ok
safe_sar_l2_ocn:  ok
sar-c_safe:  cannot find module 'satpy.readers.sar_c_safe' (No module named 'defusedxml')
satpy_cf_nc:  ok
scatsat1_l2b:  cannot find module 'satpy.readers.scatsat1_l2b' (cannot import name 'Dataset' from 'satpy.dataset' (/opt/homebrew/Caskroom/miniforge/base/envs/test_satpy/lib/python3.12/site-packages/satpy/dataset/__init__.py))
seadas_l2:  cannot find module 'satpy.readers.seadas_l2' (No module named 'pyhdf')
seviri_l1b_hrit:  ok
seviri_l1b_icare:  cannot find module 'satpy.readers.seviri_l1b_icare' (No module named 'pyhdf')
seviri_l1b_native:  ok
seviri_l1b_nc:  ok
seviri_l2_bufr:  cannot find module 'satpy.readers.seviri_l2_bufr' (Missing eccodes-python and/or eccodes C-library installation. Use conda to install eccodes)
seviri_l2_grib:  cannot find module 'satpy.readers.seviri_l2_grib' (Missing eccodes-python and/or eccodes C-library installation. Use conda to install eccodes)
sgli_l1b:  ok
slstr_l1b:  ok
smos_l2_wind:  ok
tropomi_l2:  ok
vaisala_gld360:  ok
vii_l1b_nc:  ok
vii_l2_nc:  ok
viirs_compact:  ok
viirs_edr:  ok
viirs_edr_active_fires:  ok
viirs_edr_flood:  cannot find module 'satpy.readers.viirs_edr_flood' (No module named 'pyhdf')
viirs_l1b:  ok
viirs_sdr:  ok
viirs_vgac_l1c_nc:  ok
virr_l1b:  ok

Writers
=======
awips_tiled:  ok
cf:  ok
geotiff:  ok
mitiff:  ok
ninjogeotiff:  ok
ninjotiff:  cannot find module 'satpy.writers.ninjotiff' (No module named 'pyninjotiff')
simple_image:  ok

Extras
======
cartopy:  No module named 'cartopy'
geoviews:  No module named 'geoviews'

Additional context I noticed that also other dependencies are missing for some Readers, like often pyhdf. Is this intentional?

djhoese commented 6 months ago

I believe this was discussed a couple months ago on the mailing list, but no pull request was every created related to it. I've assigned @mraspaud who I think originally made the change from the builtin lxml to the (more secure) defusedxml package as a dependency.

A PR updating setup.py would be appreciated.

I noticed that also other dependencies are missing for some Readers, like often pyhdf. Is this intentional?

Depending on what you mean by this, the bottom line is that we just can't possibly expect users to install all dependencies for all readers. We would never want to require that and as new readers are added, new dependencies would be added too. Normally users are only interested in using 1 or 2 readers and a majority of our readers are either custom binary formats (using numpy to load the files), netcdf4 files, or hdf5 files. So installing just a couple packages can get users functionality for most readers. This is why netcdf4-python and h5py are hard requirements on the conda package from conda-forge. These dependencies are harder to install from pip so we don't include them as hard dependencies on the PyPI/pip package.

fwfichtner commented 6 months ago

Alright, regarding I just referred to this log lines:

modis_l1b:  cannot find module 'satpy.readers.modis_l1b' (No module named 'pyhdf')
modis_l2:  cannot find module 'satpy.readers.modis_l2' (No module named 'pyhdf')
modis_l3:  cannot find module 'satpy.readers.modis_l3' (No module named 'pyhdf')
msi_safe:  cannot find module 'satpy.readers.msi_safe' (No module named 'defusedxml')

I am happy to create a PR to add defusedxml tomorrow.

djhoese commented 6 months ago

Right. Those readers are all MODIS readers (obviously) and some of only a handful that requires pyhdf (HDF4 is not a file format that is used for new satellite data files). It would be annoying for a lot of users if we required you to install pyhdf to use the other 90% of readers when they won't use it and it depends on the C-level HDF4 library which is not fun to install.

fwfichtner commented 6 months ago

Makes sense, thanks for the explanation.