pytroll / satpy

Python package for earth-observing satellite data processing
http://satpy.readthedocs.org/en/latest/
GNU General Public License v3.0

Test failure on i386 and armhf #1883

Closed avalentino closed 2 years ago

avalentino commented 3 years ago

Describe the bug
There are a couple of test failures on GNU/Linux Debian Sid on the i386 architecture.

The first problem seems to be due to a comparison of floating point numbers and should be easy to fix (I can provide a patch for that if necessary).
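
As an illustration of the kind of fix I have in mind (the numbers are taken from the failure below; the exact tolerance is just an example), the exact tuple comparison could be replaced with a tolerance-based one:

    import numpy as np

    expected = (-5495021.206414789, 5493021.198696349,
                -5487021.175541028, 5495021.206414789)
    computed = (-5495021.206414789, 5493021.19869635,
                -5487021.175541028, 5495021.206414789)

    # assert_allclose accepts the last-digit difference that the exact
    # equality check in the test rejects.
    np.testing.assert_allclose(computed, expected, rtol=1e-12)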

The second problem seems to be a little bit more tricky and is probably related to dask. Please note that this problem seems to impact the armhf architecture (also 32-bit) even more seriously: https://ci.debian.net/data/autopkgtest/testing/armhf/s/satpy/16585938/log.gz.

The full test log for i386 is available at https://ci.debian.net/data/autopkgtest/testing/i386/s/satpy/16585947/log.gz

To Reproduce

python3 -m pytest

Expected behavior
All tests pass.

Actual results

============================= test session starts ==============================
platform linux -- Python 3.9.8, pytest-6.2.5, py-1.10.0, pluggy-0.13.0 -- /usr/bin/python3.9
cachedir: .pytest_cache
rootdir: /tmp/autopkgtest-lxc.6tmzu79q/downtmp/autopkgtest_tmp
plugins: lazy-fixture-0.6.3
collecting ... collected 1359 items / 5 deselected / 2 skipped / 1352 selected

[CUT]

=================================== FAILURES ===================================
___________ Test_HDF_AGRI_L1_cal.test_fy4a_for_one_resolution[2000] ____________

self = <satpy.tests.reader_tests.test_agri_l1.Test_HDF_AGRI_L1_cal object at 0xb5abf7c0>
resolution_to_test = 2000

    @pytest.mark.parametrize("resolution_to_test", RESOLUTION_LIST)
    def test_fy4a_for_one_resolution(self, resolution_to_test):
        """Test loading data when only one resolution is available."""
        reader = self._create_reader_for_resolutions(resolution_to_test)

        available_datasets = reader.available_dataset_ids
        band_names = CHANNELS_BY_RESOLUTION[resolution_to_test]
        self._assert_which_channels_are_loaded(available_datasets, band_names, resolution_to_test)
        res = reader.load(band_names)
        assert len(res) == len(band_names)
        self._check_calibration_and_units(band_names, res)
        for band_name in band_names:
>           assert res[band_name].attrs['area'].area_extent == AREA_EXTENTS_BY_RESOLUTION[resolution_to_test]
E           assert (-5495021.206...021.206414789) == (-5495021.206...021.206414789)
E             At index 1 diff: 5493021.19869635 != 5493021.198696349
E             Full diff:
E             - (-5495021.206414789, 5493021.198696349, -5487021.175541028, 5495021.206414789)
E             ?                                     ^^
E             + (-5495021.206414789, 5493021.19869635, -5487021.175541028, 5495021.206414789)
E             ?                                     ^

/usr/lib/python3/dist-packages/satpy/tests/reader_tests/test_agri_l1.py:362: AssertionError
_ TestModisL1b.test_load_longitude_latitude[modis_l1b_nasa_1km_mod03_files-True-True-True-250] _

self = <satpy.tests.reader_tests.test_modis_l1b.TestModisL1b object at 0xb55d2598>
input_files = ['/tmp/pytest-of-debci/pytest-0/modis_l1b0/MOD021km_A21314_161549_2021314161549.hdf', '/tmp/pytest-of-debci/pytest-0/modis_l1b5/MOD03_A21314_162023_2021314162023.hdf']
has_5km = True, has_500 = True, has_250 = True, default_res = 250

    @pytest.mark.parametrize(
        ('input_files', 'has_5km', 'has_500', 'has_250', 'default_res'),
        [
            [lazy_fixture('modis_l1b_nasa_mod021km_file'),
             True, False, False, 1000],
            [lazy_fixture('modis_l1b_imapp_1000m_file'),
             True, False, False, 1000],
            [lazy_fixture('modis_l1b_nasa_mod02hkm_file'),
             False, True, True, 250],
            [lazy_fixture('modis_l1b_nasa_mod02qkm_file'),
             False, True, True, 250],
            [lazy_fixture('modis_l1b_nasa_1km_mod03_files'),
             True, True, True, 250],
        ]
    )
    def test_load_longitude_latitude(self, input_files, has_5km, has_500, has_250, default_res):
        """Test that longitude and latitude datasets are loaded correctly."""
        scene = Scene(reader='modis_l1b', filenames=input_files)
        shape_5km = _shape_for_resolution(5000)
        shape_500m = _shape_for_resolution(500)
        shape_250m = _shape_for_resolution(250)
        default_shape = _shape_for_resolution(default_res)
        with dask.config.set(scheduler=CustomScheduler(max_computes=1 + has_5km + has_500 + has_250)):
>           _load_and_check_geolocation(scene, "*", default_res, default_shape, True)

/usr/lib/python3/dist-packages/satpy/tests/reader_tests/test_modis_l1b.py:140: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
/usr/lib/python3/dist-packages/satpy/tests/reader_tests/test_modis_l1b.py:56: in _load_and_check_geolocation
    lon_vals, lat_vals = dask.compute(lon_arr, lat_arr)
/usr/lib/python3/dist-packages/dask/base.py:570: in compute
    results = schedule(dsk, keys, **kwargs)
/usr/lib/python3/dist-packages/satpy/tests/utils.py:265: in __call__
    return dask.get(dsk, keys, **kwargs)
/usr/lib/python3/dist-packages/dask/local.py:563: in get_sync
    return get_async(
/usr/lib/python3/dist-packages/dask/local.py:506: in get_async
    for key, res_info, failed in queue_get(queue).result():
/usr/lib/python3.9/concurrent/futures/_base.py:438: in result
    return self.__get_result()
/usr/lib/python3.9/concurrent/futures/_base.py:390: in __get_result
    raise self._exception
/usr/lib/python3/dist-packages/dask/local.py:548: in submit
    fut.set_result(fn(*args, **kwargs))
/usr/lib/python3/dist-packages/dask/local.py:237: in batch_execute_tasks
    return [execute_task(*a) for a in it]
/usr/lib/python3/dist-packages/dask/local.py:237: in <listcomp>
    return [execute_task(*a) for a in it]
/usr/lib/python3/dist-packages/dask/local.py:228: in execute_task
    result = pack_exception(e, dumps)
/usr/lib/python3/dist-packages/dask/local.py:223: in execute_task
    result = _execute_task(task, data)
/usr/lib/python3/dist-packages/dask/core.py:121: in _execute_task
    return func(*(_execute_task(a, cache) for a in args))
/usr/lib/python3/dist-packages/dask/optimization.py:969: in __call__
    return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))
/usr/lib/python3/dist-packages/dask/core.py:151: in get
    result = _execute_task(task, cache)
/usr/lib/python3/dist-packages/dask/core.py:121: in _execute_task
    return func(*(_execute_task(a, cache) for a in args))
/usr/lib/python3/dist-packages/dask/utils.py:35: in apply
    return func(*args, **kwargs)
<__array_function__ internals>:5: in repeat
    ???
/usr/lib/python3/dist-packages/numpy/core/fromnumeric.py:479: in repeat
    return _wrapfunc(a, 'repeat', repeats, axis=axis)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

obj = array([[-2961361.2, -2960622. , -2959883.2, ..., -2631443.8, -2630711.2,
        -2629978.8],
       [-2961361.2, -296...24772.2],
       [-2967931.2, -2967165.5, -2966399.8, ..., -2626288.2, -2625530.2,
        -2624772.2]], dtype=float32)
method = 'repeat', args = (4,), kwds = {'axis': 1}
bound = <built-in method repeat of numpy.ndarray object at 0xe75570c0>

    def _wrapfunc(obj, method, *args, **kwds):
        bound = getattr(obj, method, None)
        if bound is None:
            return _wrapit(obj, method, *args, **kwds)

        try:
>           return bound(*args, **kwds)
E           numpy.core._exceptions._ArrayMemoryError: Unable to allocate 11.0 MiB for an array with shape (1600, 1804) and data type float32

/usr/lib/python3/dist-packages/numpy/core/fromnumeric.py:58: MemoryError
=============================== warnings summary ===============================
[CUT]

=========================== short test summary info ============================
FAILED tests/reader_tests/test_agri_l1.py::Test_HDF_AGRI_L1_cal::test_fy4a_for_one_resolution[2000]
FAILED tests/reader_tests/test_modis_l1b.py::TestModisL1b::test_load_longitude_latitude[modis_l1b_nasa_1km_mod03_files-True-True-True-250]
= 2 failed, 1340 passed, 10 skipped, 5 deselected, 4 xfailed, 579 warnings in 537.84s (0:08:57) =

Environment Info:

Readers
=======
abi_l1b:  ok
abi_l1b_scmi:  ok
abi_l2_nc:  ok
acspo:  ok
agri_l1:  ok
ahi_hrit:  ok
ahi_hsd:  ok
ahi_l1b_gridded_bin:  ok
ami_l1b:  ok
amsr2_l1b:  ok
amsr2_l2:  ok
amsr2_l2_gaasp:  ok
ascat_l2_soilmoisture_bufr:  ok
avhrr_l1b_aapp:  ok
avhrr_l1b_eps:  ok
avhrr_l1b_gaclac:  ok
avhrr_l1b_hrpt:  ok
avhrr_l1c_eum_gac_fdr_nc:  ok
caliop_l2_cloud:  cannot find module 'satpy.readers.caliop_l2_cloud' (cannot import name 'Dataset' from 'satpy.dataset' (/home/antonio/debian/git/satpy/satpy/dataset/__init__.py))
clavrx:  ok
cmsaf-claas2_l2_nc:  ok
electrol_hrit:  ok
fci_l1c_nc:  ok
fci_l2_nc:  ok
generic_image:  ok
geocat:  ok
ghrsst_l3c_sst:  cannot find module 'satpy.readers.ghrsst_l3c_sst' (cannot import name 'Dataset' from 'satpy.dataset' (/home/antonio/debian/git/satpy/satpy/dataset/__init__.py))
glm_l2:  ok
goes-imager_hrit:  ok
goes-imager_nc:  ok
gpm_imerg:  ok
grib:  ok
hsaf_grib:  ok
hy2_scat_l2b_h5:  ok
iasi_l2:  ok
iasi_l2_so2_bufr:  ok
jami_hrit:  ok
li_l2:  cannot find module 'satpy.readers.li_l2' (cannot import name 'Dataset' from 'satpy.dataset' (/home/antonio/debian/git/satpy/satpy/dataset/__init__.py))
maia:  ok
mersi2_l1b:  ok
mimicTPW2_comp:  ok
mirs:  ok
modis_l1b:  ok
modis_l2:  ok
msi_safe:  cannot find module 'satpy.readers.msi_safe' (No module named 'rioxarray')
mtsat2-imager_hrit:  ok
mviri_l1b_fiduceo_nc:  ok
nucaps:  ok
nwcsaf-geo:  ok
nwcsaf-msg2013-hdf5:  ok
nwcsaf-pps_nc:  ok
olci_l1b:  ok
olci_l2:  ok
omps_edr:  ok
safe_sar_l2_ocn:  ok
sar-c_safe:  cannot find module 'satpy.readers.sar_c_safe' (No module named 'rioxarray')
satpy_cf_nc:  ok
scatsat1_l2b:  cannot find module 'satpy.readers.scatsat1_l2b' (cannot import name 'Dataset' from 'satpy.dataset' (/home/antonio/debian/git/satpy/satpy/dataset/__init__.py))
seviri_l1b_hrit:  ok
seviri_l1b_icare:  ok
seviri_l1b_native:  ok
seviri_l1b_nc:  ok
seviri_l2_bufr:  ok
seviri_l2_grib:  ok
slstr_l1b:  ok
slstr_l2:  ok
smos_l2_wind:  ok
tropomi_l2:  ok
vaisala_gld360:  ok
vii_l1b_nc:  ok
vii_l2_nc:  ok
viirs_compact:  ok
viirs_edr_active_fires:  ok
viirs_edr_flood:  ok
viirs_l1b:  ok
viirs_sdr:  ok
virr_l1b:  ok

Writers
=======
/usr/lib/python3/dist-packages/pyninjotiff/tifffile.py:154: UserWarning: failed to import the optional _tifffile C extension module.
Loading of some compressed images will be slow.
Tifffile.c can be obtained at http://www.lfd.uci.edu/~gohlke/
  warnings.warn(
awips_tiled:  ok
cf:  ok
geotiff:  ok
mitiff:  ok
ninjogeotiff:  ok
ninjotiff:  ok
simple_image:  ok

Extras
======
cartopy:  ok
geoviews:  No module named 'geoviews'
djhoese commented 3 years ago

That second error sounds to me like the test system is running out of memory. One thing that could be attempted, maybe, is limiting the number of dask workers while running the tests. This would reduce the number of threads running at once, which would therefore reduce the amount of data in memory. It can be set with the environment variable export DASK_NUM_WORKERS=1 or from within Python for the current process with dask.config.set(num_workers=1). It defaults to the number of logical CPU cores on the system. See https://satpy.readthedocs.io/en/latest/faq.html#why-is-satpy-slow-on-my-powerful-machine and maybe the later section about using OMP_NUM_THREADS to further reduce the threads used by numpy.
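
A minimal sketch of what I mean (the test path and in-process pytest invocation are just examples):

    # Limit dask's threading before running the tests in-process.
    # Exporting DASK_NUM_WORKERS=1 before starting Python achieves the
    # same thing as the dask.config.set() call below.
    import dask
    import pytest

    dask.config.set(num_workers=1)   # shrink the default scheduler's thread pool
    raise SystemExit(pytest.main(["satpy/tests"]))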

djhoese commented 3 years ago

Note: I'm not sure this will help even if it appears to help initially, as the line it is failing on purposely computes the full arrays for the 250m MODIS longitude and latitude datasets. What is being tested is the accuracy of the interpolation (if I remember correctly) and that the correct interpolation function is being used. That is only really accurate if the returned data has the same shape it would have in a real-world case. So if the system running these tests can't hold that much data, I'm not sure there is much we can do.

avalentino commented 2 years ago

Thanks for the feedback @djhoese. Unfortunately using export DASK_NUM_WORKERS=1 does not help. Anyway, if you think that the problem is related to a lack of resources, IMHO the test can safely be disabled on that platform in the Debian builds.

We have similar errors also on ARM (https://ci.debian.net/data/autopkgtest/testing/armhf/s/satpy/16585938/log.gz). In this case the number of failures is far larger: 61.

FAILED tests/test_scene.py::TestScene::test_crop - numpy.core._exceptions._Ar...
FAILED tests/test_scene.py::TestScene::test_crop_epsg_crs - numpy.core._excep...
FAILED tests/test_scene.py::TestScene::test_crop_rgb - numpy.core._exceptions...
FAILED tests/test_scene.py::TestSceneAggregation::test_aggregate - numpy.core...
FAILED tests/test_scene.py::TestSceneAggregation::test_aggregate_with_boundary
FAILED tests/reader_tests/test_mimic_TPW2_nc.py::TestMimicTPW2Reader::test_load_mimic
FAILED tests/reader_tests/test_modis_l2.py::TestModisL2::test_load_l2_dataset[modis_l2_imapp_snowmask_file-loadables1-1000-False]
FAILED tests/reader_tests/test_nwcsaf_msg.py::TestH5NWCSAF::test_get_dataset
FAILED tests/reader_tests/test_seviri_l1b_hrit.py::TestHRITMSGFileHandlerHRV::test_get_dataset
FAILED tests/reader_tests/test_seviri_l1b_hrit.py::TestHRITMSGFileHandlerHRV::test_get_dataset_non_fill
FAILED tests/reader_tests/test_seviri_l1b_hrit.py::TestHRITMSGFileHandler::test_get_dataset
FAILED tests/reader_tests/test_seviri_l1b_hrit.py::TestHRITMSGFileHandler::test_get_dataset_with_raw_metadata
FAILED tests/reader_tests/test_seviri_l2_grib.py::Test_SeviriL2GribFileHandler::test_data_reading
FAILED tests/reader_tests/test_smos_l2_wind.py::TestSMOSL2WINDReader::test_adjust_lon
FAILED tests/reader_tests/test_smos_l2_wind.py::TestSMOSL2WINDReader::test_init
FAILED tests/reader_tests/test_smos_l2_wind.py::TestSMOSL2WINDReader::test_load_lat
FAILED tests/reader_tests/test_smos_l2_wind.py::TestSMOSL2WINDReader::test_load_lon
FAILED tests/reader_tests/test_smos_l2_wind.py::TestSMOSL2WINDReader::test_load_wind_speed
FAILED tests/reader_tests/test_smos_l2_wind.py::TestSMOSL2WINDReader::test_roll_dataset
FAILED tests/reader_tests/test_tropomi_l2.py::TestTROPOMIL2Reader::test_load_bounds
FAILED tests/reader_tests/test_tropomi_l2.py::TestTROPOMIL2Reader::test_load_no2
FAILED tests/reader_tests/test_tropomi_l2.py::TestTROPOMIL2Reader::test_load_so2
FAILED tests/reader_tests/test_viirs_compact.py::TestCompact::test_distributed
FAILED tests/reader_tests/test_viirs_compact.py::TestCompact::test_get_dataset
FAILED tests/reader_tests/test_viirs_sdr.py::TestVIIRSSDRReader::test_load_all_i_radiances
FAILED tests/reader_tests/test_viirs_sdr.py::TestVIIRSSDRReader::test_load_all_i_reflectances_provided_geo
FAILED tests/reader_tests/test_viirs_sdr.py::TestVIIRSSDRReader::test_load_all_m_bts
FAILED tests/reader_tests/test_viirs_sdr.py::TestVIIRSSDRReader::test_load_all_m_radiances
FAILED tests/reader_tests/test_viirs_sdr.py::TestVIIRSSDRReader::test_load_all_m_reflectances_find_geo
FAILED tests/reader_tests/test_viirs_sdr.py::TestVIIRSSDRReader::test_load_all_m_reflectances_no_geo
FAILED tests/reader_tests/test_viirs_sdr.py::TestVIIRSSDRReader::test_load_all_m_reflectances_provided_geo
FAILED tests/reader_tests/test_viirs_sdr.py::TestVIIRSSDRReader::test_load_all_m_reflectances_use_nontc
FAILED tests/reader_tests/test_viirs_sdr.py::TestVIIRSSDRReader::test_load_all_m_reflectances_use_nontc2
FAILED tests/reader_tests/test_viirs_sdr.py::TestVIIRSSDRReader::test_load_dnb
FAILED tests/reader_tests/test_viirs_sdr.py::TestVIIRSSDRReader::test_load_dnb_no_factors
FAILED tests/reader_tests/test_viirs_sdr.py::TestAggrVIIRSSDRReader::test_bounding_box
FAILED tests/reader_tests/test_viirs_sdr.py::TestShortAggrVIIRSSDRReader::test_load_truncated_band
FAILED tests/writer_tests/test_awips_tiled.py::TestAWIPSTiledWriter::test_basic_lettered_tiles
FAILED tests/writer_tests/test_awips_tiled.py::TestAWIPSTiledWriter::test_basic_lettered_tiles_diff_projection
FAILED tests/writer_tests/test_awips_tiled.py::TestAWIPSTiledWriter::test_lettered_tiles_update_existing
FAILED tests/writer_tests/test_awips_tiled.py::TestAWIPSTiledWriter::test_lettered_tiles_sector_ref
FAILED tests/writer_tests/test_awips_tiled.py::TestAWIPSTiledWriter::test_lettered_tiles_no_fit
FAILED tests/writer_tests/test_awips_tiled.py::TestAWIPSTiledWriter::test_lettered_tiles_no_valid_data
FAILED tests/writer_tests/test_awips_tiled.py::TestAWIPSTiledWriter::test_lettered_tiles_bad_filename
FAILED tests/writer_tests/test_awips_tiled.py::TestAWIPSTiledWriter::test_basic_numbered_tiles_rgb
FAILED tests/writer_tests/test_awips_tiled.py::TestAWIPSTiledWriter::test_multivar_numbered_tiles_glm[extra_kwargs0-C]
FAILED tests/writer_tests/test_awips_tiled.py::TestAWIPSTiledWriter::test_multivar_numbered_tiles_glm[extra_kwargs0-F]
FAILED tests/writer_tests/test_awips_tiled.py::TestAWIPSTiledWriter::test_multivar_numbered_tiles_glm[extra_kwargs1-C]
FAILED tests/writer_tests/test_awips_tiled.py::TestAWIPSTiledWriter::test_multivar_numbered_tiles_glm[extra_kwargs1-F]
FAILED tests/writer_tests/test_awips_tiled.py::TestAWIPSTiledWriter::test_multivar_numbered_tiles_glm[extra_kwargs2-C]
FAILED tests/writer_tests/test_awips_tiled.py::TestAWIPSTiledWriter::test_multivar_numbered_tiles_glm[extra_kwargs2-F]
FAILED tests/writer_tests/test_mitiff.py::TestMITIFFWriter::test_get_test_dataset_three_bands_prereq
FAILED tests/writer_tests/test_mitiff.py::TestMITIFFWriter::test_save_dataset_with_bad_value
FAILED tests/writer_tests/test_mitiff.py::TestMITIFFWriter::test_save_dataset_with_calibration
FAILED tests/writer_tests/test_mitiff.py::TestMITIFFWriter::test_save_one_dataset
FAILED tests/writer_tests/test_mitiff.py::TestMITIFFWriter::test_save_one_dataset_sesnor_set
FAILED tests/writer_tests/test_mitiff.py::TestMITIFFWriter::test_simple_write
FAILED tests/writer_tests/test_mitiff.py::TestMITIFFWriter::test_simple_write_two_bands
FAILED tests/writer_tests/test_ninjogeotiff.py::test_write_and_read_file_RGB
FAILED tests/writer_tests/test_ninjogeotiff.py::test_get_min_gray_value_RGB
FAILED tests/writer_tests/test_ninjogeotiff.py::test_get_max_gray_value_RGB
ERROR tests/reader_tests/test_modis_l1b.py::TestModisL1b::test_scene_available_datasets[modis_l1b_nasa_mod021km_file-expected_names0-expected_data_res0-expected_geo_res0]
ERROR tests/reader_tests/test_modis_l1b.py::TestModisL1b::test_scene_available_datasets[modis_l1b_imapp_1000m_file-expected_names1-expected_data_res1-expected_geo_res1]
ERROR tests/reader_tests/test_modis_l1b.py::TestModisL1b::test_scene_available_datasets[modis_l1b_nasa_mod02hkm_file-expected_names2-expected_data_res2-expected_geo_res2]
ERROR tests/reader_tests/test_modis_l1b.py::TestModisL1b::test_scene_available_datasets[modis_l1b_nasa_mod02qkm_file-expected_names3-expected_data_res3-expected_geo_res3]
ERROR tests/reader_tests/test_modis_l1b.py::TestModisL1b::test_load_longitude_latitude[modis_l1b_nasa_mod021km_file-True-False-False-1000]
ERROR tests/reader_tests/test_modis_l1b.py::TestModisL1b::test_load_longitude_latitude[modis_l1b_imapp_1000m_file-True-False-False-1000]
ERROR tests/reader_tests/test_modis_l1b.py::TestModisL1b::test_load_longitude_latitude[modis_l1b_nasa_mod02hkm_file-False-True-True-250]
ERROR tests/reader_tests/test_modis_l1b.py::TestModisL1b::test_load_longitude_latitude[modis_l1b_nasa_mod02qkm_file-False-True-True-250]
ERROR tests/reader_tests/test_modis_l1b.py::TestModisL1b::test_load_longitude_latitude[modis_l1b_nasa_1km_mod03_files-True-True-True-250]
ERROR tests/reader_tests/test_modis_l1b.py::TestModisL1b::test_load_sat_zenith_angle
ERROR tests/reader_tests/test_modis_l1b.py::TestModisL1b::test_load_vis - num...
ERROR tests/reader_tests/test_modis_l2.py::TestModisL2::test_scene_available_datasets
ERROR tests/reader_tests/test_modis_l2.py::TestModisL2::test_load_longitude_latitude[modis_l2_nasa_mod35_file-True-False-False-1000]
ERROR tests/reader_tests/test_modis_l2.py::TestModisL2::test_load_quality_assurance
ERROR tests/reader_tests/test_modis_l2.py::TestModisL2::test_load_category_dataset[modis_l2_nasa_mod35_mod03_files-loadables0-1000-1000-True]
ERROR tests/reader_tests/test_modis_l2.py::TestModisL2::test_load_category_dataset[modis_l2_imapp_mask_byte1_geo_files-loadables1-None-1000-True]
ERROR tests/reader_tests/test_modis_l2.py::TestModisL2::test_load_250m_cloud_mask_dataset[modis_l2_nasa_mod35_file-False]
ERROR tests/reader_tests/test_modis_l2.py::TestModisL2::test_load_250m_cloud_mask_dataset[modis_l2_nasa_mod35_mod03_files-True]
ERROR tests/reader_tests/test_modis_l2.py::TestModisL2::test_load_l2_dataset[modis_l2_imapp_snowmask_geo_files-loadables2-1000-True]

Do you have an idea whether, in the above cases as well, the problem could be connected to a lack of resources (memory) on the test machine?

If so, could it make sense to mark that kind of test so that they can be quickly excluded on machines with limited memory?

djhoese commented 2 years ago

Darn. This is a difficult problem. Do you have any idea how much memory is available on the system this is running on?

So the big change that seems to be happening here is that I made pytest "fixtures" that create fake data for the MODIS tests. These fixtures are currently "scoped" to the entire test session, so they should only be created once. My guess is that they are being created and kept in memory, and any test following the MODIS tests that uses a decent amount of memory is hitting the limits because all that MODIS data is sitting in memory from the fixtures.

~One option would be to patch https://github.com/pytroll/satpy/blob/main/satpy/tests/reader_tests/_modis_fixtures.py and change all scope="session" usages to scope="class" (or even "function") and see if the tests behave differently. This should make the tests run longer because they'll have to create the fixtures multiple times. ...~

Hold on, these fixtures are creating files in the temporary directory. These should not be holding on to memory. @avalentino is there any chance that these tests are running with TMPDIR (the default temporary directory) set to /dev/shm or some similar in-memory volume? Otherwise, I don't really have any guess why these MODIS tests are running out of memory.
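
Something like this quick check (not satpy code, Linux-only) could confirm where the temporary files end up and what kind of filesystem backs that directory:

    import tempfile

    tmpdir = tempfile.gettempdir()   # what pytest's tmp_path factory uses by default
    print("temporary files go under:", tmpdir)

    # Find the longest mount point containing tmpdir; a "tmpfs" filesystem
    # type here would mean the fixture files actually live in RAM.
    best = ("", "unknown")
    with open("/proc/mounts") as mounts:
        for line in mounts:
            mountpoint, fstype = line.split()[1:3]
            if tmpdir.startswith(mountpoint) and len(mountpoint) > len(best[0]):
                best = (mountpoint, fstype)
    print("mounted on", best[0] or "/", "as", best[1])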

Until we/I can figure out what these tests have in common besides creating large arrays, I don't think we can easily mark them for skipping. @mraspaud this does bring up a point that I think we've discussed in the past, along the lines of marking certain tests as "long" for long-running, or maybe "high_mem" for needing a lot of memory, or other similar resource-related dependencies. Maybe even a batch of tests that require AWS S3 ABI or AHI data but shouldn't be run by default.
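
As a sketch of what such a marker could look like (illustrative only; registering the "high_mem" marker in conftest.py or setup.cfg is assumed and not shown):

    import numpy as np
    import pytest

    @pytest.mark.high_mem            # hypothetical marker for memory-hungry tests
    def test_full_resolution_lonlat():
        # Stand-in for a test that materializes large geolocation arrays.
        lons = np.zeros((8120, 4140), dtype=np.float32)
        assert lons.shape == (8120, 4140)

Constrained machines could then deselect those tests with python3 -m pytest -m "not high_mem".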

avalentino commented 2 years ago

OK, I repeated the test on ARM excluding only test_modis_l1b:

$ python3 -m pytest -k "not modis_l1b"

The result is that now only one test fails:

================================================================= FAILURES =================================================================
___________________________________________________ TestMimicTPW2Reader.test_load_mimic ____________________________________________________

self = <satpy.tests.reader_tests.test_mimic_TPW2_nc.TestMimicTPW2Reader testMethod=test_load_mimic>

>   ???

/home/antonio/debian/git/satpy/satpy/tests/reader_tests/test_mimic_TPW2_nc.py:126: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
satpy/readers/yaml_reader.py:943: in load
    ds = self._load_dataset_with_area(dsid, coords, **kwargs)
satpy/readers/yaml_reader.py:846: in _load_dataset_with_area
    area = self._load_dataset_area(dsid, file_handlers, coords, **kwargs)
satpy/readers/yaml_reader.py:820: in _load_dataset_area
    return self._load_area_def(dsid, file_handlers, **kwargs)
satpy/readers/yaml_reader.py:735: in _load_area_def
    return _load_area_def(dsid, file_handlers)
satpy/readers/yaml_reader.py:955: in _load_area_def
    area_defs = [fh.get_area_def(dsid) for fh in file_handlers]
satpy/readers/yaml_reader.py:955: in <listcomp>
    area_defs = [fh.get_area_def(dsid) for fh in file_handlers]
satpy/readers/mimic_TPW2_nc.py:124: in get_area_def
    latlon = np.meshgrid(self['lonArr'], flip_lat)
<__array_function__ internals>:5: in meshgrid
    ???
/usr/lib/python3/dist-packages/numpy/lib/function_base.py:4227: in meshgrid
    output = [x.copy() for x in output]
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

.0 = <list_iterator object at 0xa7832418>

>   output = [x.copy() for x in output]
E   numpy.core._exceptions._ArrayMemoryError: Unable to allocate 618. MiB for an array with shape (9001, 18000) and data type float32

/usr/lib/python3/dist-packages/numpy/lib/function_base.py:4227: MemoryError
============================================================= warnings summary =============================================================

[CUT]

========================================================= short test summary info ==========================================================
FAILED satpy/tests/reader_tests/test_mimic_TPW2_nc.py::TestMimicTPW2Reader::test_load_mimic - numpy.core._exceptions._ArrayMemoryError: U...
===================== 1 failed, 1334 passed, 10 skipped, 12 deselected, 4 xfailed, 521 warnings in 4057.27s (1:07:37) ======================

ASAP I will try to narrow down the failing tests in test_modis_l1b a little bit more (unfortunately tests in docker + qemu are quite slow). Then maybe I will try to submit a patch with marks for "heavy" tests.

paulgevers commented 2 years ago

Maybe (or maybe not) it's interesting to note that the Debian host that runs the armhf tests is actually a 255 GB system with 140 cores. So, memory itself shouldn't be a problem, well, except of course that a 32 bit OS isn't able to access all of it.

avalentino commented 2 years ago

Good point @paulgevers. Originally we had failures on i386 as well, probably related to limits of the 32-bit architecture. I was able to identify the problematic tests and disable them in the Debian package. The workaround actually worked both on i386 and in my emulated armhf environment. @sebastic was also able to successfully run the test suite on one of the Debian porter boxes. Unfortunately we still have a failure on debian-ci for armhf.

Today I uploaded a new version of the Debian package for satpy in which I disabled all tests in test_modis_l1b, hoping that it helps. Now I'm waiting for the next debian-ci run.

avalentino commented 2 years ago

@djhoese after a deep analysis on an armhf machine it seems that the offending tests are:

satpy.tests.reader_tests.test_mimic_TPW2_nc.TestMimicTPW2Reader.test_load_mimic
satpy.tests.reader_tests.test_modis_l1b.TestModisL1b.test_load_longitude_latitude

I'm pretty sure that the issue is not related to a lack of resources.

djhoese commented 2 years ago

What do you mean by "they are the offending tests"? That if they are included in the overall series of tests to run, they cause other tests to fail? That they are the only tests that fail? If they fail, is it from being unable to allocate memory, or something else?

avalentino commented 2 years ago

Yes: if they are included in the test suite then other tests also fail (the list is reported above). When satpy.tests.reader_tests.test_mimic_TPW2_nc.TestMimicTPW2Reader.test_load_mimic and satpy.tests.reader_tests.test_modis_l1b.TestModisL1b.test_load_longitude_latitude are excluded from the suite, the remaining tests run successfully.

And yes, the problem is a MemoryError, e.g.:

$ python3 -m pytest -k test_load_longitude_latitude
============================= test session starts ==============================
platform linux -- Python 3.9.9, pytest-6.2.5, py-1.10.0, pluggy-0.13.0
rootdir: /home/avalentino/satpy
plugins: lazy-fixture-0.6.3
collected 1415 items / 1409 deselected / 2 skipped / 4 selected                

satpy/tests/reader_tests/test_modis_l1b.py ....F                                                                              [ 83%]
satpy/tests/reader_tests/test_modis_l2.py E                                                                                   [100%]

============================================================== ERRORS ===============================================================
____________ ERROR at setup of TestModisL2.test_load_longitude_latitude[modis_l2_nasa_mod35_file-True-False-False-1000] _____________

request = <FixtureRequest for <Function test_load_longitude_latitude[modis_l2_nasa_mod35_file-True-False-False-1000]>>

    def fill(request):
        item = request._pyfuncitem
        fixturenames = getattr(item, "fixturenames", None)
        if fixturenames is None:
            fixturenames = request.fixturenames

        if hasattr(item, 'callspec'):
            for param, val in sorted_by_dependency(item.callspec.params, fixturenames):
                if val is not None and is_lazy_fixture(val):
>                   item.callspec.params[param] = request.getfixturevalue(val.name)

/usr/lib/python3/dist-packages/pytest_lazyfixture.py:35: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
satpy/tests/reader_tests/_modis_fixtures.py:512: in modis_l2_nasa_mod35_file
    variable_infos.update(_get_cloud_mask_variable_info("Cloud_Mask", 1000))
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

var_name = 'Cloud_Mask', resolution = 1000

    def _get_cloud_mask_variable_info(var_name: str, resolution: int) -> dict:
        num_bytes = 6
        shape = _shape_for_resolution(resolution)
>       data = np.zeros((num_bytes, shape[0], shape[1]), dtype=np.int8)
E       numpy.core._exceptions._ArrayMemoryError: Unable to allocate 15.7 MiB for an array with shape (6, 2030, 1354) and data type int8

satpy/tests/reader_tests/_modis_fixtures.py:415: MemoryError
============================================================= FAILURES ==============================================================
___________________ TestModisL1b.test_load_longitude_latitude[modis_l1b_nasa_1km_mod03_files-True-True-True-250] ____________________

self = <satpy.tests.reader_tests.test_modis_l1b.TestModisL1b object at 0x7af874d8>
input_files = ['/tmp/pytest-of-avalentino/pytest-12/modis_l1b0/MOD021km_A22004_211807_2022004211807.hdf', '/tmp/pytest-of-avalentino/pytest-12/modis_l1b4/MOD03_A22004_212939_2022004212939.hdf']
has_5km = True, has_500 = True, has_250 = True, default_res = 250

    @pytest.mark.parametrize(
        ('input_files', 'has_5km', 'has_500', 'has_250', 'default_res'),
        [
            [lazy_fixture('modis_l1b_nasa_mod021km_file'),
             True, False, False, 1000],
            [lazy_fixture('modis_l1b_imapp_1000m_file'),
             True, False, False, 1000],
            [lazy_fixture('modis_l1b_nasa_mod02hkm_file'),
             False, True, True, 250],
            [lazy_fixture('modis_l1b_nasa_mod02qkm_file'),
             False, True, True, 250],
            [lazy_fixture('modis_l1b_nasa_1km_mod03_files'),
             True, True, True, 250],
        ]
    )
    def test_load_longitude_latitude(self, input_files, has_5km, has_500, has_250, default_res):
        """Test that longitude and latitude datasets are loaded correctly."""
        scene = Scene(reader='modis_l1b', filenames=input_files)
        shape_5km = _shape_for_resolution(5000)
        shape_500m = _shape_for_resolution(500)
        shape_250m = _shape_for_resolution(250)
        default_shape = _shape_for_resolution(default_res)
        with dask.config.set(scheduler=CustomScheduler(max_computes=1 + has_5km + has_500 + has_250)):
>           _load_and_check_geolocation(scene, "*", default_res, default_shape, True)

satpy/tests/reader_tests/test_modis_l1b.py:141: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
satpy/tests/reader_tests/test_modis_l1b.py:57: in _load_and_check_geolocation
    lon_vals, lat_vals = dask.compute(lon_arr, lat_arr)
/usr/lib/python3/dist-packages/dask/base.py:570: in compute
    results = schedule(dsk, keys, **kwargs)
satpy/tests/utils.py:284: in __call__
    return dask.get(dsk, keys, **kwargs)
/usr/lib/python3/dist-packages/dask/local.py:563: in get_sync
    return get_async(
/usr/lib/python3/dist-packages/dask/local.py:506: in get_async
    for key, res_info, failed in queue_get(queue).result():
/usr/lib/python3.9/concurrent/futures/_base.py:438: in result
    return self.__get_result()
/usr/lib/python3.9/concurrent/futures/_base.py:390: in __get_result
    raise self._exception
/usr/lib/python3/dist-packages/dask/local.py:548: in submit
    fut.set_result(fn(*args, **kwargs))
/usr/lib/python3/dist-packages/dask/local.py:237: in batch_execute_tasks
    return [execute_task(*a) for a in it]
/usr/lib/python3/dist-packages/dask/local.py:237: in <listcomp>
    return [execute_task(*a) for a in it]
/usr/lib/python3/dist-packages/dask/local.py:228: in execute_task
    result = pack_exception(e, dumps)
/usr/lib/python3/dist-packages/dask/local.py:223: in execute_task
    result = _execute_task(task, data)
/usr/lib/python3/dist-packages/dask/core.py:121: in _execute_task
    return func(*(_execute_task(a, cache) for a in args))
/usr/lib/python3/dist-packages/dask/optimization.py:969: in __call__
    return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))
/usr/lib/python3/dist-packages/dask/core.py:151: in get
    result = _execute_task(task, cache)
/usr/lib/python3/dist-packages/dask/core.py:121: in _execute_task
    return func(*(_execute_task(a, cache) for a in args))
/usr/lib/python3/dist-packages/dask/utils.py:35: in apply
    return func(*args, **kwargs)
<__array_function__ internals>:5: in repeat
    ???
/usr/lib/python3/dist-packages/numpy/core/fromnumeric.py:479: in repeat
    return _wrapfunc(a, 'repeat', repeats, axis=axis)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

obj = array([[4286374.5, 4286871.5, 4287369.5, ..., 4504420.5, 4504896. ,
        4505371. ],
       [4286374.5, 4286871.5, ...    4671754. ],
       [4450404. , 4450908.5, 4451413. , ..., 4670795.5, 4671274.5,
        4671754. ]], dtype=float32)
method = 'repeat', args = (4,), kwds = {'axis': 1}, bound = <built-in method repeat of numpy.ndarray object at 0x7ac058c0>

    def _wrapfunc(obj, method, *args, **kwds):
        bound = getattr(obj, method, None)
        if bound is None:
            return _wrapit(obj, method, *args, **kwds)

        try:
>           return bound(*args, **kwds)
E           numpy.core._exceptions._ArrayMemoryError: Unable to allocate 11.0 MiB for an array with shape (1600, 1804) and data type float32

/usr/lib/python3/dist-packages/numpy/core/fromnumeric.py:57: MemoryError
--------------------------------------------------------- Captured log call ---------------------------------------------------------
WARNING  satpy.readers.hdfeos_base:hdfeos_base.py:145 Malformed EOS metadata, missing an END.
WARNING  satpy.readers.hdfeos_base:hdfeos_base.py:145 Malformed EOS metadata, missing an END.
WARNING  satpy.readers.hdfeos_base:hdfeos_base.py:145 Malformed EOS metadata, missing an END.
WARNING  satpy.readers.hdfeos_base:hdfeos_base.py:145 Malformed EOS metadata, missing an END.
WARNING  satpy.readers.hdfeos_base:hdfeos_base.py:145 Malformed EOS metadata, missing an END.
WARNING  satpy.readers.hdfeos_base:hdfeos_base.py:145 Malformed EOS metadata, missing an END.
========================================================= warnings summary ==========================================================
satpy/readers/seviri_base.py:453
  /home/avalentino/satpy/satpy/readers/seviri_base.py:453: DeprecationWarning: `np.bool` is a deprecated alias for the builtin `bool`. To silence this warning, use `bool` by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use `np.bool_` here.
  Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
    ('GsicsCalMode', np.bool),

satpy/readers/seviri_base.py:454
  /home/avalentino/satpy/satpy/readers/seviri_base.py:454: DeprecationWarning: `np.bool` is a deprecated alias for the builtin `bool`. To silence this warning, use `bool` by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use `np.bool_` here.
  Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
    ('GsicsCalValidity', np.bool),

satpy/tests/reader_tests/test_mviri_l1b_fiduceo_nc.py:541
  /home/avalentino/satpy/satpy/tests/reader_tests/test_mviri_l1b_fiduceo_nc.py:541: PytestUnknownMarkWarning: Unknown pytest.mark.file_handler_data - is this a typo?  You can register custom marks to avoid this warning - for details, see https://docs.pytest.org/en/stable/mark.html
    @pytest.mark.file_handler_data(mask_bad_quality=False)

-- Docs: https://docs.pytest.org/en/stable/warnings.html
====================================================== short test summary info ======================================================
FAILED satpy/tests/reader_tests/test_modis_l1b.py::TestModisL1b::test_load_longitude_latitude[modis_l1b_nasa_1km_mod03_files-True-True-True-250]
ERROR satpy/tests/reader_tests/test_modis_l2.py::TestModisL2::test_load_longitude_latitude[modis_l2_nasa_mod35_file-True-False-False-1000]
===================== 1 failed, 4 passed, 2 skipped, 1409 deselected, 3 warnings, 1 error in 787.89s (0:13:07) ======================
djhoese commented 2 years ago

You said you ran these on an "armhf" system; what does that mean? What were the resources? I'll try to find time to profile the tests and see if I can spot some major memory usage from them. It is a little surprising that the mimic test affects anything, but I'll see what I can find. The MODIS one I expect to use some memory.

avalentino commented 2 years ago

I ran my tests on abel.debian.org; at the following link you can find the description of the HW: https://db.debian.org/machines.cgi?host=abel. Please note that at least the problem with test_load_longitude_latitude also happens on i386, which is a lot easier to access or emulate.

djhoese commented 2 years ago

So I ran the tests locally on my PopOS/Ubuntu system with 12 cores and 64GB of RAM, using a Python 3.9 environment with the satpy main branch. I ran pytest satpy/tests/reader_tests/ and total memory usage hovered around 12GB (while doing other things), definitely peaking at around 16GB for the MODIS L1b tests. However, the memory then immediately dropped back down to 12GB.

@avalentino would it be possible for you to monitor memory usage on your ARM system while running the above pytest command and see what happens? If the memory keeps growing then we have a bug somewhere in pytest or Python for ARM. If it just peaks during the MODIS tests then I'm not sure why it would make later tests fail. You aren't running the tests in parallel, are you?
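
Something like this rough sketch (it assumes the third-party psutil package is available; the test path is just an example) would be enough to see whether the resident memory keeps growing or only spikes:

    import sys
    import time

    import psutil

    proc = psutil.Popen([sys.executable, "-m", "pytest", "satpy/tests/reader_tests/"])
    peak = 0
    while proc.poll() is None:
        try:
            rss = proc.memory_info().rss   # resident memory of the pytest process
        except psutil.NoSuchProcess:
            break
        peak = max(peak, rss)
        print(f"rss: {rss / 2**20:7.0f} MiB   (peak: {peak / 2**20:.0f} MiB)")
        time.sleep(5)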

djhoese commented 2 years ago

Oh I just saw your latest comment with the hardware info. 4GB is not very much memory for running these tests. We may need to put in some real work to try to reduce the memory usage of the tests.

paulgevers commented 2 years ago

4GB is about the maximum amount of memory that can be used/reached on a 32 bit OS, no?

As said, the host (arm64) that runs the armhf tests (in lxc) on ci.debian.net has 255 GB, but that doesn't mean that programs (at least single threads) on that OS can reach it all at once IIUC.

djhoese commented 2 years ago

4GB is about the maximum amount of memory that can be used/reached on a 32 bit OS, no?

Good point. So if a 250m resolution MODIS swath is loaded that would be ~4140 columns. If my math is correct then 4GB of memory can only hold about 247 rows of data that size. Seems my test is producing something along the lines of 8120 rows. I'll see if I can reduce that, but I think it would mean modifying all the MODIS tests.

Edit: I don't think I did my math right :wink:

Edit 2: No maybe I did.

avalentino commented 2 years ago

@djhoese probably it is enough to automatically skip tests requiring more than 4GB of memory on 32-bit platforms.
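
Something along these lines (a sketch, not an actual satpy test; the array shape is only indicative of the full-resolution geolocation size mentioned above):

    import sys

    import numpy as np
    import pytest

    IS_32BIT = sys.maxsize <= 2**32   # True for i386/armhf Python builds

    @pytest.mark.skipif(IS_32BIT, reason="needs more memory than a 32-bit process can address")
    def test_full_resolution_geolocation():
        # Stand-in for a test that computes full 250m MODIS lon/lat arrays.
        lons = np.zeros((8120, 4140), dtype=np.float32)
        assert lons.dtype == np.float32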