pvlib / pvlib-python

A set of documented functions for simulating the performance of photovoltaic energy systems.
https://pvlib-python.readthedocs.io
BSD 3-Clause "New" or "Revised" License
1.19k stars 997 forks source link

Local test failure in `test_ecmwf_macc.py::test_read_ecmwf_macc` #1609

Closed kandersolar closed 1 year ago

kandersolar commented 1 year ago

Describe the bug @mdeceglie and I observe pvlib/tests/iotools/test_ecmwf_macc.py::test_read_ecmwf_macc to fail locally with ValueError: operands could not be broadcast together with shapes (4,) (2,).

To Reproduce Steps to reproduce the behavior:

  1. conda create -n ecmwf39 python=3.9
  2. conda activate ecmwf39
  3. pip install pvlib[all]
  4. pytest pvlib\tests\iotools\test_ecmwf_macc.py
pytest output ```python pytest pvlib\tests\iotools\test_ecmwf_macc.py ================================================= test session starts ================================================= platform win32 -- Python 3.9.15, pytest-7.2.0, pluggy-1.0.0 rootdir: C:\Users\KANDERSO\projects\pvlib-python plugins: cov-4.0.0, mock-3.10.0, remotedata-0.3.3, rerunfailures-10.3, timeout-2.1.0, requests-mock-1.10.0 collected 3 items pvlib\tests\iotools\test_ecmwf_macc.py ..F [100%] ====================================================== FAILURES ======================================================= ________________________________________________ test_read_ecmwf_macc _________________________________________________ expected_test_data = WindowsPath('C:/Users/KANDERSO/projects/pvlib-python/pvlib/data/aod550_tcwv_20121101_test.nc') @requires_netCDF4 def test_read_ecmwf_macc(expected_test_data): """Test reading ECMWF_MACC data from netCDF4 file.""" data = ecmwf_macc.read_ecmwf_macc( expected_test_data, 38, -122) expected_times = [ 1351738800, 1351749600, 1351760400, 1351771200, 1351782000, 1351792800, 1351803600, 1351814400] assert np.allclose(data.index.view(np.int64) // 1000000000, expected_times) expected_aod = np.array([ 0.39531226, 0.22371339, 0.18373083, 0.15010143, 0.130809, 0.11172834, 0.09741255, 0.0921606]) expected_tcwv = np.array([ 26.56172238, 22.75563109, 19.37884304, 16.19186269, 13.31990346, 11.65635338, 10.94879802, 10.55725756]) assert np.allclose(data.aod550.values, expected_aod) assert np.allclose(data.tcwv.values, expected_tcwv) assert np.allclose(data.precipitable_water.values, expected_tcwv / 10.0) datetimes = (datetime.datetime(2012, 11, 1, 9, 0, 0), datetime.datetime(2012, 11, 1, 12, 0, 0)) data_9am_12pm = ecmwf_macc.read_ecmwf_macc( expected_test_data, 38, -122, datetimes) > assert np.allclose(data_9am_12pm.aod550.values, expected_aod[2:4]) pvlib\tests\iotools\test_ecmwf_macc.py:74: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ <__array_function__ internals>:180: in allclose ??? ..\..\Software\Anaconda3\envs\ecmwf39\lib\site-packages\numpy\core\numeric.py:2265: in allclose res = all(isclose(a, b, rtol=rtol, atol=atol, equal_nan=equal_nan)) <__array_function__ internals>:180: in isclose ??? ..\..\Software\Anaconda3\envs\ecmwf39\lib\site-packages\numpy\core\numeric.py:2375: in isclose return within_tol(x, y, atol, rtol) _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ x = array([0.22371339, 0.18373083, 0.15010143, 0.130809 ]), y = array([0.18373083, 0.15010143]), atol = 1e-08 rtol = 1e-05 def within_tol(x, y, atol, rtol): with errstate(invalid='ignore'): > return less_equal(abs(x-y), atol + rtol * abs(y)) E ValueError: operands could not be broadcast together with shapes (4,) (2,) ..\..\Software\Anaconda3\envs\ecmwf39\lib\site-packages\numpy\core\numeric.py:2356: ValueError =============================================== short test summary info =============================================== FAILED pvlib/tests/iotools/test_ecmwf_macc.py::test_read_ecmwf_macc - ValueError: operands could not be broadcast together with shapes (4,) (2,) ============================================= 1 failed, 2 passed in 0.73s ============================================= ```

Expected behavior I expect tests to pass. I also expect local test results to be consistent with CI test results.

Versions:

Additional context The test failure is with netCDF4==1.6.2 from PyPI. Reinstalling it in the environment created above but from conda-forge (pip uninstall netCDF4; conda install -c conda-forge netCDF4) results in the test being skipped. At least in my Windows environment, this is because of some incompatibility between h5py and netCDF4 causing import h5py, netCDF4 to raise ImportError but import netCDF4 to be fine.

AdamRJensen commented 1 year ago

It seems the error happens in the selection of the data in the variable data_9am_12pm: https://github.com/pvlib/pvlib-python/blob/5b059143cd16f1d3670a7b4dcc931f1f356a497e/pvlib/tests/iotools/test_ecmwf_macc.py#L70-L73

Which has the following output:

                          tcwv    aod550  precipitable_water
2012-11-01 06:00:00  22.755631  0.223713            2.275563
2012-11-01 09:00:00  19.378843  0.183731            1.937884
2012-11-01 12:00:00  16.191863  0.150101            1.619186
2012-11-01 15:00:00  13.319903  0.130809            1.33199

Only the two middle values (9 am and 12 pm) should have been selected. However, somehow the values before and after are also included, and thus the comparison error happens (as 4 values are compared to 2 values).

kandersolar commented 1 year ago

I've tracked it down to this change in cftime 1.6.0: https://github.com/Unidata/cftime/pull/273

The difference, as @AdamRJensen notes, is that the behavior of select='before' and select='after' have changed; in cftime<1.6.0 they return the specified times if they exist, and the closest preceding/following times otherwise, while in cftime>=1.6.0 they always return the closest preceding/following times. Here are the relevant lines: https://github.com/pvlib/pvlib-python/blob/55e05776877416fb2bcc43eeee4119b8b0504942/pvlib/iotools/ecmwf_macc.py#L296-L299

I'm not sure what is best for us to do. Switching to select='nearest' might make sense. Feedback from someone that actually uses iotools.read_ecmwf_macc would be good!

AdamRJensen commented 1 year ago

I doubt anyone really uses this function... it doesn't really seem so from the search results. I could be convinced to remove it.

@mikofski are you still using this function? What was the use case? I imagine that MERRA-2 or ERA5 would be better sources of data these days.

MACC Reanalysis also seems rather outdated based on this link:

The Monitoring Atmospheric Composition and Climate (MACC) Reanalysis is a global reanalysis data set of atmospheric composition (AC), made available by the Copernicus Atmosphere Monitoring Service (CAMS). The dataset covers the period 2003 to 2012.

mikofski commented 1 year ago

I am with removing it. I used it to retrieve AOD and precipitable water from the CAMS McClear model, but I agree I think it has been archived. Those dates are correct, so it is over 10 years old by now. (Although I think our Linke turbidity data from SoDa is also that old or older.)