pvlib / pvlib-python

A set of documented functions for simulating the performance of photovoltaic energy systems.
https://pvlib-python.readthedocs.io
BSD 3-Clause "New" or "Revised" License

the test suite is too slow #878

Closed: wholmgren closed this issue 4 years ago

wholmgren commented 4 years ago

The test suite is getting to be quite slow. Here are the 20 slowest tests on my mac:

```
pytest --durations=20

18.96s call     pvlib/tests/iotools/test_pvgis.py::test_get_pvgis_tmy_kwargs
16.32s call     pvlib/tests/iotools/test_psm3.py::test_get_psm3_singleyear
14.82s setup    pvlib/tests/test_forecast.py::test_process_data[HRRR]
8.12s call     pvlib/tests/iotools/test_psm3.py::test_get_psm3_tmy
6.82s setup    pvlib/tests/test_forecast.py::test_process_data[NDFD]
5.47s call     pvlib/tests/iotools/test_pvgis.py::test_get_pvgis_tmy_epw
4.78s call     pvlib/tests/iotools/test_pvgis.py::test_get_pvgis_tmy
3.99s setup    pvlib/tests/test_forecast.py::test_process_data[RAP]
3.45s call     pvlib/tests/iotools/test_srml.py::test_read_srml_month_from_solardat
3.20s call     pvlib/tests/test_forecast.py::test_bad_kwarg_get_data
3.16s call     pvlib/tests/test_forecast.py::test_bad_kwarg_get_processed_data
3.13s call     pvlib/tests/test_forecast.py::test_how_kwarg_get_processed_data
3.13s setup    pvlib/tests/test_forecast.py::test_process_data[NAM]
3.12s call     pvlib/tests/test_forecast.py::test_vert_level
3.08s call     pvlib/tests/test_forecast.py::test_datetime
2.98s call     pvlib/tests/test_irradiance.py::test_get_extra_radiation_nrel_numba
2.85s call     pvlib/tests/test_solarposition.py::test_spa_python_numba_physical
2.82s call     pvlib/tests/test_solarposition.py::test_get_solarposition_deltat[None-nrel_numba]
2.73s setup    pvlib/tests/test_spa.py::NumbaSpaTest::test_aberration_correction
2.69s call     pvlib/tests/iotools/test_pvgis.py::test_get_pvgis_tmy_basic
```

I will work on the forecast tests. Any ideas for the iotools tests?

CameronTStark commented 4 years ago

It's the API calls that make the iotools tests take so long. I assume you're weighing this against its impact on Azure Pipelines, right? Locally you can split the tests across multiple processes to speed things up, but that's not usable on the free tier of Azure Pipelines.

I briefly explored parallelism for Azure Pipelines when I was coming on board to help administer pvlib. Azure Pipelines requires a paid account to enable more processes, and even then I think it splits separate test jobs rather than providing another core for the primary process. Maybe there's an async solution out there that could run local work while waiting on an API call, but it would have to be a turnkey solution to be worth the effort, if it exists. Probably the only thing we can do at this point is minimize the number of API calls.

kandersolar commented 4 years ago

One idea is to mock the network calls by default [1] but leave the option to perform them when specifically requested [2].

[1] https://requests-mock.readthedocs.io/en/latest/pytest.html
[2] https://docs.pytest.org/en/2.9.1/example/simple.html#control-skipping-of-tests-according-to-command-line-option

(edit -- the [2] link above is super out of date, not sure why it showed up for me on google. a little irrelevant since pytest-remotedata does the same thing anyway)
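For illustration, a minimal sketch of idea [1] using requests-mock's `requests_mock` pytest fixture. The URL and CSV payload are placeholders, not the real PSM3 response:

```python
import requests

def test_psm3_parsing_offline(requests_mock):
    # the requests_mock fixture intercepts requests.get() in-process,
    # so the test runs without any real network traffic
    url = "https://developer.nrel.gov/api/nsrdb/v2/solar/psm3-download.csv"
    requests_mock.get(url, text="Source,Location ID\nNSRDB,12345\n")
    response = requests.get(url)
    assert response.text.startswith("Source")
```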

mikofski commented 4 years ago

I have two suggestions, in addition to Kevin's idea.

  1. Don't run the tests that make API calls on PR commits; run them only on commits to master, or even just for releases. These tests are exactly the same every time, since we're not changing the inputs, so as Cameron and Kevin say they're mainly testing whether the API itself has changed. I think we only need to check that once in a while (see the sketch after this list).

  2. Split iotools off into a separate repo and package, call it pvlib.iotools, and make it a dependency of pvlib so that it installs automatically as if it were still part of pvlib, similar to Sphinx extensions.
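A minimal sketch of suggestion 1, using the command-line-controlled skip pattern from Kevin's link [2]. The `--remote-data` flag and `remote_data` marker are hypothetical names for illustration, not pvlib's actual configuration:

```python
# conftest.py: skip network tests unless explicitly requested
import pytest

def pytest_addoption(parser):
    parser.addoption("--remote-data", action="store_true", default=False,
                     help="run tests that hit live web APIs")

def pytest_collection_modifyitems(config, items):
    if config.getoption("--remote-data"):
        return  # flag given (e.g. on master or release builds): run everything
    skip_remote = pytest.mark.skip(reason="needs --remote-data to run")
    for item in items:
        if "remote_data" in item.keywords:
            item.add_marker(skip_remote)
```

A PR build would then run plain `pytest`, while master or release builds add `--remote-data`.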

wholmgren commented 4 years ago

@CameronTStark I'm concerned about both CI speed and local performance. A lot of people seem to rely on the CI, so faster feedback is better for them. I typically run tests locally before pushing to github, starting with the relevant module and then the whole suite. I usually forget to install and use pytest-xdist.
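For reference, the pytest-xdist invocation is just:

```
pip install pytest-xdist
pytest -n auto pvlib/tests
```

where `-n auto` starts one worker per CPU core.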

I'm in favor of running only the live network tests on one CI instance. I think that's a common pattern for other pydata projects. I'm also in favor of mocking calls where appropriate.

My original hope was that we could simply request less data in some of these calls.
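As a purely hypothetical illustration of "requesting less data", a test could ask PSM3 for fewer columns. The parameter names below follow `pvlib.iotools.get_psm3`, but the specific values are made up for the sketch:

```python
from pvlib.iotools import get_psm3

# requesting only two attributes shrinks the payload each test downloads;
# DEMO_KEY and the email are placeholder credentials
data, metadata = get_psm3(
    latitude=40.0, longitude=-105.0,
    api_key="DEMO_KEY", email="user@example.com",
    attributes=("ghi", "dni"),
)
```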

I'm -1 on splitting iotools into a separate package at this point, but open to it in the future. I think it's too much overhead for now.

mikofski commented 4 years ago

> I'm -1 on splitting iotools into a separate package at this point, but open to it in the future. I think it's too much overhead for now.

I was a little scared of that too, but just mentioned it as a remote possibility, a nuclear option. Thx.

> I'm in favor of running only the live network tests on one CI instance. I think that's a common pattern for other pydata projects. I'm also in favor of mocking calls where appropriate.

I think a wiki or doc page in the contribution guidelines (maybe I missed this) clarifying expected testing patterns and recommended PR test procedures could be helpful to align everyone.

CameronTStark commented 4 years ago

@wholmgren now that pytest-remotedata has been implemented, are you satisfied with the speed or do you see more room for improvement?

wholmgren commented 4 years ago

Yes, I'm happy with the speed when the remote data tests are excluded. I don't think it's worth the effort to make the remote tests faster at this point in time. Thanks everyone here for your help and ideas.
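For readers arriving later: the pytest-remotedata pattern referenced above marks network-dependent tests and skips them by default, roughly like this (the test body is a placeholder):

```python
import pytest

@pytest.mark.remote_data  # skipped unless pytest is run with --remote-data=any
def test_get_pvgis_tmy_live():
    ...  # a real test would call the live PVGIS endpoint here
```

Running plain `pytest` excludes these tests; `pytest --remote-data=any` includes them.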