Enable Continuous Integration Testing

rburghol commented 4 months ago

The goal is to develop a set of scripts to verify performance and execution of all facets of the hsp2 install upon making changes, so that we can easily see if anything obvious has broken as we improve things. Thus far, testing is being setup on the forked project at https://github.com/HARPgroup/HSPsquared/, and once we arrive at a working setup we will share it back here.

Tasks

[x] Test setup CI on forked repo with install of target environment and packages
[x] Create simple pytest script for proof of concept
[x] Create hsp2-install-test.yml file to house base testing (over environment.yml config)
- [x] Develop pip installer test action https://github.com/HARPgroup/HSPsquared/blob/testx/.github/workflows/hsp2-pip-install-test.yml
[x] Create hsp2-install-test-dev.yml file to house base testing (over environment_dev.yml config)
- [x] Develop pip installer test action https://github.com/HARPgroup/HSPsquared/blob/testx/.github/workflows/hsp2-pip-install-test-dev.yml
- [x] Develop conda installer test action (see Conda below)
[x] Enable running test10 in hsp2-install-test.yml
- [x] Create the h5 and import the UCI
- [x] Create pytest_test10.py to do summaries assert that h5 file is created
[x] Lay out basic parameters for cost-efficient testing (per https://docs.github.com/en/billing/managing-billing-for-github-actions/about-billing-for-github-actions and https://docs.github.com/en/billing/managing-billing-for-github-packages/about-billing-for-github-packages)
[x] Create pull request to develop #142

Conda

github VMs supposedly have an environment variable CONDA set up, so:
- install with $CONDA/bin/conda env update --file environment.yml --name base
- see: https://autobencoder.com/2020-08-24-conda-actions/

Notes

Based loosely on https://coderefinery.github.io/testing/continuous-integration/, and then during testing I have updated to reflect some suggestions/variations from https://docs.github.com/en/enterprise-server@3.11/actions/automating-builds-and-tests/building-and-testing-python
- Also:
  - On windows to test pytest with pip, run pip install --trusted-host pypi.org --trusted-host pypi.python.org --trusted-host files.pythonhosted.org --default-timeout=1000 pytest pytest-cov
Base enabling complete, now working through error when pytest runs the file tests/convert/conversion_test.py.
- pytest has automatic discovery (if run without explicit arguments) https://docs.pytest.org/en/7.1.x/explanation/goodpractices.html#test-discovery

Start at Actions page:

Disabling a Test

Enabling a Test

Cost Factors for Testing

Basic accounts get 2,000 testing minutes per month free from github (paid accounts get 3,000)
Current pip/Linux test path takes ~4.1 mins to execute
- Uses test10
- Install/dependencies 22 secs
- Install hsp2 37 secs
- Import+Run hsp2 ~3.0 mins
Current conda/Linux test path takes N/A to execute
- Uses test10
- Install/dependencies 1.3 mins
- Install hsp2 1 secs (this seems rather short, unless the conda install took care of all dependencies)
- Import+Run hsp2 failed
Loading estimates:
- Note: test triggering is by branch, and is off by default, i.e., you have to specify a branch as a target of a test.
- Tests can be triggered on PRs or pushs (commits that get pushed to github from someones local machine)
- Last month there were 7 pull requests on develop, and 27 commits on develop-specact
- On our fork repo we 95 commits, more than 70 of which were tiny incremental commits done to test the testing framework, so, a reasonable max monthly estimate might be 50 commits per month.
- If testing were enabled for:
- all PRs to develop and master,
- all commits to the develop-specact:
- all testing was performed on respec (which is unlikely),
- 740 minutes / month:
  - ~ 90 test events per month
  - 4 mins per test * 2 tests (normal and dev install)
  - ~740 minutes / month

@PaulDudaRESPEC

PaulDudaRESPEC commented 4 months ago

Setting up Continuous Integration would be a really nice step forward for the project. My first suggestion would be to work with test10 (tests/test10), as it includes a full suite of hydrology, sediment, and water quality.

rburghol commented 4 months ago

I am running into a snag on the CI setup with it failing at the "lint" step, on the file setup.py. The error is:

./setup.py:87:13: F821 undefined name '__version__'

I can see that the file _version.py establishes the __version__ variable, and that file appears to be loaded in setup.py line 8 (with exec(open("./_version.py").read())), so I'm unsure as to why it appears to causing a problem in the lint testing. I know that we have made some small tweaks to setup.py over the last 12 months, anyhow, if anyone has any ideas I am all for it. Hey @timcera I think you set this up originally, do you have any thoughts you could throw out there?

Note: we are testing this CI in github over on a forked repo at https://github.com/HARPgroup/HSPsquared

TimothyCera-NOAA commented 4 months ago

flake8 doesn't run the code, so the "exec(open("./_version.py").read())" isn't actually executed to set the version variable. You have to setup flake8 to ignore it.

rburghol commented 4 months ago

Oh sweet - worked like a charm - thanks @TimothyCera-NOAA !!

rburghol commented 4 months ago

numba 0.53.1 is unavailable on the gihub Ubuntu latest test env

timcera commented 4 months ago

It isn't Ubuntu. numba 0.53.1 isn't available for Python 3.10. https://pypi.org/project/numba/0.53.1/#files

timcera commented 4 months ago

You need 0.55.0 or later for Python 3.10 support https://pypi.org/project/numba/0.55.0/#files

timcera commented 4 months ago

Numba's latest version (0.59.*) dropped support for Python 3.8, which is HSPSquared's target Python. It seems Numba supports three Python versions back from the latest Python version.

Numba 0.56.0 Python 3.7 to 3.10 Numba 0.57.0 Python 3.8 to 3.11 Numba 0.58.0 Python 3.8 to 3.11 Numba 0.59.0 Python 3.9 to 3.12

rburghol commented 4 months ago

Thanks a ton @timcera, that makes sense. I noticed that the build system did not autodetect the correct python version (choosing 3.10), so I will dive in there.

I know there is some support around here for moving forward to newer versions of python+numba, and then the complication when increasing numba version is that tables 3.6.1 breaks numba -- at least with the default 0.59.

I think what I will do is to create separate test setups to evaluate the current working setup (py 3.8), and the do a separate one for forward looking installs.

aufdenkampe commented 4 months ago

@rburghol, @PaulDudaRESPEC, and @timcera.

Thanks for moving all this forward with recent releases and the move to CI/CD.

I'm a big fan of keeping the dev/target environment up to date. Moving to Python 3.10 or even 3.11 is a really good idea. One approach to figuring out what versions are broadly supported is to follow the latest Anaconda meta-package releases: https://docs.anaconda.com/free/anaconda/pkg-docs/

Anaconda does substantial testing for cross compatibility of most scientific Python dependencies, so once Anaconda supports a new version of Python, it's generally safe to adopt. So, Anaconda's default configuration has used Python 3.10 since March 2023 and started supporting Python 3.11 around that time.

When we were actively developing v0.9 and v0.10 in Fall of 2021, there were a number of very quirky issues and incompatibilities with different versions of numpy, numba, and pyTables, so we had to pin to very specific versions. Those issues have all cleared out.

Note that for the v0.10 release, we were starting to use environment_dev.yml, with Python 3.9 and pyTables 3.7 and other updates, and it was working quite well.

The approach I would take to updating would be to modify environment_dev.yml by updating to python 3.10 and unpinning all versions of numpy, numba, pyTables, and HDF5. It's probably also worth changing to pandas <2 to get the latest version of the pandas 1.x series. Then let conda/mamba solve for the environment. If you use conda, I recommend using --solver=libmamba flag to speed up the process.

aufdenkampe commented 4 months ago

I just added and tested some potential updates to the conda environment files in the new dependency_options branch with commits 00748d8 and 4dfc4f3. Both environments solved quickly with libmamba.

See:

rburghol commented 4 months ago

Thanks a bunch @aufdenkampe -- FWIW, I managed to get the basic testing to pass by going to python 3.8 in the actions file, but I am with you 100% on the desire to 1) keep as current as possible, and 2) move away from fixed library versions when possible.

I will start a next step with testing the environment.yml that you specified to see how it plays on the github ubuntu-latest architecture.

aufdenkampe commented 4 months ago

@rburghol, that's great to hear.

Both of my environments installed pytables 3.9.2, hdf5 1.14.3, numba 0.59.0, numpy 1.26.4. Note that upgrade from pandas 1.x to 2.x broke some backwards compatibility, so I created the default env for pandas 1.x and the dev env for pandas 2.x. This will allow you to test those separately. The upgrade to pandas 2 is worth it for performance benefits, but needs to be taken cautiously.

rburghol commented 4 months ago

Thanks for the pointers @aufdenkampe / @timcera .

~quick (maybe super basic) question: how do you specify which environment file to use when compiling with pip -- or with conda for that matter?.~

I am currently testing sudo pip install -e .[dev] in the linux pip install workflow - though I may need some advice (my python experience is quite limited)

rburghol commented 4 months ago

Hey @aufdenkampe I set up a basic testing script for conda installs. The install appears to go without incident, but the hsp2 executable cannot be found. I say the conda hsp2 install install appears to go, because is not nearly as verbose as the one that is produced via pip, and as I have no conda experience, I cannot tell if it is correct. If y0u have any ideas on how to fix it I'd be glad to incorporate. Image capture below, and relevant code:

install code: $CONDA/bin/conda-develop ./
conda action script here: https://github.com/HARPgroup/HSPsquared/actions/runs/8101281765/workflow

aufdenkampe commented 4 months ago

@rburghol, it looks to me that within your "Install hsp2" step the conda develop command ran correctly to install the hsp2 library into the Python path for the conda environment. This will give a Python kernel access to all the hsp2 functions that are demonstrated in the example Jupyter notebooks.

Running conda develop is a great approach when one is actively developing functions in a library, but it won't install the command-line utility.

That's why your "Run hsp2 test10" step fails, as it uses the command-line utility, which requires pip install. Fortunately, we install pip in our conda environment, so we can access it for this purpose.

So I suggest modifying your GitHub actions script to this:

    - name: Install hsp2
      run: |
        # install the hsp2 executable into the conda environment, using pip
        $CONDA/bin/pip install .

Unfortunately, when I did that manually, it all seemed to work well but then gave me to an error when I ran hsp2 import_uci from the terminal, which I'm not sure how to solve because of my limited experience with the Command Line Interface (CLI) that @timcera built using mando:

(hsp2_py310) aaufdenkampe@ExcaliburMac2 HSP2results % hsp2 import_uci      
Traceback (most recent call last):
  File "/Users/aaufdenkampe/miniconda3/envs/hsp2_py310/bin/hsp2", line 5, in <module>
    from HSP2tools.HSP2_CLI import main
  File "/Users/aaufdenkampe/miniconda3/envs/hsp2_py310/lib/python3.10/site-packages/HSP2tools/HSP2_CLI.py", line 11, in <module>
    def run(hdfname, saveall=True, jupyterlab=False):
  File "/Users/aaufdenkampe/miniconda3/envs/hsp2_py310/lib/python3.10/site-packages/mando/core.py", line 68, in _command
    return self._generate_command(func, *args, **kwargs)
  File "/Users/aaufdenkampe/miniconda3/envs/hsp2_py310/lib/python3.10/site-packages/mando/core.py", line 103, in _generate_command
    doc = str(NumpyDocstring(doc, config))
  File "/Users/aaufdenkampe/miniconda3/envs/hsp2_py310/lib/python3.10/site-packages/mando/napoleon/docstring.py", line 878, in __init__
    super(NumpyDocstring, self).__init__(docstring, config, app, what,
  File "/Users/aaufdenkampe/miniconda3/envs/hsp2_py310/lib/python3.10/site-packages/mando/napoleon/docstring.py", line 115, in __init__
    elif isinstance(obj, collections.Callable):  # type: ignore
AttributeError: module 'collections' has no attribute 'Callable'

@timcera, any ideas?

rburghol commented 3 months ago

Ahh ok @aufdenkampe thanks -- note that I simply used conda develop because the main hsp2 page recommends that as the way to install on Windows.

Anyhow, one solution could be to test the conda install step on a windows VM. I am going to defer doing that as it's largely outside my wheelhouse (and we use pip installs, not conda on our windows machines), but if anyone wishes to take up the reins on the conda/windows install path, I am more than happy to provide any workflow setup advice or feedback.

I believe that this issue is now complete, and I have submitted a PR against develop for consideration @PaulDudaRESPEC. https://github.com/respec/HSPsquared/pull/142 I created a separate issue for the creation of advanced accuracy and performance testing of actual hsp2 model runs

rburghol commented 1 month ago

This is now well underway -- thanks @austinorr -- closing, with further development tracked in #143

respec / HSPsquared