respec / HSPsquared

Hydrologic Simulation Program Python (HSPsquared)
GNU Affero General Public License v3.0

Develop spec lean #161

Closed rburghol closed 1 month ago

rburghol commented 1 month ago

@PaulDudaRESPEC @austinorr This PR contains the changes to HSP2 needed to support special actions (the classic ACTIONS block). The following is included:

This has been installed via pip and run successfully in Python 3.10.8.

rburghol commented 1 month ago

By the way @austinorr - I can see the use of the matrix that you mentioned, and it works really nicely for eliminating duplicative test code across multiple Python versions - awesome!

austinorr commented 1 month ago

@rburghol Thanks for making this diff nice and clean. Do you want to add an HSPFresults directory with the results you got when you ran it in HSPF? This should probably be in a separate directory from the test10 one, maybe a new one called test10spec? Then we can check that the code in this PR yields the same results.

rburghol commented 1 month ago

Good thinking @austinorr -- test10specl has been separated into its own folder with HSP2results and HSPFresults folders. I will do a PR to add test coverage to the develop branch after this gets merged, so that I am not updating the test files from this branch, unless you think it's a must to do it all in one... though we're now back to a 30+ file commit.

PaulDudaRESPEC commented 1 month ago

These changes to main.py would overwrite some of what Tim just added in PR #159 (for instance 'Union' in the arg list). Was that an accident?

austinorr commented 1 month ago

> I will do a PR to add test coverage into the develop branch after this gets merged

Let’s take a look at all the new pytest code that got put together to help us automate the tests. I think once we go through it we’ll be able to remove the HSP2results directory (we only need the HSPFresults for the new test pattern) and the notebooks too, unless they’re important for documentation.

In your local Python env, try the following:

$ pip install -e .[dev]
$ pytest tests -v

You'll find that there's a test harness in the tests directory that we can add the test10spec case to, with some minor revisions, and it'll run it and do a full comparison.
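For illustration only, here's a rough sketch of the parametrized pattern the harness uses (the names below are made up; the real tests/test_regression.py runs HSP2 and compares against the stored HSPF output rather than this toy check):

import pytest

# Hypothetical case list -- the real harness discovers its cases differently.
CASES = ["test10", "test10specl"]

def run_case(case: str) -> dict:
    # Stand-in for "run the model and collect results"; swap in the
    # project's actual runner when wiring up a new case.
    return {"case": case, "ok": True}

@pytest.mark.parametrize("case", CASES)
def test_case(case):
    assert run_case(case)["ok"]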

rburghol commented 1 month ago

@PaulDudaRESPEC Definitely not intentional -- that code was not deliberately modified in develop-specact. Git must not have handled the merge cleanly, given how long this branch had been live before being merged.

austinorr commented 1 month ago

I also hope we can do the chore of fixing the "*" imports before we merge this. One of the things we've discussed a lot is getting this project formatted and linted, and I don't think we should burden that lift with also having to fix all of these imports. Let's fix them here instead.

rburghol commented 1 month ago

@PaulDudaRESPEC I have migrated the changes to main.py that @timcera merged in #159

rburghol commented 1 month ago

@austinorr I am going to make these import changes.

rburghol commented 1 month ago

OK @austinorr -- the import style has been switched for all files in this PR. Going forward, I am curious what your preference is regarding things like from pandas import date_range (then referencing the function as date_range) versus import pandas as pd (then using pd.date_range)?

austinorr commented 1 month ago

@rburghol Thanks, looking great! Common practice is to use the whole pandas or numpy namespace with import pandas as pd and then reference things with pd.date_range, as you said. However, I don't actually have a strong opinion: as long as things are explicitly imported, it's just personal preference and linters will pass. My preference is for less clutter up top, but that just means lots of little pd.<something> calls throughout the rest of the code.
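For example (purely illustrative):

import pandas as pd
from pandas import date_range

# Style 1: import the name directly and call it bare.
idx1 = date_range("2001-01-01", periods=3, freq="D")

# Style 2: keep the pandas namespace and qualify each call.
idx2 = pd.date_range("2001-01-01", periods=3, freq="D")

assert idx1.equals(idx2)  # both styles produce the same result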

I'm looking through the test directory you made for this, and I wonder what the best way to test the special actions code would be. I see the test10specl.uci, but it has never been run by HSPF (there are no .hbn files referenced in that uci), and there are no differences from test10.uci except that the new one has a typo on line 381 that @PaulDudaRESPEC caught and fixed last week (SNDFG should be SDFG). Can we slim down the new test directory to just what we actually use to verify the new capabilities this PR added?

rburghol commented 1 month ago

@austinorr the files are different in that test10specl.uci has a special actions block beginning on line 994:

SPEC-ACTIONS
*** test special actions
  RCHRES 5                                RSED    4       +=  2.50E+05
  RCHRES 5                                RSED    5       +=  6.89E+05
  RCHRES 5                                RSED    6       +=  4.01E+05
END SPEC-ACTIONS

This was a super quick example that @PaulDudaRESPEC recommended in an email exchange we had, so I wonder what tests he might advise. Note: since I don't run HSPF on Windows, it would be helpful if someone else ran it; I literally don't have a WinHSPF install handy.

In terms of slimming down the test10specl dir, sure, it's possible, but I have no strong opinion other than that a uniform directory tree for test files would facilitate automated testing. When you requested that I add an HSPFresults directory, I assumed I should follow the apparent project convention of having an HSPFresults dir and an HSP2results dir for automated comparisons between results; I just cloned that layout, as I have not touched those other files in the past.
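To be concrete, the kind of uniform tree I mean (directory names simply mirror what's already in the repo; purely illustrative):

tests/
  test10/
    HSPFresults/    # reference output from the HSPF run
    HSP2results/    # output produced by HSP2 for comparison
  test10specl/
    HSPFresults/
    HSP2results/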

Again, I would ask @PaulDudaRESPEC what the thinking behind the various Jupyter notebooks and the two directories was, and whether we should carry that practice forward.

austinorr commented 1 month ago

I can do it, and, with your permission, I'll clean up the test directory.

rburghol commented 1 month ago

@austinorr I'd welcome it! (both the model running and the cleanup).
And for the folder structure, I'm happy to provide feedback or ideas if you want to get together on it. Since I was not involved in the first round of test and example setup, I don't yet have an opinion, other than to suggest that we put *.h5 in .gitignore rather than explicitly ignoring individual test .h5 files. These files can be a menace with GitHub's file size limits, and that can be an impediment to less experienced git users producing tests (or to someone working on a tight timeline).
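As a sketch, the .gitignore entry I have in mind (the re-include line is hypothetical and only needed if a particular reference file must stay tracked):

# ignore generated HDF5 model output everywhere
*.h5
# hypothetical exception for a tracked reference file
!tests/test10/HSP2results/reference.h5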

austinorr commented 1 month ago

@rburghol do you have any time to screen share and talk me through how to run the special actions? I've gotten everything run through HSPF, new .hbn files etc., but they aren't matching the HSP2 results in reach 5 -- the one we added actions for.

rburghol commented 1 month ago

Thanks @austinorr for pushing this forward!

I'm tied up right now, but I may be able to find some time tomorrow, and definitely on Thursday I will have a great deal of flexibility.

@PaulDudaRESPEC helped me set up this example, and may have some testing insight.

PaulDudaRESPEC commented 1 month ago

In that RCHRES SANDFG table, make sure the values are lined up in the 15th column -- they appear to be in the 16th.

rburghol commented 1 month ago

Thanks @PaulDudaRESPEC - when we set these up originally, did you have a quick script to compare output between the two HSP2 runs to verify the function?
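If not, I was picturing something like this rough pandas sketch (file paths and HDF5 keys here are made up for illustration):

import pandas as pd

# Hypothetical paths/keys -- point these at wherever the two runs wrote output.
base = pd.read_hdf("HSP2results/test10.h5", key="/RESULTS/RCHRES_R005/SEDTRN")
specl = pd.read_hdf("HSP2results/test10specl.h5", key="/RESULTS/RCHRES_R005/SEDTRN")

# Largest absolute difference per column, to see where the runs diverge.
diff = (specl - base).abs().max()
print(diff.sort_values(ascending=False))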

PaulDudaRESPEC commented 1 month ago

I'll send you the UCI that I suggest you use for the test (via email).

rburghol commented 1 month ago

@austinorr I have incorporated the new test10specl.uci that @PaulDudaRESPEC sent into the HSP2 and HSPF results folders.

rburghol commented 1 month ago

@austinorr I just cleaned up the last few imports. I think this is now ready to go, unless you want to wait until we rethink how object creation should pass the parent and/or child in (#162), though I'd like to give that more thought, and I ultimately think it's more about readability than functionality at this juncture.

austinorr commented 1 month ago

@rburghol I think we're close -- I'm going to do a couple quick performance checks with the tests now that they pass.

One thing we should do is delete all the unrelated personal development files that are in here. I had done that already in one of my commits, but they got re-added.

All the compre-case.json files should be deleted, right? All changes to tests/testcbp* should be reverted, right?

We don't need a .wdm file in test10specl/HSP2results. If a contributor creates one so they can test with the 'convert' script, please delete it or leave it out of the commit.

We don't need the test10/HSP2results/pytest10.py file; it doesn't run any tests, and it looks like it's just scratch notes for a readme. Let's pull out the valuable notes and make this file a .md or .txt.

We don't need the test10/HSP2results/test10.example.py file either; this looks like scratch work, and it appears nothing imports from it.

It looks like this could be avoided by using a per-file staging pattern when building a commit. VSCode and other IDEs help with this a lot: you can just click the '+' button to stage each file that has a necessary change (you can even stage selected lines!), then do git commit -m "something something new feature".
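The command-line equivalent is roughly (file paths below are just placeholders):

git add HSP2/SPECL.py HSP2/main.py   # stage only the files that belong in the change
git add -p HSP2/state.py             # or stage selected hunks interactively
git commit -m "add special actions support"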

PaulDudaRESPEC commented 1 month ago

@austinorr and @rburghol, I think that test10.example.py file is not 'needed', but it is a good example of how one could add custom Python code as a sophisticated special action, and therefore it is worth keeping.

I think there are probably other files I've added over the years that ought to be removed. I can make a pass through before release to clean out any leftovers that shouldn't really be there.

I'm continuing to hold for a nod from you both, at which point I'll merge this PR.

rburghol commented 1 month ago

@austinorr the items below are all now completed and have run successfully through the tests -- a couple of notes follow.

Thanks for the heads-up; a couple of things:

austinorr commented 1 month ago

@PaulDudaRESPEC @rburghol Sounds good; I'm for keeping things we need -- I just want to make sure we don't add things accidentally.

If testcbp is ready to include in the test suite, let's make an HSPF directory for it so that we can get results to check against, and add it to the suite. Currently only test10 and test10specl are in the pytest suite. Nothing in the pytest suite references HSP2results directories; those come from the tests/convert manual testing pattern. Since this tool is a drop-in replacement for HSPF, we should maybe try to migrate away from having some files for the HSPF run and other, duplicated files for the HSP2 run. That's the approach the test suite takes: there's only one test case, and it runs in both HSPF and HSP2 exactly identically. If we duplicate the files and run separate tests for each, we can't say that anymore.

Looping back on performance, I ran test10 with both the 'develop' branch and this one, and the timings look good.

under develop we have:

pytest -k test10- --durations 0
================ slowest durations =====================
54.69s call     tests/test_regression.py::test_case[test10-0]
28.54s call     tests/test_regression.py::test_case[test10-3]
28.32s call     tests/test_regression.py::test_case[test10-1]
28.30s call     tests/test_regression.py::test_case[test10-4]
28.18s call     tests/test_regression.py::test_case[test10-2]

under this branch we have:

pytest -k test10- --durations 0
================ slowest durations =====================
43.56s call     tests/test_regression.py::test_case[test10-0]
28.48s call     tests/test_regression.py::test_case[test10-2]
27.35s call     tests/test_regression.py::test_case[test10-4]
26.98s call     tests/test_regression.py::test_case[test10-3]
26.27s call     tests/test_regression.py::test_case[test10-1]

The first call includes compiling and caching the numba JIT code, so we can ignore it.
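For anyone unfamiliar with why the first call is slower, here's a tiny standalone sketch (not HSP2 code) showing that a numba-jitted function pays its compilation cost once per process, and that cache=True persists the compiled code to disk for later runs:

import time
import numpy as np
from numba import njit

@njit(cache=True)  # cache=True writes the compiled machine code to disk
def total(values):
    s = 0.0
    for v in values:
        s += v
    return s

data = np.random.rand(1_000_000)
t0 = time.perf_counter(); total(data); t1 = time.perf_counter()  # includes compilation
total(data); t2 = time.perf_counter()                            # already compiled
print(f"first call {t1 - t0:.3f}s, second call {t2 - t1:.3f}s")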

kudos @rburghol for getting this huge feature implemented without a performance hit.

Note that because of all the disk I/O writing to H5 files, the numba-compiled code (28 seconds avg) is not much faster than pure Python code (32 seconds avg):

NUMBA_DISABLE_JIT=1 pytest -k test10- --durations 0

36.20s call     tests/test_regression.py::test_case[test10-4]
33.02s call     tests/test_regression.py::test_case[test10-0]
32.83s call     tests/test_regression.py::test_case[test10-3]
32.15s call     tests/test_regression.py::test_case[test10-1]
31.91s call     tests/test_regression.py::test_case[test10-2]

I think we should focus our attention there next.

rburghol commented 1 month ago

Thanks @austinorr for all your guidance and lift on this! Thanks @PaulDudaRESPEC!! Stoked to try to up the performance level in the next phase.