cess_date refactoring - Githubissues

aaron-kaplan commented 1 year ago

I eliminated duplication by folding the initialization into the loop. I also split cess_date into two functions cess_date_from_sm and cess_date_from_rain, in the hopes of making them easier for the programmer to understand.

Seeing the result of these changes, I'm happy with two functions instead of one, but I'm not sure about eliminating the duplication. Is this version harder to understand?

remicousin commented 1 year ago

(my comment belonged here really)

ok... so first thing first, I am cool with the folding of step 0 into the loop. I don't know what I struggle so much with broadcasting...

Then, functions defined in functions and functions as arguments of functions is new paradigm to me... so it opens doors (once I clear up the fog). Then I have comments as I am processing this.

One is that I am not clear what's at stake in having 2 functions of one function if "if" checks in it. To me, cess_date is really a function of sm but one could not have sm but could derive sm from rain. But we don't want to do that because both cess_date(sm) and sm(rain) involve loops and we don't want to have to run one full loop for sm first, saving many things we will eventually discard, then run the loop again for cess_date (and not mentioning that now we are really talking about the one year (or season) case and that we want to run multiples years at once at some point soon).

But, now that I perceive the concept of functions as arguments of functions a bit, we could have once cess_date function that takes daily_sm as mandatory input. We however allow daily_sm to be stated as None, in which case the user would specify a function as optional argument to compute sm from. But we still need to pass as argument a whole (in time) array of rain. Can we pass it water_balance (not water_balance_step that ignores time) with reduce=True, sminit that updates every round of the loop with previous sm, the full daily_rain, et, taw...?

That would make things probably look complicated to the user, compared to 2 distinct functions. It would allow modularity for the user to plug something else for the sm_func. Knowing that at present we are working with the "simple" water_balance but at some point we'll have to replace it with the more complete one (currently in agronomy.py).

I guess what I come down to is whether it's worth to have one master function where the user is able to choose what input function to use to produce sm... the difficulty being that those input functions don't have the same inputs themselves than the simple case of the master function (they should also all produce same outputs but that seems less difficult to comply with). Instead of spawning functions as new ways of computing sm come through (But maybe I am extrapolating too much: we do have 2 ways currently but one is a special simpler case of the other)...

I'll leave at that for now to see what that triggers in your brains.

aaron-kaplan commented 1 year ago

I like having separate functions because it allows us to declare what arguments are allowed/required in each case simply by the function signatures, i.e. it's easy to see that cess_date_from_rain accepts et, taw, and sminit, and cess_date_from_sm doesn't, just by reading the function signatures. When they were combined into one function, we needed extra documentation to explain when to provide those arguments and when not to, and we needed an extra check in the code to catch when the function is called with an inappropriate combination of arguments.

Allowing the user to specify a different sm function is a good argument for having a general function that takes a sm function as one of its parameters. That function already exists in my implementation: it's called _cess_date. If we decide to let users call that function, we should remove the leading underscore from its name (by convention the underscore means private, not for use outside of this package).

Using functions as function arguments is often useful, but Kevin and I have deliberately avoided using that paradigm in the pingrid/maproom APIs, because we know it's unfamiliar to a lot of scientific programmers. We agreed that it was ok to do internally, but not in functions meant for use by others. Making _cess_date public would violate that rule. It's just a guideline, open to discussion and experimentation.

Making _cess_date public doesn't preclude keeping the special-case functions cess_date_from_rain and cess_date_from_sm. Functions like these are called "convenience functions." They provide an easier way of doing something that you could also do without them.

I agree that logically cess date is just a function of sm, and it would be nice if the implementation of cess_date didn't have to care whether sm were provided directly or calculated from rainfall. I argued for interleaving the sm calculation with the cess date calculation because the simple approach was too slow, but to be honest it's still too slow now, so maybe that sacrifice wasn't worth it. I started working on the performance issue last week, and will try to continue that this week.

In the meantime I suggest you move ahead with the maproom implementation, even though we haven't finalized the API. I don't think it will be a lot of work to adjust to any API changes we make later, and it would be useful to know whether the current speed is unacceptable or just annoying.

iridl / python-maprooms

cess_date refactoring #371