iza-institute-of-labor-economics / gettsim

The GErman Taxes and Transfers SIMulator
https://gettsim.readthedocs.io/
GNU Affero General Public License v3.0
55 stars 33 forks source link

Treatment of helper functions #333

Open ChristianZimpelmann opened 2 years ago

ChristianZimpelmann commented 2 years ago

Current and desired situation

Most functions that are defined in the several function modules are written such that they are called using the DAG.

Other functions ("helper functions") are, however, called from within other functions directly (e.g. _st_tarif or _berechne_vorsorge_bis_2004).

For those functions, the used arguments are not restricted to names of other functions and input variabels (as it is the case for all other functions). #314 introduces a test that checks this restriction (test_all_arguments_inputs_params_function_names). It currently ignores all functions starting with an underscore. The same PR also made sure that all "helper functions" start with an underscore. This seems a good fix for now, but is not optimal, especially when the number of internal functions for which this restriction should be checked increases.

AFAIK, when building the DAG, we do not differentiate between the typical functions and helper functions.

Proposed implementation

Move all helper functions to a separate module (or several separate modules) which is/are not part of PATHS_TO_INTERNAL_FUNCTIONS.

Considered alternatives

Make a naming convention to identify functions that should not be part of the DAG.

hmgaudecker commented 2 years ago

Could we take stock so we know how many we are talking about? The two examples are quite different:

More generally, I am not sure about the use of test_all_arguments_inputs_params_function_names. Is this not automatically checked as part of the DAG generation? That is, does the test not simply add a maintenance burden by trying to replicate the DAG-generation to some extent?

ChristianZimpelmann commented 2 years ago

Those two are indeed currently the only ones. But I could imagine that it might sometimes be helpful to have these helper functions to follow the DRY principle. I suggest we leave it as it is right now and discuss it at some later point when we have a bigger discussion about the structure of the repo.

More generally, I am not sure about the use of test_all_arguments_inputs_params_function_names. Is this not automatically checked as part of the DAG generation?

The thing that this test checks in addition, is that all input variables are documented in TYPES_INPUT_VARIABLES. In line with this insight, I changed the name to test_all_input_vars_documented and moved it to test_docs.py.

(btw, the docstring is still taken directly from IZAΨMOD, I believe, and should be changed / deleted)

Done

hmgaudecker commented 2 years ago

I am not exactly sure what the issue here is in the end or what we make of it when we revisit in a few weeks/months.

If the presence of these functions in the files does not create a major nuisance / maintenance burden, I would leave it as is. There is a large value in keeping them close to the substantively related code, see the discussion the other day where @mjbloemer (rightfully so) did not find the calculation of the kinderzuschl_max parameter.

I think I would just close this one.

ChristianZimpelmann commented 2 years ago

I agree that it would be beneficial to keep it in the same module.

However, it is an inconsistency in our code base that we have functions that cannot be a target of the DAG, but are in no way separated from other functions. I would therefore like to leave the issue open (with priority-low) until the next brainstorming session.

hmgaudecker commented 2 years ago

The DAG generation sees the functions and decides it does not need them. I do not really see the issue, tbh, but don't have a problem with keeping it open.

hmgaudecker commented 2 years ago

Crept up again while @JakobWegmann was working on #150 -- see Zulip discussion. This is really confusing if not deeply familiar with the DAG mechanics, we'll need to find a better way of describing how things work / change the system.

I'd be leaning towards naming conventions right now.

lars-reimann commented 1 year ago

Instead of a naming convention, helper functions could also be marked with a @helper decorator to exclude them from the DAG.

hmgaudecker commented 1 year ago

Yes, thanks, decorators ftw!