QST: help with understanding how income splitting works in gettsim

git-girl commented 6 months ago

hey there,
i wanted to check an income tax calculator i wrote against gettsim, to make sure it isn't producing complete nonsense and that i am using gettsim correctly. i am writing a paper on income splitting and want to simulate one system with income splitting and one without it. however, i am confused about how to set this up in gettsim. from my understanding:

whether income splitting is applied to a household depends on its members being in the same tax unit; internally, anz_erwachsene_tu is then used to implement income splitting in src/_gettsim/taxes/eink_st.py:26.

i did the basic setup (creating the params, functions and dataset), ran the calculation once and got an income tax of 4998.0. then i:

- updated tu_id for p_id == 0 (so the husband)
- saw that multiple tu_ids in a household are not supported, so i updated hh_id as well

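however, calling calculate on that gave me this:

|    | hh_id | tu_id | p_id | hh_typ            | weiblich | bruttolohn_m | eink_st_tu | (unlabeled) | eink_selbst_m |
|---:|------:|------:|-----:|:------------------|:---------|-------------:|-----------:|------------:|:--------------|
|  0 |     1 |     1 |    0 | single_1_children | False    |         2000 |       2376 |     434.116 | False         |
|  1 |     0 |     0 |    1 | single_1_children | True     |         1000 |         37 |     102.725 | False         |
|  2 |     0 |     0 |    2 | single_1_children | False    |            0 |         37 |           0 | False         |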

this shows that, although all relevant inputs are the same, the couple pays more under joint taxation (i also checked in debug that the two adults differ in practically nothing by default, as i am sure you are aware)

from my understanding of income splitting this just cannot be the case: splitting lets the household make the best possible use of the tax progression and of both tax-exempt amounts, so joint taxation should never cost more than individual taxation.
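To make that intuition concrete, here is a minimal sketch of the splitting rule: the joint tax is twice the tax on half of the couple's combined income. The `toy_tariff` below is a made-up three-bracket schedule (an assumption for illustration only, neither the real German tariff nor GETTSIM code); the incomes are the annualised 2000 and 1000 per month from this issue.

```python
def toy_tariff(income: float) -> float:
    """Hypothetical progressive schedule, NOT the real German tariff."""
    # (upper bound of bracket, marginal rate); all values are assumptions.
    brackets = [(9_000.0, 0.0), (20_000.0, 0.20), (float("inf"), 0.40)]
    tax, lower = 0.0, 0.0
    for upper, rate in brackets:
        if income > lower:
            # Tax only the slice of income that falls into this bracket.
            tax += (min(income, upper) - lower) * rate
        lower = upper
    return tax


def individual_tax(y1: float, y2: float) -> float:
    """Each spouse is taxed on their own income."""
    return toy_tariff(y1) + toy_tariff(y2)


def splitting_tax(y1: float, y2: float) -> float:
    """Income splitting: twice the tax on half of the joint income."""
    return 2 * toy_tariff((y1 + y2) / 2)


y_a, y_b = 24_000.0, 12_000.0  # 2000 and 1000 per month
print("individual:", individual_tax(y_a, y_b))
print("splitting: ", splitting_tax(y_a, y_b))
```

With any schedule whose marginal rate never falls as income rises, the splitting result cannot exceed the individual one, and the two coincide when both spouses earn the same.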

### My Question

could you please help me understand where i am going wrong, whether in my use of gettsim or in my understanding of income splitting?

best regards!

### My Code

```python
from gettsim import (
    compute_taxes_and_transfers,
    create_synthetic_data,
    set_up_policy_environment,
)

policy_params, policy_functions = set_up_policy_environment(2016)

data = create_synthetic_data(
    n_adults=2,
    n_children=1,
    specs_constant_over_households={"bruttolohn_m": [2000.0, 1000.0, 0.0]},
)
print(data.columns)

targets = ["eink_st_tu", "anz_erwachsene_tu", "lohnst_m"]
debug = True

# Joint taxation: both adults share one tax unit.
result = compute_taxes_and_transfers(
    data=data,
    functions=policy_functions,
    params=policy_params,
    targets=targets,
    debug=debug,
)
# round() returns a new frame, so print its result directly.
print(result.round(2))

# Attempted individual taxation: move p_id 0 into its own tax unit and household.
data.loc[data["p_id"] == 0, ["tu_id", "hh_id"]] = 1
data["hh_typ"] = "single_1_children"

result = compute_taxes_and_transfers(
    data=data,
    functions=policy_functions,
    params=policy_params,
    targets=targets,
    debug=debug,
)
print(result.round(2))

print(
    result[
        [
            "hh_id",
            "tu_id",
            "p_id",
            "hh_typ",
            "weiblich",
            "bruttolohn_m",
            "eink_st_tu",
            "lohnst_m",
            "sonstig_eink_m",
            "sonstig_eink_m",
            "eink_vermietung_m",
            "eink_selbst_m",
            "alleinerz",
        ]
    ].to_markdown()
)
```

MImmesberger commented 6 months ago

Hey! That's true, the sum of income taxes should be lower under joint taxation.

I have the feeling that this is related to a bug on our side (see issue #683). When I use the code you provided, the column anz_erwachsene_tu is a boolean, but it should be an int (the number of adults in the tax unit). This column is used several times in the income tax calculation.
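Why the dtype matters can be sketched with plain pandas, without calling GETTSIM at all. The toy frame below reuses column names from this thread; the `kind` child indicator, the values, and the `splitting_divisor` guard are assumptions for illustration and are not GETTSIM internals.

```python
import pandas as pd
from pandas.api.types import is_integer_dtype

# Toy data: two adults and one child in a single tax unit (made-up values).
data = pd.DataFrame(
    {
        "tu_id": [0, 0, 0],
        "p_id": [0, 1, 2],
        "kind": [False, False, True],  # assumed child indicator
    }
)

# Integer count of adults per tax unit, broadcast back to every member.
adult = ~data["kind"]
anz_erwachsene_tu = adult.astype(int).groupby(data["tu_id"]).transform("sum")

# A boolean "has adults" flag looks similar but collapses to 1 in arithmetic,
# so a couple would be treated like a single person.
has_adults = anz_erwachsene_tu > 0


def splitting_divisor(column: pd.Series) -> pd.Series:
    """Hypothetical guard: accept only integer counts."""
    if not is_integer_dtype(column):
        raise TypeError(f"expected integer counts, got dtype {column.dtype}")
    return column


print(anz_erwachsene_tu.tolist())       # integer counts per person
print(has_adults.astype(int).tolist())  # what a boolean column turns into
```

A check like `splitting_divisor` is one way to catch a boolean column before it reaches the tax calculation.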

When I specify anz_erwachsene_tu manually in your code example (see below), I get correct values (verified with the calculator by the Ministry of Finance).

Thank you for bringing this to our attention again! We're currently in the middle of getting rid of tax units (#694) so this will be fixed very soon.

Edit: Just for completeness. With the fix, I get 2057€ under joint taxation and 2376€ + 37€ = 2413€ under individual taxation, so splitting saves 356€ here.

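```python
from gettsim import (
    compute_taxes_and_transfers,
    create_synthetic_data,
    set_up_policy_environment,
)

policy_params, policy_functions = set_up_policy_environment(2016)

data = create_synthetic_data(
    n_adults=2,
    n_children=0,
    specs_constant_over_households={"bruttolohn_m": [2000.0, 1000.0]},
)

# Set the number of adults per tax unit by hand to work around the bug.
data["anz_erwachsene_tu"] = [2, 2]

print(data.columns)

targets = ["eink_st_tu", "lohnst_m", "zu_verst_eink_tu"]
debug = True

result = compute_taxes_and_transfers(
    data=data,
    functions=policy_functions,
    # Use the column from the data instead of the computed one.
    columns_overriding_functions=["anz_erwachsene_tu"],
    params=policy_params,
    targets=targets,
    debug=debug,
)
# round() returns a new frame, so print its result directly.
print(result.round(2))
```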

git-girl commented 6 months ago

@MImmesberger thanks for your fast replies and the help! could you maybe say how you are specifying the individual taxation :grimacing:? i thought i was setting it by doing this:

```python
data.loc[data["p_id"] == 0, ["tu_id", "hh_id"]] = 1
```

i somehow still get 216€ and 0 for the individual taxation with my setup though

MImmesberger commented 6 months ago

Your approach is quick and dirty but should work. It's a bad approximation for families with children, though. There might be more problems but that's what directly comes to my mind.

I can't reproduce the 216€ you get. Here is the slightly adjusted version of your original code that I used:

Code example

```python
from gettsim import (
    set_up_policy_environment,
    create_synthetic_data,
    compute_taxes_and_transfers,
)

policy_params, policy_functions = set_up_policy_environment(2016)

data = create_synthetic_data(
    n_adults=2,
    n_children=1,
    specs_constant_over_households={"bruttolohn_m": [2000.0, 1000.0, 0.0]},
)

# Joint taxation: two adults in the tax unit.
data["anz_erwachsene_tu"] = [2, 2, 2]

print(data.columns)

targets = ["eink_st_tu", "lohnst_m", "zu_verst_eink_tu"]
debug = True

result = compute_taxes_and_transfers(
    data=data,
    functions=policy_functions,
    columns_overriding_functions=["anz_erwachsene_tu"],
    params=policy_params,
    targets=targets,
    debug=debug,
)
# round() returns a new frame, so print its result directly.
print(result.round(2))

# Individual taxation: move p_id 0 into its own tax unit and household,
# and set the number of adults per tax unit to 1.
data.loc[data["p_id"] == 0, ["tu_id", "hh_id"]] = 1
data["hh_typ"] = "single_1_children"
data["anz_erwachsene_tu"] = [1, 1, 1]

result = compute_taxes_and_transfers(
    data=data,
    functions=policy_functions,
    columns_overriding_functions=["anz_erwachsene_tu"],
    params=policy_params,
    targets=targets,
    debug=debug,
)
print(result.round(2))

print(
    result[
        [
            "hh_id",
            "tu_id",
            "p_id",
            "hh_typ",
            "weiblich",
            "bruttolohn_m",
            "eink_st_tu",
            "lohnst_m",
            "sonstig_eink_m",
            "sonstig_eink_m",
            "eink_vermietung_m",
            "eink_selbst_m",
            "alleinerz",
        ]
    ].to_markdown()
)
```

If you want exact results, you would have to adjust the functions that GETTSIM uses and feed them into compute_taxes_and_transfers via columns_overriding_functions. This is necessary because splitting up tax units breaks the functionality of some tax deductions (e.g. tax deductions for the parent that is in the new tax unit are too low). See this part of the tutorial.

hmgaudecker commented 6 months ago

Thanks for your interest and the good question!

For a current workaround, see #518. Undocumented and won't be merged, I am afraid.

As @MImmesberger wrote, the whole thing should be supported in main within weeks, so if you have that much time, I'd just wait.

git-girl commented 6 months ago

omg i just overlooked setting data["anz_erwachsene_tu"] = 1, thanks! :>
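For anyone repeating this, the two manual steps from this thread (moving one adult into a new tax unit and household, then setting `anz_erwachsene_tu` to 1) can be bundled so that neither gets forgotten. This is only a sketch in plain pandas: `move_to_own_tax_unit` is a hypothetical helper, not part of GETTSIM, and the toy frame stands in for the output of `create_synthetic_data`.

```python
import pandas as pd


def move_to_own_tax_unit(data: pd.DataFrame, p_id: int) -> pd.DataFrame:
    """Hypothetical helper: put one person into a fresh tax unit and household.

    Returns a modified copy and leaves the input frame untouched.
    """
    out = data.copy()
    # Pick an id that is not yet used for either tax units or households.
    new_id = int(max(out["tu_id"].max(), out["hh_id"].max())) + 1
    out.loc[out["p_id"] == p_id, ["tu_id", "hh_id"]] = new_id
    # After the split every tax unit holds exactly one adult; this column
    # still has to be passed via columns_overriding_functions.
    out["anz_erwachsene_tu"] = 1
    return out


# Toy stand-in for the synthetic data: two adults and one child.
toy = pd.DataFrame({"hh_id": [0, 0, 0], "tu_id": [0, 0, 0], "p_id": [0, 1, 2]})
split = move_to_own_tax_unit(toy, p_id=0)
print(split)
```

The helper does not touch `hh_typ` and, as noted above, the whole approach remains a rough approximation for families with children.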

from my side this issue can be closed if you don't want to leave it open because of the anz_erwachsene_tu thing.

edit: and if you are ever interested in the way i implemented the income taxation, it's a tool that transpiles the xml specification of the Ministry of Finance's tax calculator into python using numpy. the python output isn't pretty and still requires some knowledge of the xml spec, but maybe it can be of help for cross-referencing some behavior. the source is here: https://codeberg.org/git_girl/german_tax_sim

MImmesberger commented 6 months ago

Thanks!

iza-institute-of-labor-economics / gettsim

QST: help with understanding how income splitting works in gettsim #693
