iza-institute-of-labor-economics / gettsim

The GErman Taxes and Transfers SIMulator
https://gettsim.readthedocs.io/
GNU Affero General Public License v3.0

BUG: Replace `_tu` grouping #690

MImmesberger closed 7 months ago

MImmesberger commented 7 months ago

Bug description

The grouping _tu should be replaced by the correct grouping (_sn, _bg, _fg, or _hh).

To get rid of the _tu groupings, we need the following new groups:

This affects the following taxes and transfers:

eink_st_y_tu

soli_st_y_tu

abgelt_st_y_tu

Kinderbonus

Lohnsteuer

Freibeträge

Vorsorgeaufwand

zu_verst_eink

Arbeitslosengeld

Elterngeld

Erziehungsgeld

Grundrente

Grundsicherung im Alter

Kindergeld

Unterhaltsvorschuss

Wohngeld

ALG II

Kinderzuschlag

MImmesberger commented 7 months ago

Hopefully this is all correct, but let me know if something seems off @hmgaudecker

I'll update the list if something comes up.

hmgaudecker commented 7 months ago

kinderbonus_m_tu changes to kinderbonus_m_sn

I think that should just follow kindergeld

MImmesberger commented 7 months ago

The PR for this likely closes #683 and #670, and potentially also #270.

Related to #606

hmgaudecker commented 7 months ago

anz_kinder_mit_kindergeld_tu changes to number of Kindergeld claims on _sn level

Does Kindergeld or Kinderfreibetrag matter here or both, @JakobWegmann? In any case, it should be the versions at individual level, if we have them.

hmgaudecker commented 7 months ago

Elterngeld

  • Net income calculation should be based on lohnst_m, not eink_st_y_tu
  • Note: Even then the calculation is slightly off compared with the simplified rules described here. Might be something for a different PR.

Yeah. Big one. I still believe this should go altogether and become an input we require from prior calculations. Typically $t-1$. In any case, don't worry much about it in this PR.

hmgaudecker commented 7 months ago

Grundsicherung im Alter

Income checks on the _tu level should be replaced with _sn. But this is just an approximation because (i) there might be couples that do not file taxes together but are considered a couple for Grundsicherung, and (ii) income of the partner is not considered if the partner cannot satisfy her own needs.

I wonder whether we should support another grouping, like "married" or "einstandspartner" ?

Since 2020, income of children is not considered, even if it is higher than 100.000 € (S. 5 §43 SGB XII).

Yeah, I think we can ignore that until somebody actually needs it. But maybe add a note in the function?

MImmesberger commented 7 months ago

I wonder whether we should support another grouping, like "married" or "einstandspartner" ?

I agree, especially as we already have p_id_einstandspartner as an input variable. I just learned that this would be the correct grouping for Grundrente as well (it doesn't matter whether taxes are filed jointly).

hmgaudecker commented 7 months ago

alleinerz_tu should change to alleinerz_sn and can be determined endogenously.

Leave as is for now (AFAICT, alleinerz is the input variable), make endogenous in a different PR.

JakobWegmann commented 7 months ago

anz_kinder_mit_kindergeld_tu changes to number of Kindergeld claims on _sn level

Does Kindergeld or Kinderfreibetrag matter here or both, @JakobWegmann? In any case, it should be the versions at individual level, if we have them.

I'm 98% sure that only the Kinderfreibetrag matters. I think the law is very clear.

JakobWegmann commented 7 months ago

Elterngeld

  • Net income calculation should be based on lohnst_m, not eink_st_y_tu
  • Note: Even then the calculation is slightly off compared with the simplified rules described here. Might be something for a different PR.

Yeah. Big one. I still believe this should go altogether and become an input we require from prior calculations. Typically t−1. In any case, don't worry much about it in this PR.

The parental leave calculation in my Stata code matches the calculation, so I think I can at least fix the calculation in GETTSIM as soon as all these changes are implemented.

hmgaudecker commented 7 months ago

Elterngeld

  • Net income calculation should be based on lohnst_m, not eink_st_y_tu
  • Note: Even then the calculation is slightly off compared with the simplified rules described here. Might be something for a different PR.

Yeah. Big one. I still believe this should go altogether and become an input we require from prior calculations. Typically t−1. In any case, don't worry much about it in this PR.

The parental leave calculation in my Stata code matches the calculation, so I think I can at least fix the calculation in GETTSIM as soon as all these changes are implemented.

Yes, but given our annual structure, I suppose the better approximation is not to use concurrent income, right?

JakobWegmann commented 7 months ago

Yes, I agree it would be more intuitive and less error prone to have an additional input.

hmgaudecker commented 7 months ago

It should be an output, though! But not in this PR, @MImmesberger, can you open an issue for that, please?

MImmesberger commented 7 months ago

It should be an output, though!

Just to make sure I understood that correctly: it should be an output for convenience (when calculating the inputs for t in t-1), i.e. it would not be used as an input for another function?

hmgaudecker commented 7 months ago

Not in the concurrent year, no. But in the subsequent year.

Say I have panel data for 2023 and 2024 and I want to calculate Elterngeld for kids born in 2024. This would proceed as follows:

  1. Run GETTSIM on 2023 data, output elterngeld_eink_relev_current_m.
  2. Call this variable elterngeld_eink_relev_lag_m and run GETTSIM on 2024 data using it as an input.

We'll need a suitable distinction between the names; this suggestion is bogus, of course.
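The two steps above can be sketched with plain pandas. This is only an illustration of the hand-over between years: it does not call GETTSIM itself, the 2023 results are made-up numbers, and the two column names are the placeholder names from this comment, not final ones.

```python
import pandas as pd

# Placeholder names from the discussion; the final variable names are undecided.
CURRENT_NAME = "elterngeld_eink_relev_current_m"
LAG_NAME = "elterngeld_eink_relev_lag_m"


def add_lagged_income(results_prev: pd.DataFrame, data_curr: pd.DataFrame) -> pd.DataFrame:
    """Attach last year's relevant income to this year's input data via p_id."""
    lagged = results_prev[["p_id", CURRENT_NAME]].rename(columns={CURRENT_NAME: LAG_NAME})
    # Left merge keeps everyone in the current data; persons without a
    # prior-year record end up with NaN in the lagged column.
    return data_curr.merge(lagged, on="p_id", how="left", validate="one_to_one")


# Made-up stand-in for the output of a 2023 run (step 1).
results_2023 = pd.DataFrame({"p_id": [0, 1], CURRENT_NAME: [2500.0, 1800.0]})
# Made-up 2024 input data; person 2 has no 2023 record.
data_2024 = pd.DataFrame({"p_id": [0, 1, 2]})

# Step 2: the 2024 inputs now carry the lagged variable.
inputs_2024 = add_lagged_income(results_2023, data_2024)
```

How to treat persons without a prior-year record (NaN here) would still need to be decided.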

ChristianZimpelmann commented 7 months ago

Great! This looks like a big step forward!

I wonder whether we should support another grouping, like "married" or "einstandspartner" ?

I agree, especially as we already have p_id_einstandspartner as an input variable. I just learned that this would be the correct grouping for Grundrente as well (it doesn't matter whether taxes are filed jointly).

AFAIK, unmarried partners are considered for Grundsicherung im Alter, but not for Grundrente. So I guess we need three groupings (all of which should be available from the input data):

MImmesberger commented 7 months ago

(I updated the original post following our discussion. Also, I added the two new groupings that are needed)

MImmesberger commented 7 months ago

For ALG2 (_bg grouping), the Wohngeld priority check should still be on the _hh level for Wohngeld, correct? I'm referring to this function:

def wohngeld_vorrang_hh(
    wohngeld_nach_vermög_check_m_hh: float,
    arbeitsl_geld_2_vor_vorrang_m_bg: float,
) -> bool:
    """Check if housing benefit has priority.

    Parameters
    ----------
    wohngeld_nach_vermög_check_m_hh
        See :func:`wohngeld_nach_vermög_check_m_hh`.
    arbeitsl_geld_2_vor_vorrang_m_bg
        See :func:`arbeitsl_geld_2_vor_vorrang_m_bg`.

    Returns
    -------
    bool
        True if Wohngeld is at least as high as ALG II before the priority check.

    """
    return wohngeld_nach_vermög_check_m_hh >= arbeitsl_geld_2_vor_vorrang_m_bg

hmgaudecker commented 7 months ago

For ALG2 (_bg grouping), the Wohngeld priority check should still be on the _hh level for Wohngeld, correct?

Not quite. It is possible that some individuals in a household receive Wohngeld and others receive Bürgergeld.

To my understanding, Wohngeld is calculated at the household level but can be broken down to individual values. For the priority check, these should be aggregated at the _bg level and compared there. Does that seem correct, @mjbloemer @michaelhebsaker ? Would you have a reference of how to distribute Wohngeld across household members?
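A rough pandas sketch of the comparison at the _bg level. It assumes that an individual-level Wohngeld amount already exists (the hypothetical column `wohngeld_m`); how to split household Wohngeld across members is exactly the open question above and is not answered here. The `bg_id` column name is also an assumption.

```python
import pandas as pd


def wohngeld_vorrang_bg_sketch(df: pd.DataFrame) -> pd.Series:
    """Compare Wohngeld summed within each Bedarfsgemeinschaft to ALG II of that group."""
    # Sum the (assumed) individual Wohngeld amounts within each bg_id and
    # broadcast the sum back to every member of the group.
    wohngeld_m_bg = df.groupby("bg_id")["wohngeld_m"].transform("sum")
    return wohngeld_m_bg >= df["arbeitsl_geld_2_vor_vorrang_m_bg"]


# Made-up household with two Bedarfsgemeinschaften.
example = pd.DataFrame(
    {
        "hh_id": [0, 0, 0],
        "bg_id": [0, 0, 1],
        "wohngeld_m": [100.0, 100.0, 50.0],  # hypothetical individual shares
        "arbeitsl_geld_2_vor_vorrang_m_bg": [150.0, 150.0, 80.0],
    }
)
vorrang = wohngeld_vorrang_bg_sketch(example)
```

In this made-up household, Wohngeld has priority for the first Bedarfsgemeinschaft but not for the second, which a single check at the _hh level could not express.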

lars-reimann commented 7 months ago

A married indicator would be useful as well.

Should this be called married or verheirat? I'd wager this should go into demographic_vars.py, right?

hmgaudecker commented 7 months ago

A married indicator would be useful as well.

Should this be called married or verheirat? I'd wager this should go into demographic_vars.py, right?

I think I'd use ehe_id. Short enough and very clear.
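For illustration, a minimal sketch of how such an ehe_id could be derived from p_id and p_id_ehepartner. It assumes non-negative p_id values and a negative p_id_ehepartner for persons without a spouse; the function name is hypothetical and this is not GETTSIM's implementation.

```python
import numpy as np
import pandas as pd


def ehe_id_sketch(p_id: pd.Series, p_id_ehepartner: pd.Series) -> pd.Series:
    """Assign the same id to both spouses and a separate id to each unmarried person."""
    # Unmarried persons (negative partner id, by assumption) point to themselves.
    partner = p_id_ehepartner.where(p_id_ehepartner >= 0, p_id)
    # The smaller p_id within a couple serves as the common key.
    key = np.minimum(p_id, partner)
    # Map the keys to consecutive integers starting at 0.
    codes, _ = pd.factorize(key)
    return pd.Series(codes, index=p_id.index, name="ehe_id")


p_id = pd.Series([0, 1, 2, 3])
p_id_ehepartner = pd.Series([1, 0, -1, -1])  # persons 0 and 1 are married
ehe_id = ehe_id_sketch(p_id, p_id_ehepartner)
```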

lars-reimann commented 7 months ago

So, ehe_id instead of spouse_id and no additional boolean married variable?

hmgaudecker commented 7 months ago

Ah, sorry, I had not read through @MImmesberger's updates to the main issue. Yes, I think ehe is clearer, also since we are using German identifiers for the other groupings.

Instead of an extra indicator we can check for p_id_ehepartner >= 0, right? Should be easy enough, rather not add an extra variable.
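Spelled out as a small sketch (assuming a missing spouse is encoded as a negative p_id_ehepartner, e.g. -1; the function name is hypothetical):

```python
import pandas as pd


def ist_verheiratet_sketch(p_id_ehepartner: pd.Series) -> pd.Series:
    """Return True for every person who has a spouse pointer."""
    # Assumption: a negative value means "no spouse".
    return p_id_ehepartner >= 0


married = ist_verheiratet_sketch(pd.Series([1, 0, -1]))
```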

lars-reimann commented 7 months ago

Instead of an extra indicator we can check for p_id_ehepartner >= 0, right? Should be easy enough, rather not add an extra variable.

Yes, that was the implementation of married anyway. I'll remove it again.