GETTSIM's interface is hard to use and inflexible. We discussed the direction we should be heading at the 2024 workshop; this issue summarizes the discussion and opens it for community members who were not present.
Ease of use for newcomers
Problem
GETTSIM's approach of being precise and requiring all relevant inputs means that simple questions can only be answered either with a huge amount of input data or with detailed knowledge of the DAG (e.g., how to cut off the pension calculations part of the DAG when we have a sample of 30-year-olds and ignore DI).
Proposed solution
Provide a repo in the ecosystem with default values for broadly defined "personas" (low-income households, rich people, families with children, pensioners, ...)
Provide an interactive interface for selecting subgraphs (more below)
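A minimal sketch of how persona defaults could work, assuming a persona is just a dictionary of default inputs that user-supplied values override. The persona names, input names, and values are hypothetical placeholders, not actual GETTSIM inputs or calibrated numbers.

```python
# Hypothetical persona defaults. All names and values are illustrative
# placeholders, not actual GETTSIM input names or calibrated numbers.
PERSONAS = {
    "low_income_single": {"age": 30, "gross_wage_monthly": 1200.0, "n_children": 0},
    "family_with_children": {"age": 40, "gross_wage_monthly": 3500.0, "n_children": 2},
    "pensioner": {"age": 70, "gross_wage_monthly": 0.0, "n_children": 0},
}


def fill_with_persona(user_inputs, persona):
    """Return the persona's defaults, overridden by whatever the user supplies."""
    if persona not in PERSONAS:
        raise KeyError(f"Unknown persona: {persona!r}")
    return {**PERSONAS[persona], **user_inputs}


# The user only specifies what differs from the persona.
inputs = fill_with_persona({"gross_wage_monthly": 1500.0}, "low_income_single")
```

With such defaults, a newcomer would only need to state the few inputs relevant to the question at hand.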
Ease of use with different datasets
Problem
It is very hard to see how GETTSIM's input requirements match what is in a given dataset. For any reasonable application, the graph is far too fine-grained, and it is hard to tell which intermediate nodes might already be available in the data.
Proposed solution
A simple first step: a higher-level view of the graph that shows only the upper-level keys of the hierarchical structure (arbeitslosengeld_2, rentenversicherung, ...). The width of an edge mirrors the number of links between the individual functions, clicking on an area opens its internal structure, and nodes are colored according to their upper-level key.
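The higher-level view could be derived by collapsing the fine-grained graph onto its upper-level keys and counting the links between them. This is only a sketch: the edge list, the function names, and the `__` separator between upper-level key and function name are assumptions made for illustration.

```python
from collections import Counter

# Toy fine-grained DAG as (source, target) edges. Node names follow the
# assumed pattern "<upper_level_key>__<function>" and are made up.
EDGES = [
    ("einkommen__bruttolohn", "rentenversicherung__beitrag"),
    ("einkommen__bruttolohn", "arbeitslosengeld_2__anrechenbares_einkommen"),
    ("einkommen__kapitalertrag", "arbeitslosengeld_2__anrechenbares_einkommen"),
    ("arbeitslosengeld_2__anrechenbares_einkommen", "arbeitslosengeld_2__betrag"),
]


def upper_level_key(node):
    """Return the part of the node name before the first '__'."""
    return node.split("__", 1)[0]


def collapse(edges):
    """Count links between upper-level keys, ignoring links within one key."""
    weights = Counter()
    for source, target in edges:
        a, b = upper_level_key(source), upper_level_key(target)
        if a != b:
            weights[(a, b)] += 1
    return dict(weights)


# The counts can serve as edge widths in the coarse view.
coarse = collapse(EDGES)
```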
Iterative protocol for setting up the graph
User specifies the targets
Load policy functions (potentially including a reform)
Select input variables (either just take all root nodes or use the interactive process outlined below)
If the names in the data match, use the data directly. Otherwise, generate a dictionary or YAML file (depending on the calling language) with all required inputs, in which the user fills in either the corresponding column name in her dataset or a default value that will apply to all observations
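The last step of the protocol might look roughly like the following, assuming the filled-in template maps each required input either to a column in the user's data or to a default value. The template layout (`column` / `default` keys), the input names, and the column names are hypothetical.

```python
# Toy dataset as a dict of equal-length columns. Column names are made up.
data = {"wage": [1200.0, 3500.0], "kids": [0, 2]}

# Filled-in template (hypothetical layout): each required input points either
# to a column in the user's data or to a default for all observations.
mapping = {
    "gross_wage_monthly": {"column": "wage"},
    "n_children": {"column": "kids"},
    "is_self_employed": {"default": False},
}


def build_inputs(data, mapping):
    """Translate the user's data into one column per required input."""
    n_obs = len(next(iter(data.values())))
    inputs = {}
    for name, spec in mapping.items():
        if "column" in spec:
            inputs[name] = list(data[spec["column"]])
        elif "default" in spec:
            # Broadcast the default to every observation.
            inputs[name] = [spec["default"]] * n_obs
        else:
            raise ValueError(f"Input {name!r} needs a 'column' or a 'default'.")
    return inputs


inputs = build_inputs(data, mapping)
```

Failing loudly on an unfilled entry keeps the precision that GETTSIM aims for while sparing the user from renaming columns by hand.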
Multiple calls of GETTSIM, potentially changing the length of the data, using data inputs for take-up
Problem
Some things need to be done outside the graph (e.g., the Günstigerprüfung with endogenous bg_ids). Sometimes, we also want to use information from the data to calculate things (e.g., whether someone who has a Minijob contributes voluntarily to the pension insurance or not).
Proposed solution
Set up a sparse dataset with duplicated observations (0/1 take-up in the above example; no duplication needed where a Minijob is irrelevant), and potentially call GETTSIM on subsets of the data in sequence. (needs more thought)
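A rough sketch of the duplication idea, under the assumption that each take-up choice becomes its own row and one row per person is selected afterwards. `simulate` is a made-up stand-in for a GETTSIM call, and the variable names and the "pick the better outcome" rule are illustrative only; the selection could equally follow the take-up observed in the data.

```python
def expand_take_up(rows):
    """Duplicate Minijob observations once per take-up choice (0/1)."""
    expanded = []
    for row in rows:
        # People without a Minijob are not duplicated (sparse expansion).
        choices = (0, 1) if row["has_minijob"] else (0,)
        for choice in choices:
            expanded.append({**row, "voluntary_contribution": choice})
    return expanded


def simulate(row):
    """Placeholder for a GETTSIM call; returns a made-up net income."""
    return row["wage"] - 20.0 * row["voluntary_contribution"]


def best_per_person(rows):
    """Keep, per p_id, the choice with the highest simulated outcome."""
    best = {}
    for row in rows:
        outcome = simulate(row)
        p_id = row["p_id"]
        if p_id not in best or outcome > best[p_id][0]:
            best[p_id] = (outcome, row["voluntary_contribution"])
    return best


people = [
    {"p_id": 1, "wage": 500.0, "has_minijob": True},
    {"p_id": 2, "wage": 3000.0, "has_minijob": False},
]
expanded = expand_take_up(people)
result = best_per_person(expanded)
```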
Interactive interface
Notebook or other dashboard-style UI
Depict graph (colors / simplifications as above)
Clicking on a node turns it into a root node that is provided via the data. All nodes that are no longer necessary disappear or are greyed out. Clicking on that node again makes it an endogenous node again. "Absolute" root nodes (say, gender) do not require explicit selection.
There is a button to export all root nodes, which generates the template for mapping to data columns / the list of data columns.
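The click-and-export behavior could rest on logic like the sketch below, which walks the graph backwards from the targets and stops at nodes the user has marked as provided. The toy DAG, the node names, and the template format (root name mapped to an empty slot) are assumptions for illustration.

```python
# Toy DAG: each node maps to the nodes it depends on. Names are made up.
DAG = {
    "net_income": ["gross_wage", "pension_contribution", "income_tax"],
    "pension_contribution": ["gross_wage"],
    "income_tax": ["taxable_income"],
    "taxable_income": ["gross_wage", "deductions"],
    "gross_wage": [],
    "deductions": [],
}


def required_roots(dag, targets, provided=frozenset()):
    """Return the root nodes needed to compute `targets`.

    Nodes in `provided` are treated as roots, so their ancestors are not
    visited and drop out unless another path still needs them.
    """
    roots, seen, stack = set(), set(), list(targets)
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        parents = [] if node in provided else dag.get(node, [])
        if parents:
            stack.extend(parents)
        else:
            roots.add(node)
    return roots


def export_template(roots):
    """Template for mapping root nodes to data columns; slots start empty."""
    return {name: None for name in sorted(roots)}


all_roots = required_roots(DAG, ["net_income"])
# "Clicking" on income_tax makes it a root; deductions is no longer needed.
pruned_roots = required_roots(DAG, ["net_income"], provided={"income_tax"})
template = export_template(pruned_roots)
```

Clicking a node again would simply remove it from `provided` and recompute the roots.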
Is your feature request related to a problem?
Several related issues:
#378
#449
#531
#532
#538 and #667
#640