openfisca / openfisca-core

OpenFisca core engine. See other repositories for country-specific code & data.
https://openfisca.org
GNU Affero General Public License v3.0

Variable tags #1122

Open nikhilwoodruff opened 2 years ago

nikhilwoodruff commented 2 years ago

It would be really useful to be able to assign a set of tags to variables. For example, a benefit variable might have the tags `["benefit", "means-tested"]`. This would enable applications that use the APIs of country models (for example, PolicyEngine's API explorer) to filter by tags, and I think it would be very low-maintenance to implement (I'd be happy to). I'd welcome any feedback (cc @benjello, @sandcha). I'm planning to add this as a monkey-patch to the UK and US systems (as we already do with other metadata fields), but it would probably be nicer to have a standard interface shared with other country models.
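To make the proposal concrete, here is a minimal sketch of how tags could be declared and exposed. Everything in it is an assumption for illustration: `tags` is not an existing openfisca-core attribute, `Variable` is a stand-in base class so the snippet runs on its own, and the variable names and the `describe` helper are hypothetical.

```python
# Stand-in for the real Variable base class, so this sketch is self-contained.
class Variable:
    label = ""
    tags = []  # hypothetical attribute: an empty default means "untagged"


class housing_benefit(Variable):  # hypothetical variable
    label = "Housing benefit"
    tags = ["benefit", "means-tested"]


class income_tax(Variable):  # hypothetical variable
    label = "Income tax"
    # No tags declared: inherits the empty default.


def describe(variable_class):
    """Return the metadata an API could expose for one variable."""
    return {
        "name": variable_class.__name__,
        "label": variable_class.label,
        "tags": list(variable_class.tags),  # copy, so callers cannot mutate the class
    }
```

Since the attribute defaults to an empty list, existing variables would not need to change.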

benjello commented 2 years ago

Cc @MattiSG

MattiSG commented 2 years ago

I understand this as a specific case of https://github.com/openfisca/openfisca-core/issues/1071. @nikhilwoodruff please correct me if I got it wrong!

Pointing this out does not solve anything (even less so since I'm very conscious that #1071 is currently stuck), but referencing it here should help centralise the discussion 🙂 It is useful to have this issue as a way to distinguish between use cases.

In order to understand the value for the wider community, could you please elaborate a bit more on the end user use case you have? What is the aim of consuming these tags? Does it help with sorting in a UI? What does your example `means-tested` tag mean? 🙂

nikhilwoodruff commented 2 years ago

Thanks @MattiSG - yes, I can see why that issue got a bit more complex. Perhaps it would be easier to separate metadata which informs some additional computation (e.g. the `min` attribute implies some extra validation) from metadata which only aims to better organise data.

Yes, I agree it's a specific case of #1071 - but I think this issue falls in the second case, in that it wouldn't need any extra validation or computation (except maybe to test that tags is of type List[str]?).
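The only check mentioned here, that tags is a list of strings, could look roughly like the sketch below. The `validate_tags` helper is hypothetical and not part of openfisca-core.

```python
def validate_tags(tags):
    """Raise TypeError unless tags is a list of str; return it unchanged otherwise."""
    if not isinstance(tags, list):
        # Also rejects a bare string such as "benefit", which is an easy mistake.
        raise TypeError(f"tags must be a list, got {type(tags).__name__}")
    for tag in tags:
        if not isinstance(tag, str):
            raise TypeError(f"each tag must be a str, got {tag!r}")
    return tags
```

Rejecting a bare string explicitly matters, because a string is itself iterable and would otherwise be treated as a sequence of one-character tags.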

The main use case here is to enable users of the API to more easily find or explore the set of variables, without knowing their names or having to look through the source code. For example, let's say I want to find the set of variables which are involved in computing a particular benefit (e.g. Universal Credit in the UK). If I could filter variables to those including the Universal Credit tag, that'd speed up the process - much like how on GitHub, I can filter all issues/PRs to those including the kind:solution tag. means-tested is one tag we might use - it refers to any transfer to households or individuals which is reduced with income - certain benefits and certain tax allowances.
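As a rough illustration of that filtering, assuming the tags end up available as a mapping from variable name to tag list (the mapping, the variable names and the tag assignments below are all made up for the example):

```python
# Hypothetical data: variable name -> tags.
VARIABLE_TAGS = {
    "universal_credit": ["benefit", "means-tested", "universal-credit"],
    "uc_child_element": ["universal-credit"],
    "personal_allowance": ["tax-allowance", "means-tested"],
    "employment_income": [],
}


def variables_with_tags(variable_tags, *required):
    """Return the sorted names of variables carrying every required tag."""
    wanted = set(required)
    return sorted(
        name for name, tags in variable_tags.items() if wanted <= set(tags)
    )


print(variables_with_tags(VARIABLE_TAGS, "universal-credit"))
print(variables_with_tags(VARIABLE_TAGS, "means-tested", "benefit"))
```

Passing several tags narrows the result to variables carrying all of them, which is the same behaviour as combining label filters on GitHub.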

MattiSG commented 2 years ago

> enable users of the API to more easily find or explore the set of variables

Gotcha! 👌

I believe @openfisca/france-contrib implement this by using long variable names, akin to namespacing (except that it is not really, since there are no “namespaces”). In your case, it would work by prefixing your sets of variables with `universal_credit_` or `uc_`. I am not saying this solution is the best, I'm just sharing it to illustrate how things currently are. This approach does have some value, in the sense that tagging would not prevent collisions: if, for example, you have an `eligible` variable tagged with `universal-credit`, it would not prevent its name from colliding with an `eligible` variable tagged with `child-care-benefit`. You will always end up having to distinguish them somehow anyway, and I assume the additional information needed for that is likely to be comparable to what you would tag the variable with. Please correct me if that last assumption is wrong!
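A small sketch of the collision argument, assuming variable names must be unique within a system. The `register` and `with_prefix` helpers, the registry and the `ccb_` prefix are hypothetical and only serve the illustration:

```python
def register(registry, name, tags=()):
    """Add a variable name to the registry, refusing duplicate names."""
    if name in registry:
        raise ValueError(f"variable name already in use: {name}")
    registry[name] = list(tags)


def with_prefix(registry, prefix):
    """Discover variables by name prefix instead of by tag."""
    return sorted(name for name in registry if name.startswith(prefix))


registry = {}
register(registry, "eligible", ["universal-credit"])
try:
    # Different tags do not help: the name itself is already taken.
    register(registry, "eligible", ["child-care-benefit"])
except ValueError as error:
    print(error)

# Prefixed names avoid the collision and double as a discovery mechanism.
register(registry, "uc_eligible", ["universal-credit"])
register(registry, "ccb_eligible", ["child-care-benefit"])
print(with_prefix(registry, "uc_"))
```

One difference the sketch makes visible: a name carries a single prefix, while a variable could carry several tags.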

I do see the added value of exposing more information to reusers for discovery. Would variable namespacing also fit the bill for you? Do you see many cases where you would like to add several tags? Please keep this issue updated with the results of your monkey-patched experimentation, it's a great way to accumulate examples and decide to go towards normalisation based on them! 😃