Closed MattiSG closed 7 years ago
reference
Other question raised by @michelbl: what about the XML comments? See openfisca/openfisca-france#766.
How to handle `<PLACEHOLDER>`?
Workshop results.
My vote: 1B-2B.
> How to handle `<PLACEHOLDER>`?
A proposition:
# bouclier_fiscal.yaml
taux:
  description: Bouclier fiscal
  type: percentage
  values:
    2011-01-01:
      value: 0.5
      reference: http://example.org
    2014-01-01: {}
> My vote: 1B-2B.
@benjello 2B means one big YAML file and no filesystem hierarchy, which differs from the consensus "Represent hierarchy in the filesystem." :thinking:
I am in favor of a middle ground: use the filesystem hierarchy until you reach the "prestation level" (the level of a single benefit). It is very valuable to be able to see all the parameters concerning one prestation in one place, isolated from the whole set of parameters.
I propose that the hierarchy be loaded indifferently from the filesystem nesting and from the inner YAML nesting, with no arbitrary limitation. We can recommend (and give only examples of) using one file for each parameter, or at least having each file contain only a set of nodes, but leave it to the user to decide what is most readable. I would tend to consider this nesting choice as an editorial element, on which the writer is trusted to do their best to be readable.
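To make the "loaded indifferently" idea concrete, here is a minimal Python sketch. It assumes the YAML files have already been parsed into dicts, and the `build_tree` and `merge` helpers are hypothetical names, not part of openfisca-core. Directory names and the file stem become nesting levels, and whatever is nested inside the file continues below them.

```python
def merge(target, source):
    """Recursively merge `source` into `target` (both plain dicts)."""
    for key, value in source.items():
        if isinstance(value, dict) and isinstance(target.get(key), dict):
            merge(target[key], value)
        else:
            target[key] = value


def build_tree(files):
    """Build one parameter tree from parsed files.

    `files` maps a relative path such as "impot/bouclier_fiscal.yaml"
    to the already-parsed content of that file (a dict).
    """
    tree = {}
    for path, content in files.items():
        # Directory names and the file stem become nesting levels.
        parts = path.rsplit(".", 1)[0].split("/")
        node = tree
        for part in parts:
            node = node.setdefault(part, {})
        # The inner YAML nesting simply continues below the file level.
        merge(node, content)
    return tree


# The same parameter, declared in its own file or inside a bigger file.
leaf = {"description": "Bouclier fiscal", "values": {"2011-01-01": {"value": 0.5}}}
one_file_per_parameter = build_tree({"impot/bouclier_fiscal/taux.yaml": leaf})
one_big_file = build_tree({"impot.yaml": {"bouclier_fiscal": {"taux": leaf}}})
assert one_file_per_parameter == one_big_file
```

With such a loader, where to cut between directories, files and inner nesting stays a purely editorial choice.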
For `PLACEHOLDER`, what about using a `placeholder` or `expected` subkey in place of `value`?
@benjello, does it happen often that one has a strong idea of what the next value will be, but cannot be 100% sure until the law is published? For example, is that likely?
taux:
  description: Bouclier fiscal
  type: percentage
  values:
    2017-01-01:
      value: 0.5
      reference: http://example.org
    2018-01-01:
      expected: 0.6
Yes, such a situation can happen (and might have existed in the parameters files). And it is definitely better to distinguish an expected value from a real value.
> Yes such situation can happen
Thanks! My question was not very clear: how frequently can one give an expected value vs simply expecting the next update date?
No, that doesn't happen very often. Sometimes changes are announced, but the real value is the one given in a subsequent décret or law, and it may differ.
For pension legislation, the situation may appear more often.
Great, thank you for these details.
I thus suggest the following format:
taux:
  description: Bouclier fiscal
  type: percentage
  values:
    2017-01-01:
      value: 0.5
      reference: http://example.org
    2018-01-01:
      expected: 0.6 # if the value can be predicted
    2019-01-01:
      expected # leave empty if the value cannot be predicted
In this implementation, the following cases must be handled:
- `2019-01-01:` followed by a bare `expected` on the next line, as in the example above. This will be parsed as `{ "2019-01-01" : "expected" }`.
- `2019-01-01: expected`. This is standard YAML and should be transparent after parsing, and will be parsed as `{ "2019-01-01" : "expected" }`.
- `2019-01-01:` followed by a nested `expected:` key. This will be parsed as `{ "2019-01-01" : { "expected": null } }`.
@michelbl Would sometimes getting an object and sometimes getting a string be annoying?
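As a rough illustration of how little code the string/object distinction costs, here is a Python sketch working on already-parsed entries. The `normalize_entry` helper and its `(status, value)` return shape are assumptions made for this example, not an existing openfisca API.

```python
def normalize_entry(entry):
    """Return a (status, value) pair for one dated entry of `values`."""
    if entry == "expected":
        # A bare `expected` parses to a plain string: no prediction available.
        return ("expected", None)
    if isinstance(entry, dict) and "expected" in entry:
        # `expected:` (null) or `expected: 0.6`.
        return ("expected", entry["expected"])
    if isinstance(entry, dict) and "value" in entry:
        return ("value", entry["value"])
    raise ValueError(f"Unrecognised entry: {entry!r}")


assert normalize_entry("expected") == ("expected", None)
assert normalize_entry({"expected": None}) == ("expected", None)
assert normalize_entry({"expected": 0.6}) == ("expected", 0.6)
assert normalize_entry({"value": 0.5, "reference": "http://example.org"}) == ("value", 0.5)
```

After this step, the rest of the code only sees one shape, whichever of the three spellings the contributor used.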
We still need to create the equivalence table between the current `type` + `format` and the new `type` attribute. Does anyone have the list of the current couples?
> We still need to create the equivalence table between the current type + format and the new type attribute. Does anyone have the list of the current couples?
The only documentation is the code : https://github.com/openfisca/openfisca-core/blob/master/openfisca_core/legislationsxml.py#L11 https://github.com/openfisca/openfisca-core/blob/master/openfisca_core/legislationsxml.py#L76
`type` -> `unit`
`format` -> `format`
`*` -> `*`
There are other transformations, for example :
I believe this is the opportunity to agree on a common terminology between the YAML representation and the in-memory representation. This terminology would be in English and consistent in its use of capital letters. The parsing of YAML files would involve validation against a JSON schema and a minimal set of transformations.
> @michelbl Would sometimes getting an object and sometimes getting a string be annoying?
No, this is not annoying.
I think that during parsing, "expected" tags are kept (with or without a value). This intermediate in-memory representation can either be reduced to a given instant (to be used in formulas) or be searched for "expected" tags. Indeed, using a regex to find these tags will not be as easy with YAML as it is with XML.
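Here is a minimal Python sketch of the "reduced to a given instant" operation. It assumes dates are kept as ISO strings (so that string order is chronological order) and that `expected` entries never override a legislated value. The `value_at` name is hypothetical. Note that some YAML parsers load unquoted dates as date objects, in which case the keys would need converting first.

```python
def value_at(values, instant):
    """Return the legislated value in force at `instant` (an ISO date string)."""
    current = None
    for start in sorted(values):
        if start > instant:
            break
        entry = values[start]
        # Only real values count; `expected` markers are skipped here.
        if isinstance(entry, dict) and "value" in entry:
            current = entry["value"]
    return current


values = {
    "2017-01-01": {"value": 0.5, "reference": "http://example.org"},
    "2018-01-01": {"expected": 0.6},
    "2019-01-01": "expected",
}
assert value_at(values, "2017-06-01") == 0.5
assert value_at(values, "2018-06-01") == 0.5  # the prediction is not used
assert value_at(values, "2016-01-01") is None
```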
> using a regex to find these tags will not be as easy with YAML as it is with XML.
Why so? Identifying all `expected` leaves or properties sounds rather easy.
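For instance, once the files are parsed, a short recursive walk is enough. This is only a sketch: `find_expected` is a hypothetical helper, and it treats any string leaf equal to `expected` as a marker.

```python
def find_expected(node, path=()):
    """Yield (path, predicted_value) for every `expected` marker in the tree."""
    if node == "expected":
        # Bare marker: no predicted value.
        yield path, None
    elif isinstance(node, dict):
        if "expected" in node:
            yield path, node["expected"]
        for key, child in node.items():
            if key != "expected":
                yield from find_expected(child, path + (key,))


tree = {
    "taux": {
        "description": "Bouclier fiscal",
        "values": {
            "2017-01-01": {"value": 0.5},
            "2018-01-01": {"expected": 0.6},
            "2019-01-01": "expected",
        },
    }
}
found = list(find_expected(tree))
assert found == [
    (("taux", "values", "2018-01-01"), 0.6),
    (("taux", "values", "2019-01-01"), None),
]
```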
> type -> unit format -> format
Here is the list of all values and their distribution:
4 format: bool
401 format: float
152 format: integer
952 format: percent
25 type: age
1 type: days
1 type: hours
496 type: monetary
1 type: months
_(obtained via `xsltproc extract-formats.xsl openfisca_france/parameters/*.xml | sort | uniq -c`, the XSLT stylesheet being here; I checked results count consistency with `grep format | wc -l`)_
I thus see that the types `days`, `hours` and `months` are each used only once.
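For anyone who wants to reproduce such a count without XSLT, here is a rough Python equivalent using only the standard library. The stylesheet itself is not shown in this thread, so this sketch simply assumes that it extracts every `format` and `type` attribute. The `<NODE>` wrapper in the sample is also an assumption.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def count_attributes(xml_text, names=("format", "type")):
    """Count each `name: value` pair found on any element of the document."""
    counts = Counter()
    for element in ET.fromstring(xml_text).iter():
        for name in names:
            if name in element.attrib:
                counts[f"{name}: {element.attrib[name]}"] += 1
    return counts


sample = """<NODE code="root">
  <CODE code="TP_nbh" format="integer" type="hours"><VALUE deb="2002-01-01" valeur="1820" /></CODE>
  <CODE code="TP_nbj" format="integer" type="days"><VALUE deb="2002-01-01" valeur="360" /></CODE>
</NODE>"""

counts = count_attributes(sample)
assert counts["format: integer"] == 2
assert counts["type: hours"] == 1
```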
I think this leads us to decide whether type information, which is a recurrent documentary need, should be described as normalised metadata or be put in the description.
Here is the current usage:
<CODE code="TP_nbh" description="Nombre d'heure du temps plein" format="integer" origin="openfisca" type="hours">
<VALUE deb="2002-01-01" valeur="1820" />
</CODE>
<CODE code="TP_nbj" description="Nombre de jour du temps plein" format="integer" origin="openfisca" type="days">
<VALUE deb="2002-01-01" valeur="360" />
</CODE>
As one can see, both description and metadata are used, and I would see a benefit in including the type in the variable name itself.
As a maintainer, I don't see the value of the `type` attribute in this context, as I already know the value is in hours from the description (and, possibly, from the variable name).
As a reuser, I could see the value of being able to suffix the value with a unit to present it to the user. However, maintaining a mapping of types to suffixes sounds like a never-ending task to me.
What is the current use of the `format`? Is it simply exposed to the outside as part of documentation, or does it trigger some special handling, such as dividing a `percent` by 100?
> does it trigger some special handling, such as dividing a percent by 100?
I'm pretty sure it doesn't. The `percent` parameter values are already divided by 100 in the XML files:
<CODE code="coeff_enfant_supplementaire" description="Coefficient à rajouter aux plafonds pour chaque enfant à charge" format="percent" origin="openfisca">
<VALUE deb="2014-01-01" valeur="0.3"/>
</CODE>
I think the format and type metadata are only documentation, but @michelbl probably has a better knowledge of this part of the code.
Historically, `format` is information about how to display the `valeur`. It doesn't trigger any computation.
> I think the format and type metadata are only documentation,
I think `type` is only documentation. The value `monetary` may be used somewhere to pretty-print a parameter: `xxx €`.
`format` is used by the legislation parser to know whether a parameter should be parsed as a float, an int or a boolean. However, YAML knows about booleans, integers and floats, so `format` will not be used for that with YAML.
It seems to me that the only use of `type` and `format` is to specify the formatting to use when printing the parameter. In that case, only the `format` metadata should be kept. It would be optional and its possible values could be `monetary` and `percent`. However, this issue is common with variables (which can be percents, amounts, ages... and have to be printed at some point)...
After a discussion with @fpagnoux, we suggest two solutions:
1) Drop `format` and `type`.
2) Merge `format` and `type` into a single field. Make it optional and allow the values:
- `year` or `age` or `duration_in_years`
- `rate` or `percent` (but we think `percent` is bad since 0.3 can be 0.3% or 30%)
- `monetary` or `monetary_unit`
Please vote @MattiSG @benjello @sandcha @Anna-Livia and any user/contributor of openfisca !
It seems to me that you should also add duration in hours and duration in months.
If we're only talking about metadata, we will never encompass all possible units, as the comment above shows.
To elaborate on @fpagnoux & @michelbl's proposition:
- A `unit` attribute is added, to be consumed by clients, documented as being safely appendable to the output.
- `type="monetary"` attribute pairs are replaced by `unit="€"` (or `unit="FRF"` where applicable).
- Other `type` attributes are renamed as `unit` attributes, and their values are translated to the country package language and set to singular (`type="days"` → `unit="jour"` etc).
- `type="age"` instances are renamed as `unit` attributes and their values renamed to the country package language in the most appropriate form for a suffixable unit.
- `format="percent"` attribute pairs are replaced by `unit="%"`.
- Other `format` values (`bool`, `float`, `integer`) are removed altogether.
> The format="percent" attribute pairs are replaced by unit="%".
To me there is a big risk of confusion. If I see:
taux:
  description: Bouclier fiscal
  unit: "%"  # quoted: a plain YAML scalar cannot start with %
  values:
    2017-01-01:
      value: 0.5
I really understand that the value is `0.5%`, while it is `50%`!
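The confusion is easy to reproduce on the client side. In the Python sketch below, both function names are hypothetical. A client that blindly appends the unit, as the `unit` attribute invites it to do, prints the wrong figure, while a client that knows the stored number is a ratio has to rescale it first.

```python
def naive_display(value, unit):
    """Append the unit to the stored value, as a generic client would."""
    return f"{value}{unit}"


def percent_display(rate):
    """Display a stored ratio (0.5) as the percentage it stands for (50%)."""
    return f"{rate * 100:g}%"


assert naive_display(0.5, "%") == "0.5%"  # misleading: the rate is one half
assert percent_display(0.5) == "50%"
```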
For rate parameters, alternative suggestions:
- Drop `format="percent"` altogether.
- Add `unit: null`, or `unit: ~`, to make explicit the fact that they have no unit.
> add `unit: null`, or `unit: ~` to explicit the fact that they have no unit
As stated above, I think `unit` should be optional. No `unit` means no unit :wink:
What about multiplying all percentages by `100` and changing their value on load? Do we have some preprocessing already in place?
Okay, let's not get stuck with that part.
I propose we go forward with replacing all `format="percent"` by `unit="/1"`. This looks a bit silly, but allows us to postpone that specific decision while retaining the information. We'll see later on if we want to drop the information or add some preprocessing to store percentages.
The description should always include something like “Taux de” (“Rate of ”) to be clearer about the content.
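To see what the migration would look like on one parameter's attributes, here is a Python sketch combining the rules listed earlier with the `unit="/1"` stopgap. The `migrate_attributes` function is hypothetical, translation of unit names to the country language is left out, and the choice to let `type` take precedence over `format="percent"` is an assumption of this example.

```python
def migrate_attributes(attrs, currency="€"):
    """Map legacy `type`/`format` XML attributes to a single optional `unit`."""
    migrated = {k: v for k, v in attrs.items() if k not in ("type", "format")}
    legacy_type = attrs.get("type")
    if legacy_type == "monetary":
        migrated["unit"] = currency
    elif legacy_type is not None:
        # Translating e.g. "days" to "jour" is deliberately not handled here.
        migrated["unit"] = legacy_type
    elif attrs.get("format") == "percent":
        migrated["unit"] = "/1"
    # bool, float and integer formats are dropped: YAML types carry them.
    return migrated


assert migrate_attributes({"code": "taux", "format": "percent"}) == {"code": "taux", "unit": "/1"}
assert migrate_attributes({"format": "integer", "type": "hours"}) == {"unit": "hours"}
assert migrate_attributes({"format": "float", "type": "monetary"}) == {"unit": "€"}
assert migrate_attributes({"format": "bool"}) == {}
```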
There is some information that should not be lost. I thought of `ratio_point` or `ratio_unit` but never found a good name...
In openfisca-france, some scales switch from the French franc to the euro in 2001 (https://github.com/openfisca/openfisca-france/blob/master/openfisca_france/parameters/impot_revenu.xml#L185) while some others switch in 2002 (https://github.com/openfisca/openfisca-france/blob/master/openfisca_france/parameters/prelevements_sociaux.xml#L1767).
Since this question is complex, is not part of this issue, and supporting several currencies has never been stated as a goal of openfisca, I prefer `unit = currency` instead of `unit = "€"`/`unit = "FRF"`.
The `LinearAverageRateTaxScale` class cannot be created from valid XML parameters: https://github.com/openfisca/openfisca-core/blob/master/openfisca_core/legislations.py#L268. It must be created using a custom legislation subtree using `modify_legislation_json`. It is used in only one case: https://github.com/openfisca/openfisca-france/blob/master/openfisca_france/reforms/landais_piketty_saez.py#L85
Since this feature is not documented and used only in one place, I am willing to either break the landais_piketty_saez reform in openfisca-france, or patch it using standard openfisca tools.
The current parameter syntax is very heavy, due to its format being XML. Its vocabulary is poorly readable and made mostly of French words and abbreviations. Some attributes are inconsistent.
Example:
This proposal aims at making it much easier to contribute to the parameters, by:
1 - references
There is currently no mechanism for storing value-level references. This obviously causes some critical information to be missing, and leads to dirty workarounds (see https://github.com/openfisca/openfisca-france/issues/766) and lost contributors (see https://github.com/openfisca/openfisca-france/pull/728#issuecomment-304023834).
Proposition A
Proposition B
2 - names
(assuming format 1B for references)
Proposition A
Inline hierarchy in the ID:
Proposition B
Keep the current hierarchy:
Means there would be a few reserved names in order to identify leaves (such as `description`, `type`, `values`).
Proposition C
Rely on filesystem hierarchy:
Open questions
- What are the `conflicts` attributes used for? Can they be dropped? If not, how to represent them?
- Can the `origin` attributes be absorbed in the `reference` attributes? I.e. can an opaque identifier be systematically superseded by a URI? (I frame it this way to show that it is definitely technically feasible, and that the question is purely semantic)
- What is the equivalence table between the current `type` + `format` attributes couple and the new `type` attribute? Is it bijective?