openfisca / openfisca-core

OpenFisca core engine. See other repositories for countries-specific code & data.
https://openfisca.org
GNU Affero General Public License v3.0
168 stars 75 forks source link

Exception when getting a parameter value older than its start date #368

Closed fpagnoux closed 8 years ago

fpagnoux commented 8 years ago

Right now, if I try to get a parameter before the debvalue, I get an error : AttributeError: 'CompactNode' object has no attribute 'xxx'

I think it's a new behaviour, as I'm running into errors I did not have before. For instance, test_basics.pydoesn't pass anymore for me on master. For flexibility reasons, I'd prefer if openfisca did not crash and instead returned the oldest value it could find.

laem commented 8 years ago

As I understand it :

To correct incoherences like openfisca/openfisca-france#353 (some values being extended in the future, some not depending on the param.xml root node), issue openfisca/openfisca-france#366 was raised by @cbenz .

The idea is to enable the extension of value nodes, without erasing param.xml knowledge.

It was implemented by @eraviart (e98fc00a969f1d235fc5f8c4eb8cf04267195b76 and subsequent commits on -france and 31634e425ca8bf51b32ce8b6fd7db8adb1698aa4 on -core), but by not specifying the fin attribute instead of writing fin="unknown".

So this :

 <CODE code="IM_max" description="Indice majoré plafond" format="integer">
        <VALUE deb="2006-01-01" fin="2014-12-31" valeur="717" />
 </CODE>

source

becomes this :

<CODE code="IM_max" description="Indice majoré plafond" format="integer">
        <VALUE deb="2015-01-01" fuzzy="true" valeur="717" />
        <VALUE deb="2006-01-01" fin="2014-12-31" valeur="717" />
</CODE>

source

fuzzy explicits the fact that we don't really know if and when the value is going to be changed.

So back to this issue, three solutions :

1) Implement the same behavior for the oldest value by adding new fuzzy nodes. It is coherent but we'll end up with this everywhere, which I find quite heavy for a probably not so common use case :

<CODE code="IM_max" description="Indice majoré plafond" format="integer">
  <VALUE deb="2015-01-01" fuzzy="true"      valeur="717" />
  <VALUE deb="2006-01-01" fin="2014-12-31"  valeur="717" />
  <VALUE fuzzy="true"     fin="2005-12-31"  valeur="717" />
</CODE>

2) Find the real first values and update param.xml. Tests fail for at least rafp_plaf_assiette, IM_max, tns and micro_entreprise.

3) As you say @fpagnoux, return the oldest value found.

benjello commented 8 years ago

@laem : really good summary ! @fpagnoux : some of these errors arise because we use pre-2006 values which were never entered. So you can as @laem said, extends those values (you can find IM_max here and some others here. I didn't find the time to look at those during vacation and we should be able to avoid extending values in the past.

laem commented 8 years ago

@benjello Shouldn't this issue be left open ? The master branch has many failing tests.

benjello commented 8 years ago

Yes sorry

cbenz commented 8 years ago

Quick and easy fix : https://github.com/openfisca/openfisca-france/commit/7b881be35aebe830eadad64b3af102479ac2141d

cbenz commented 8 years ago

Now the hard way is to retrieve and enter these values in param.xml, but that's not urgent.

laem commented 8 years ago

/home/mama/dev/sgmap/openfisca-france/openfisca_france/tests/formulas/af.yaml is still failing when make test is run

cbenz commented 8 years ago

Now the problem is that other tests than test_basics.py fail when simulations try to access parameter values older than its start date.

See https://travis-ci.org/openfisca/openfisca-france/builds/100387893

cbenz commented 8 years ago

For information I'm starting to fix the tests in a more efficient way: gather (thanks to @benjello links) and set the real values!

benjello commented 8 years ago

@laem started it and it passes the tests. Try to revert at max the date range to revert at max the quick fix (after commiting a working version) if it is ok with you since some people urgently needs a working version ;-)

cbenz commented 8 years ago

All right, thanks @laem !

cbenz commented 8 years ago

@benjello Do you have any clue on where to find tns.micro_entreprise parameters in baremes IPP?

I hoped to find tns.micro_entreprise.abattement_forfaitaire_fp.achat_revente for example (failing in test_basics.py) in a YAML correspondence file between IPP and OpenFisca names, but I didn't find it in https://git.framasoft.org/groups/french-tax-and-benefit-tables repos.

Any clue?

laem commented 8 years ago

Seems to be there : Code général des impôts, Article 50-0

benjello commented 8 years ago

You can have a look here but i spotted that there were problems when I started the translation. Also a missing year in the ipp tables. We will review that very soon (next week). So use the values and open issue both on openfisca and ipp-tables to say that we should check that particular point.

benjello commented 8 years ago

Look at the micro tab

cbenz commented 8 years ago

@laem thanks, by the way how did you find the legifrance article? By search engine or metadata I didn't see ?

laem commented 8 years ago

Usually I search the web to find the article or law, and then go back in time with legifrance, but it's sometimes really hard :confused:

cbenz commented 8 years ago

There are too many missing values and it's too heavy for me. So I changed my strategy: I copy the first value in the past starting at 2000-01-01 and ending the day before the first real value.

See https://github.com/openfisca/openfisca-france/commit/d69406771b421480f505a006a98df71d03972805 (this is a WIP in a branch)

benjello commented 8 years ago

It is ok for me if it passes the tests. Do not spend too much time on it

MattiSG commented 8 years ago

Frankly, this sounds like a bad idea. Why add knowingly wrong data rather than making the software support the behaviour you're making manually? The resource investment will be comparable, one won't have issues if one tries to simulate before 2000, and it will be easier to detect where to do historical research if needed.

cbenz commented 8 years ago

The software was precisely modified recently to raise an exception when a value does not exist in the past (or future) (see https://github.com/openfisca/openfisca-france/issues/366). Previously it returned the earliest/latest value prolonged in the past/future, with a subtle-and-hard-to-understand rule related to the whole legislation start/stop dates (which were deleted).

After having spend one day trying to find real values I switched to this is a workaround in order to make the tests pass. These are marked fuzzy and I created a new issue to fix them.

MattiSG commented 8 years ago

Yes, I had understood this. I'm just saying that you're reproducing the previous behaviour, with a twist. First, it's manual, thus much less efficient. Second, it has cases (simulation < 2000-01-01 or after the last end date) that will lead to crashes and have severe consequences in production.

Simply printing a warning when a value has been inferred has the same diagnosis value as adding fuzzy (even more, since you get warned only about what you use for a specific simulation, thus easing prioritisation), while increasing resilience and avoiding tedious work. Creating an issue that basically says “spend 2 months full time searching throughout law to find historical values for all parameters” doesn't fix this IMO :wink:

Now if you really want to do it this way, do it, but please bear in mind that this kind of change is making it ever harder to contribute. What's the point of improvements such as considering switching to YAML if you then make it mandatory to add a confusingly meaningless data point because the software isn't able to infer stuff on its own and inform the user about it? Simplicity is key to decrease contributor friction, and the proposed change is putting the burden on the contributor (or the gatekeeper, and we all seem pretty busy already, don't we?) rather than on the software.

cbenz commented 8 years ago

Finally, due to the huge quantity of missing parameters, I abandon adding fuzzy to the parameters in the past.

The aim of the modification of the Core was to make it the more explicit as possible. We really want an error to be returned in case a formula requests a parameter which does not exist at all, or at a specific instant. The inference mechanism could be an option to the simulation which would apply to each calculate calls and trigger warnings as you said instead of errors.

This error is an exception for the moment (AttributeError: 'CompactNode' object has no attribute 'xxx'). It would be nice if the web API returned this error in the JSON response and avoid an error 500. To be implemented in the web API code a priori. This would ease the contribution loopback with a message like: "Formula "xxx" requested parameter x.y.z at period P which doesn't exist.". And perhaps "Click here to add the value" :)

But in the Python use case (scripts, notebooks), the economists want to have the error first and have the choice to put a fuzzy attribute explicitely in the XML.

Note that there is a performance issue with the CompactNode class which attributes are directly stored in the __dict__. Overloading the __getattribute__ magic method would be performance killer. To be tested.

cbenz commented 8 years ago

So now that the tests pass, and that the title of the issue is a design choice, I close this issue.

We can continue on other issues if wished.

fpagnoux commented 8 years ago

So to be clear : let's say I introduce today a new variable that is based on a new legislative parameter, and that I only know the value for 2016. To be able to run a simulation in 2014 without raising an error, do I have to add a fake values for 2014 ? If so, this is clearly making things harder, especially with our outside contributors (the 'CompactNode' object has no attribute 'xxx' are non explicit errors that caused us a fair amount of problems with Paris).

We really want an error to be returned in case a formula requests a parameter which does not exist [...] at a specific instant.

Based on the mes-aides usage of Openfisca, I don't share and clearly disagree with this statement. If this is a hard constraint from IPP, and if a warning is really not an acceptable compromise, then a flexibility point must be implemented. It would be clearly too heavy to add a parameter to each calculate. A parameter at the simulation level in the api seems more realistic, but it would limit mutualization if the code written for mes-aides raises error when run within an IPP simulation. In conclusion, this constraint seems to be very strong and to come with clear drawbacks that must be taken into account.

benjello commented 8 years ago

You can use a DatedVariable or the start argument of the variable (i it is still used). Or do not use a start date for the oldest parameter. Or add a fuzzy line for the oldest parameter.