openfisca / openfisca-core

OpenFisca core engine. See other repositories for countries-specific code & data.
https://openfisca.org
GNU Affero General Public License v3.0

Stop assuming that vectorial computation can be hidden from reusers that have to read or write formulas, and focus on making it understandable instead #673

Open MattiSG opened 6 years ago

MattiSG commented 6 years ago

Current assumptions

Traditionally, vectorial computations have been hidden from newcomers, in the hope (https://github.com/openfisca/openfisca-core/issues/651#issuecomment-391152789) that users would not need to grasp the complexity up front and could be confronted with it later.

How this is problematic

  1. All hackathons and IRL trainings I have observed have shown confusion from reusers and unease from trainers around the introduction of vectorial computation concepts.
  2. Recent onboarding experiences with Italy and Aotearoa New Zealand prove (https://github.com/openfisca/openfisca-core/issues/651#issuecomment-391221423, https://github.com/ServiceInnovationLab/openfisca-aotearoa/pull/15#discussion_r189493644) that reusers are confronted very soon with the need to read or write formulas, and that as soon as they do, they also need to understand that something more is going on.
  3. The IRL training of @verban by @MattiSG was delayed by 1h30m because of the lack of references to NumPy helpers in the doc.

Experiments that prove it is the right direction

  1. A recent rewrite (https://github.com/openfisca/openfisca-doc/pull/134) of the vectorial computation documentation pages moved them from a “limitations” vocabulary to a more “standard practice” one. The utility of the result has been validated by user testing in a hackathon (https://github.com/openfisca/openfisca-core/issues/651#issuecomment-391221423), in IRL training and through reviews (https://github.com/openfisca/openfisca-doc/pull/134#issuecomment-389799276).
  2. Aotearoa has trialled (https://github.com/openfisca/openfisca-core/issues/651#issuecomment-391210325) using a pluralised argument name for its variables. The response is positive, especially in IRL training (cc @br3nda).
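
As an illustration of what the pluralised naming is meant to convey, here is a minimal sketch using plain NumPy rather than the OpenFisca API. The `Persons` class, the `salary` variable and the 10% rate are hypothetical, invented only for this example. The point is that the first argument stands for a whole population and the formula returns one value per person.

```python
import numpy


class Persons:
    """Hypothetical stand-in for a population: one array entry per person."""

    def __init__(self, variables):
        self._variables = variables

    def __call__(self, variable_name, period):
        # Returns a vector with one value per person, not a single number.
        return self._variables[variable_name]


def income_tax(persons, period):
    # `persons` is plural: `salaries` is an array covering every person at once.
    salaries = persons("salary", period)
    return salaries * 0.10  # hypothetical flat rate, applied element-wise


persons = Persons({"salary": numpy.array([1000.0, 2500.0, 0.0])})
taxes = income_tax(persons, "2018-05")
```

With a singular `person` argument, the same body would read as if `salary` were a single number, which is the confusion the trial tries to avoid.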

Concrete steps that would be taken

Estimated impact

MattiSG commented 6 years ago

I'm wondering if this should include deprecating formula helpers and replacing them with NumPy snippets? I see only 3 there, and they are not documented (https://github.com/openfisca/openfisca-doc/issues/4).
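
For the sake of discussion, here is a rough sketch of what plain NumPy snippets could look like for the kind of conditional logic formulas often need. It does not reproduce the existing helpers. The thresholds, rates and variable names are made up for illustration.

```python
import numpy

incomes = numpy.array([500.0, 1500.0, 4000.0])

# Two-way choice: numpy.where picks element-wise between two alternatives.
is_eligible = incomes < 1000
benefit = numpy.where(is_eligible, 200.0, 0.0)

# Multi-way choice: numpy.select takes the value of the first matching condition.
rate = numpy.select(
    [incomes < 1000, incomes < 3000],  # conditions, checked in order
    [0.0, 0.1],                        # value used for each condition
    default=0.3,                       # used when no condition matches
)
```

Both functions are part of NumPy itself, so they come with upstream documentation that reusers can search for.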

benjello commented 6 years ago

I am not so sure that pluralizing the first argument will help a lot when the vector nature of the argument is made clearer and emphasized at the beginning of the training. Using the singular for both the entity and the first parameter helps a lot. If not, you end up asking yourself: should I use a plural here or a singular, am I dealing with the entity or the vector, etc. From my experience, when the vector thing is well understood, the question of the use of plural vs singular changes in nature.

bonjourmauko commented 6 years ago

Hi @MattiSG, this is IMHO a good direction to move forward 😃. I've experienced this issue myself.

Country Template: pluralise first argument name in all formulas, using the pluralised name of the entity.
Doc: pluralise first argument name in all formulas, using the pluralised name of the entity.
Doc: document the recommendation to use the pluralised name of the entity as the first formula parameter.

I've spent quite a lot of the time in my performance improvement efforts trying to understand the nature of the arguments being received. Given the lack of native type checking in Python (at least < 3.6), it can be a bit cumbersome.

I think naming should more or less reflect the duck-typing of the argument. Whether it is a list, tuple, set or a numpy.ndarray, I think the argument name should be pluralised, even if the argument is an empty collection or has just one element.

(Note: it goes beyond the current RFC, but arguments, if not optional, should respect duck-typing, i.e. not passing None where a list is expected).
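
To make the naming and duck-typing point concrete, here is a small sketch. The function and argument names are hypothetical, and the type hint is an optional annotation that Python does not enforce at runtime.

```python
import numpy
from numpy.typing import ArrayLike


def total_income(salaries: ArrayLike) -> float:
    # The plural name signals a collection; lists, tuples and arrays all work.
    return float(numpy.sum(numpy.asarray(salaries, dtype=float)))


from_list = total_income([1000.0, 2500.0])               # a list
from_tuple = total_income((1000.0,))                     # a single-element tuple
from_empty = total_income(numpy.array([], dtype=float))  # empty array, not None
```

Passing an empty collection keeps the function body free of special cases, whereas passing `None` would force every caller and callee to check for it.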

I am not so sure that pluralizing first argument will help a lot when the vector nature of the argument is made clearer and emphasized at the beginning of the training.

I see two other arguments for this:

  1. It is easier IMHO to foster contribution if the fact that we're dealing with a vector is self-evident. We can only do a limited amount of training, for the rest we rely on the doc, the code and the tests.

  2. It is way easier to refactor and to improve code when we know more or less the signature of functions and their return type. If I can see I'm dealing with vectors, I'll adapt my refactoring approach immediately.

Core: always import numpy rather than import numpy as np to increase discoverability of that library.

Not sure about this one, as all the code snippets I've seen so far on the internet use

import numpy as np

Morendil commented 5 years ago

Closing as stale. This might well still be one of the core issues in OF, and some ideas are starting to emerge for addressing it from a different perspective (e.g. creating more affordances for directing computation based on conditions, which would eliminate the NumPy-idiomatic "logic multiply" operator in favor of something more salient and documented, as well as afford a large performance boost if initial trials prove a reliable indication).
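
For readers unfamiliar with the idiom mentioned here, the following sketch contrasts multiplying by a boolean array with an explicit `numpy.where`. The variable names and amounts are invented for illustration, and `numpy.where` is shown only as one existing NumPy alternative, not as the affordance being considered for OpenFisca.

```python
import numpy

ages = numpy.array([15, 30, 70])
base_amount = numpy.array([100.0, 100.0, 100.0])

# "Logic multiply": the boolean array is treated as 0/1 when multiplied.
is_adult = ages >= 18
allowance_multiply = is_adult * base_amount

# The same result with the condition spelled out explicitly.
allowance_where = numpy.where(is_adult, base_amount, 0.0)
```

Both forms evaluate `base_amount` for every person, including those for whom the condition is false.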

bonjourmauko commented 4 years ago

I'm reopening this issue as I'd like to arrive at a consensus on this. I'll probably split it into several other issues to have more targeted discussions.