Closed bonjourmauko closed 1 month ago
Thanks @maukoquiroga.
This is marked as depending on #1033 but I see redundant changes, for example in the config file.
I would be in favour of having a single PR as they seem to address the same issue (https://github.com/openfisca/openfisca-doc/issues/244) and to improve documentation altogether. If you'd prefer not to for some reason, could we at least update the target branch to #1033? 🙂 The current changesets seem likely to yield conflicts.
> This is marked as depending on #1033 but I see redundant changes, for example in the config file.
Corrected it ~to depend on #1038~ to depend on both #1038 and #1033, and changed base for clarity as suggested ~, the redundant changes are effectively just in the setup and config files, the rest has to be properly rebased as they're independent from #1033 (I will do shortly)~.
> If you'd prefer not to for some reason, could we at least update the target branch to #1033? 🙂 The current changesets seem likely to yield conflicts.
I will rebase properly, which should get rid of the conflicts 😃
> I would be in favour of having a single PR as they seem to address the same issue (openfisca/openfisca-doc#244) and to improve documentation altogether.
I thought so initially as well, and I tried, then I decided to split this work by "submodule" and add doc as well as fixing the doctests. Here's what happened:
I tried to fix all doctests at once, which I couldn't for several reasons:
`Holder` for example involves almost half of the codebase, including the `Holder` itself…)
Then I adopted the current attack strategy so as to:
I think we probably won't be able to fix all the doctests soon, which is why I changed from a horizontal to a vertical (submodules) approach.
Happy to have an extended discussion on that with the community if you think that could be useful!
@benjello I'd like to have your opinion on this one.
There is no breaking change, but rather a proposal for a more explicit interface between entities and variables.
However, there are still three things that I don't understand:
Anyway, I tried to make the one-to-many relationship more explicit by introducing `Entity.variables`, as there is already `Variable.entity`.
Same for group entities and roles: if I understood correctly, there can be as many roles as there are entities inside a group entity, but that relationship is implicit as there is no actual relationship between entities and group entities.
Finally, there are the famous sub-roles, which are just plain roles, added recursively to a group entity, as `group_entity.SUB_ROLE`, instead of for example `group_entity.roles.get("FIRST_PARENT")`. Why?
@maukoquiroga:
- Variables belong to an entity (they are a characteristic of an entity). Entities help you describe the model. Populations are the actual "holders" of the data.
- I do not understand what you mean by entities inside a group entity. There is an atomic entity (persons), and a group entity can have many persons and as many roles as there are different persons in those entities.
- IIRC sub_roles are convenient for initialisation but are also here for legacy purposes. Sorry, I do not recall well where they are still used (taking a look at France may give you hints).
> @maukoquiroga:
> - Variables belong to an entity (they are a characteristic of an entity). Entities help you describe the model. Populations are the actual "holders" of the data.
Great! So:
Is that it?
> - I do not understand what you mean by entities inside a group entity. There is an atomic entity (persons), and a group entity can have many persons and as many roles as there are different persons in those entities.
Ah OK, so roles actually relate to concrete data holders (Person does not have a role, `Alicia` has).
> - IIRC sub_roles are convenient for initialisation but are also here for legacy purposes. Sorry, I do not recall well where they are still used (taking a look at France may give you hints).
Sure.
> - Variables belong to an entity (they are a characteristic of an entity). Entities help you describe the model. Populations are the actual "holders" of the data.
> Great! So:
Yep
> - Entity 1 - * Population (so for example Alicia is a population, a concrete (data holder) member of a Person).
> - Entity 1 - * Variable (a person can have a salary, pay a tax and not another, be born) [the abstract rule]
> - Population - Variable (Alicia, Jorge, ... have x salary, pay the y tax, were born at a specific instant).
> Is that it?
Yes, populations are the "simulation" side of the "model" (tax-benefit-system).
> - I do not understand what you mean by entities inside a group entity. There is an atomic entity (persons), and a group entity can have many persons and as many roles as there are different persons in those entities.
> Ah OK, so roles actually relate to concrete data holders (Person does not have a role, `Alicia` has).
But Household does have roles, `household_head`, `child`, etc.? Roles are part of the structure of a GroupEntity. And thus every Person population has a role.
@benjello So conceptually, the `Population` is the actual `Holder` of the data (concrete Entity). Holder then is just a caching system, or should be, isn't it?
> @benjello So conceptually, the `Population` is the actual `Holder` of the data (concrete Entity). Holder then is just a caching system, or should be, isn't it?
@maukoquiroga `Holder` is more the concrete Variable than the concrete Entity. And thus it is definitely a caching system.
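The distinction drawn in this exchange can be sketched roughly as follows. This is a simplified illustration with hypothetical classes and fields, not OpenFisca's actual API: `Entity` and `Variable` sit on the model side, `Population` on the simulation side, and `Holder` caches the values of one variable for one population.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class Entity:
    """Model side: describes what kind of thing exists (e.g. a person)."""
    key: str


@dataclass(frozen=True)
class Variable:
    """Model side: a characteristic of an entity (the abstract rule)."""
    name: str
    entity: Entity


@dataclass
class Population:
    """Simulation side: the concrete members of an entity."""
    entity: Entity
    members: List[str]


@dataclass
class Holder:
    """Simulation side: caches one variable's values for one population."""
    variable: Variable
    population: Population
    _cache: Dict[str, List[float]] = field(default_factory=dict)

    def put(self, period: str, values: List[float]) -> None:
        # One value per member of the population is assumed here.
        if len(values) != len(self.population.members):
            raise ValueError("expected one value per member")
        self._cache[period] = values

    def get(self, period: str) -> List[float]:
        return self._cache[period]


person = Entity("person")
salary = Variable("salary", person)
persons = Population(person, ["Alicia", "Jorge"])
holder = Holder(salary, persons)
holder.put("2021-01", [3000.0, 2500.0])
```

In this sketch the holder knows both its variable and its population, which matches the reading of `Holder` as the "concrete Variable": the values live in the cache, keyed by period.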
@MattiSG @sandcha @benjello ?
I am in favor of more typing and more doctests, but I am not sure I can grasp all the implications of the changes...
The added documentation is great! However, I don't understand the consequences of introducing an abstract base class system
Thanks @MattiSG !
Concerning the abstract class system, it is not one. As I've described it:
This PR consolidates the notion of protocols, or duck-typing, or structural sub-typing: that means type-checks are done not against the actual implementation of a model, but against a protocol, that is, the equivalent of an interface, or an abstract model, but without impact at runtime.
Structural sub-typing is described in PEP 544.
Quoting it:
At runtime, protocol classes will be simple ABCs. There is no intent to provide sophisticated runtime instance and class checks against protocol classes. (...) Protocols are completely optional:
- No runtime semantics will be imposed for variables or parameters annotated with a protocol class.
- Any checks will be performed only by third-party type checkers and other tools.
In other words, there are no consequences beyond type-checks —no runtime impact then.
To make the point, an `Entity` is to an `EntityProtocol` the same as a `List` is to a `Sequence`: instead of type-checking for implementation, we type-check for behaviour (that is why it is nicknamed "duck-typing").
This has, however, one major benefit: we remove circular module dependencies from the codebase, which has an actual positive impact in terms of modularity and testability.
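A minimal sketch of the idea, assuming a hypothetical `EntityProtocol` with a single attribute and method (the real protocol in the changeset may look quite different). The function below only depends on the protocol, so the module holding it would not need to import the concrete `Entity`.

```python
from dataclasses import dataclass
from typing import Protocol


class EntityProtocol(Protocol):
    """Behaviour a variable needs from an entity (hypothetical members)."""

    key: str

    def describe(self) -> str:
        ...


@dataclass
class Entity:
    """A concrete model; note that it does not inherit from EntityProtocol."""

    key: str
    label: str

    def describe(self) -> str:
        return f"{self.key}: {self.label}"


def variable_owner(entity: EntityProtocol) -> str:
    """Depends only on the protocol, so it never has to import Entity."""
    return entity.describe()


person = Entity(key="person", label="An individual")
owner = variable_owner(person)  # accepted by a static checker such as mypy

# No nominal link exists between the two classes at runtime.
is_nominal_subclass = EntityProtocol in Entity.__mro__
```

Here `Entity` satisfies `EntityProtocol` purely by shape, and `is_nominal_subclass` is `False`: the match is only ever established by the type checker.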
> and this “variable proxy” system, so I cannot approve that PR 😕
That is just an abstraction, or in DDD, a port.
As I see it, it has two impacts:
Beyond the "design" aspect of it, the actual recipe, the descriptor, is just standard Python: a great deal of the function system, properties, and so on, is just syntactic sugar; behind the scenes they're just descriptors.
Finally, as you can see from the changeset, the introduced syntax is more coherent with the `projector` system used everywhere in OpenFisca, which should, sooner rather than later, IMHO, be reimplemented as a descriptor as well.
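For readers less familiar with descriptors, here is a rough sketch of the recipe. The names `VariableProxy`, `VariableQuery` and `registry` are made up for illustration and are not the `_VariableProxy` implementation from this changeset; the only point is that attribute access on an instance is routed through `__get__`.

```python
from typing import Dict, Optional


class VariableQuery:
    """Looks up variables on behalf of one entity (hypothetical helper)."""

    def __init__(self, entity_key: str, registry: Dict[str, str]) -> None:
        self.entity_key = entity_key
        self.registry = registry  # maps variable name -> owning entity key

    def get(self, name: str) -> Optional[str]:
        """Return the variable name if it belongs to this entity, else None."""
        if self.registry.get(name) == self.entity_key:
            return name
        return None


class VariableProxy:
    """Descriptor: attribute access is routed through __get__."""

    def __get__(self, instance, owner=None):
        if instance is None:
            return self  # accessed on the class itself
        return VariableQuery(instance.key, instance.registry)


class Entity:
    variables = VariableProxy()  # declared once, resolved per instance

    def __init__(self, key: str, registry: Dict[str, str]) -> None:
        self.key = key
        self.registry = registry


registry = {"salary": "person", "rent": "household"}
person = Entity("person", registry)
```

With this in place, `person.variables.get("salary")` returns `"salary"` while `person.variables.get("rent")` returns `None`, so the querying logic lives in the proxy rather than in `Entity` itself.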
Superseded by #1255
Part of #1061 Superseded by #1220 Superseded by #1252 Superseded by #1255
Documentation
Doctests
Typing
- `typing`, under three categories:
  - `types`: for data-types, which are subsequently split into two sub-categories:
    - `XType`: an actual data-type, for example an array
    - `XLike`: an object where the actual data-type is irrelevant, but that can be coerced to `X`.
  - `protocols`: for behaviours (think interfaces, but for the sole purpose of static type-checking)
  - `schemas`: type-safe data-objects, mostly for dictionaries
Deprecations
- `Entity.set_tax_benefit_system`: now provided by a standard `property` setter.
- `Entity.check_role_validity`: moved to helpers as it had nothing to do with entities
New features
- `entity 1 - * variable` declarative, thanks to the `_VariableProxy` descriptor
- `dataclass` and `slots` for reduced boilerplate and improved performance
Technical changes
- `GroupEntity`, replaces dynamic `role` attributes assignment for an explicit caching mechanism
- `slots` and thus the performance gains
Notes
1. `Entity` and `Population`, `Simulation` and `SimulationBuilder`, and so on. Although the changeset is backward-compatible, future contributions enforcing this choice may introduce breaking changes.
2. `adapter`, `proxy`, `port`, or `repo`: beyond the pure relationship between models, the behavioural logic between them is extracted to specialised models. In this case, `querying` variables is extracted from `Entity` to a `_VariableProxy`. This is just a subset of 1.
3. This PR consolidates the notion of `protocols`, or `duck-typing`, or `structural sub-typing`: that means type-checks are done not against the actual implementation of a model, but against a `protocol`, that is, the equivalent of an `interface`, or an `abstract model`, but without impact at runtime.
4. `schemas`. But, contrary to regular `schemas` used to validate, serialise, and deserialise data from/to models, these schemas are purely type-safe data objects, meant for type-checks, without impact at runtime.
Extended note on data-schemas
What if, instead of just using `schemas` as typed-dicts for type-checks, we used them at runtime to compulsorily:
a. check types at runtime?
b. validate data at runtime?
c. serialise/deserialise data, as suggested in #1071?
d. enforce contracts, or domain logic, both in terms of data ("has to be >0") and representation ("tojson()")?
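To make the question concrete, here is a small sketch of the difference between a schema used only for type-checks and the same schema enforced at runtime. `RoleSchema` and `validate` are hypothetical names introduced for this example, and the check shown only covers option (a) for plain, non-generic field types.

```python
from typing import Any, Dict, TypedDict, get_type_hints


class RoleSchema(TypedDict):
    """Type-safe description of a role (hypothetical fields)."""
    key: str
    max_persons: int


def validate(schema: type, data: Dict[str, Any]) -> Dict[str, Any]:
    """Opt-in runtime check: every field present and of the declared type."""
    hints = get_type_hints(schema)
    for name, expected in hints.items():
        if name not in data:
            raise KeyError(f"missing field: {name}")
        if not isinstance(data[name], expected):
            raise TypeError(f"{name} must be {expected.__name__}")
    return data


# At runtime a TypedDict is a plain dict: nothing is checked here, only a
# static type checker would flag the wrong value type.
unchecked = RoleSchema(key="parent", max_persons="two")  # type: ignore[typeddict-item]

# The explicit call is what adds a runtime cost, once per validated object.
checked = validate(RoleSchema, {"key": "parent", "max_persons": 2})
```

Passing `unchecked` to `validate` raises a `TypeError`, which is the behaviour the question above is about. Note that every call walks all the fields of the object it is given.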
For
- `Formula` and `Test`.
- `OpenAPI`
Against
Performance: whereas declarative data transmutation can even increase performance for individual situations (WebAPI), a naive or out-of-the-box implementation (or anything with a complexity of at least `O(n)`) will certainly have a very penalising impact on performance for large population simulations.
This is due to several reasons:
- `numpy` already provides data type-casting, with configurable levels of safety, and C-optimised
Possible workarounds:
- `--safe` or `--strict`