Closed matthewwardrop closed 1 year ago
Patch coverage: 100.00% and no project coverage change.
Comparison is base (c9daec8) 100.00% compared to head (b8dbe4e) 100.00%.
This patch has now been refined and just needs some unit testing to merge. I'm not expecting big changes now, unless I get feedback from folks.
>>> import pandas
>>> from formulaic import Formula
>>> df = pandas.DataFrame({'a': [1,2,3], 'b': [4,5,6]})
>>> ms = Formula("a + b + a:bs(b) + C(a, contr.treatment)").get_model_matrix(df).model_spec
>>> ms.term_variables
OrderedDict([(1, set()),
             (a, {'a'}),
             (b, {'b'}),
             (C(a, contr.treatment), {'C', 'a', 'contr.treatment'}),
             (a:bs(b), {'a', 'b', 'bs'})])
>>> ms.variable_terms
{'a': {C(a, contr.treatment), a, a:bs(b)},
 'b': {b, a:bs(b)},
 'C': {C(a, contr.treatment)},
 'contr.treatment': {C(a, contr.treatment)},
 'bs': {a:bs(b)}}
>>> ms.variable_indices
{'a': [1, 3, 4, 5, 6, 7],
 'b': [2, 5, 6, 7],
 'C': [3, 4],
 'contr.treatment': [3, 4],
 'bs': [5, 6, 7]}
>>> ms.variables
{'C', 'a', 'b', 'bs', 'contr.treatment'}
>>> ms.variables_by_source
{'data': {'a', 'b'}, 'transforms': {'C', 'bs', 'contr.treatment'}}
As per #32 and #60, it is sometimes useful to be able to look up which variables were used by the formula and its terms. This can be used to slice columns, or to determine data requirements for subsequent model evaluations.
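The two use cases mentioned here can be sketched with plain Python structures. The dictionaries below are copied from the example session in this PR. The row data, `select_columns` and `missing_data_columns` are hypothetical stand-ins for illustration and are not part of formulaic:

```python
def select_columns(rows, indices):
    """Keep only the given column positions from each row."""
    return [[row[i] for i in indices] for row in rows]


def missing_data_columns(required, available):
    """Return required data variables that are absent, sorted for stable output."""
    return sorted(set(required) - set(available))


# Values copied from the example session in this PR.
variable_indices = {
    "a": [1, 3, 4, 5, 6, 7],
    "b": [2, 5, 6, 7],
    "C": [3, 4],
    "contr.treatment": [3, 4],
    "bs": [5, 6, 7],
}
variables_by_source = {
    "data": {"a", "b"},
    "transforms": {"C", "bs", "contr.treatment"},
}

# Stand-in for a materialized model matrix: two rows of eight placeholder
# values (assumed from the largest index, 7), each equal to its column position.
rows = [list(range(8)), list(range(8))]

# Slice out every column that depends on the variable "b".
b_columns = select_columns(rows, variable_indices["b"])

# Check a new dataset (hypothetically holding columns "a" and "c")
# against the data variables the formula needs.
missing = missing_data_columns(variables_by_source["data"], ["a", "c"])
```

With a pandas-backed model matrix, the same slice could presumably be taken positionally with `DataFrame.iloc[:, indices]`. That is an assumption about usage, not something this PR demonstrates.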
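This patchset adds support for tracking this information during model matrix materialization, which is exposed as attributes of the `ModelSpec` (`term_variables`, `variable_terms`, `variable_indices`, `variables` and `variables_by_source`), as illustrated by the example session in this description.
closes: #32
closes: #60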