Open bashtage opened 15 hours ago
I couldn't match patsy in all cases. I think this should get it. I am 100% not wed to the name, and happy to change it to anything that makes sense. I sort of liked "sort-groups", but don't really like hyphens. Another thought was "degree-only", but it has the same problem. I can match patsy in both examples below, but not using the same _ordering.
This is the problem that I had with the existing ordering methods:
formula = "TOTEMP ~ GNPDEFL + GNP + UNEMP + ARMED + POP + YEAR"
patsy
Variable names
['Intercept', 'GNPDEFL', 'GNP', 'UNEMP', 'ARMED', 'POP', 'YEAR']
formulaic, _ordering: sort
Variable names
['Intercept', 'ARMED', 'GNP', 'GNPDEFL', 'POP', 'UNEMP', 'YEAR']
formulaic, _ordering: degree
Variable names
['Intercept', 'GNPDEFL', 'GNP', 'UNEMP', 'ARMED', 'POP', 'YEAR']
formulaic, _ordering: none
Variable names
['Intercept', 'GNPDEFL', 'GNP', 'UNEMP', 'ARMED', 'POP', 'YEAR']
formula = "deaths ~ logpyears + smokes + C(agecat)"
patsy
Variable names
['Intercept', 'C(agecat)[T.2]', 'C(agecat)[T.3]', 'C(agecat)[T.4]', 'C(agecat)[T.5]', 'logpyears', 'smokes']
formulaic, _ordering: none
Variable names
['Intercept', 'logpyears', 'smokes', 'C(agecat)[T.2]', 'C(agecat)[T.3]', 'C(agecat)[T.4]', 'C(agecat)[T.5]']
formulaic, _ordering: sort
Variable names
['Intercept', 'C(agecat)[T.2]', 'C(agecat)[T.3]', 'C(agecat)[T.4]', 'C(agecat)[T.5]', 'logpyears', 'smokes']
formulaic, _ordering: degree
Variable names
['Intercept', 'logpyears', 'smokes', 'C(agecat)[T.2]', 'C(agecat)[T.3]', 'C(agecat)[T.4]', 'C(agecat)[T.5]']
Converted to a draft as this doesn't solve my problem. I need to look a bit deeper as to how patsy orders variables, especially with respect to categoricals.
I've looked into this a bit today and it seems that it isn't really possible to achieve patsy's ordering in the current structure, where _ordering is part of the formula. Patsy knows the type of each variable when it decides the order. This is how it can reliably order the intercept first, then categoricals (incl. interactions, in degree order), then continuous variables (incl. dummy-variable interactions, again in degree order).
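To make the ordering described above concrete, here is a minimal sketch in plain Python. It is not patsy's or formulaic's actual implementation: the term representation (a tuple of factor names, with the empty tuple as the intercept), the function name `patsy_like_order`, and the `categorical` argument are all assumptions made for illustration. The point is only that the sort key needs to know which factors are categorical, which a formula string alone does not provide.

```python
from typing import Iterable, List, Set, Tuple

# Hypothetical representation: a term is a tuple of factor names,
# and the empty tuple stands for the intercept.
Term = Tuple[str, ...]


def patsy_like_order(terms: Iterable[Term], categorical: Set[str]) -> List[Term]:
    """Order terms as: intercept, purely categorical terms, then all others.

    Within each group, terms are ordered by degree (number of factors).
    sorted() is stable, so ties keep the order in which they were written.
    """

    def key(term: Term) -> Tuple[int, int]:
        if not term:
            group = 0  # intercept
        elif all(factor in categorical for factor in term):
            group = 1  # categoricals and their interactions
        else:
            group = 2  # anything involving a continuous variable
        return (group, len(term))

    return sorted(terms, key=key)


# Term-level version of the second example in this thread.
terms = [(), ("logpyears",), ("smokes",), ("C(agecat)",)]
ordered = patsy_like_order(terms, categorical={"C(agecat)"})
print(ordered)
```

For the second formula this puts `C(agecat)` ahead of `logpyears` and `smokes`, as in the patsy output, while continuous variables keep their written order, as in the first formula.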
Any ideas on how we could try to address this, even if it were in statsmodels? Could we reorder variables in a rendered model somehow, and then re-render if the order changes?
Add ordering method that sorts by degree but not by variable name
Add test for method
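As a rough illustration of what "sorts by degree but not by variable name" could mean at the term level, the sketch below uses a stable sort keyed only on the number of factors. The function name `order_by_degree_only` and the tuple-of-factor-names representation are hypothetical and are not taken from the formulaic code in this PR.

```python
from typing import Iterable, List, Tuple

# Hypothetical representation: a term is a tuple of factor names,
# and the empty tuple stands for the intercept.
Term = Tuple[str, ...]


def order_by_degree_only(terms: Iterable[Term]) -> List[Term]:
    """Sort terms by degree only.

    Python's sorted() is stable, so terms of equal degree keep the order
    in which they appeared and are never reordered by name.
    """
    return sorted(terms, key=len)


example = [("b", "a"), ("z",), (), ("a",)]
by_degree = order_by_degree_only(example)
print(by_degree)
```

Here `("z",)` stays ahead of `("a",)` because both have degree one and `("z",)` was written first.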