rchui closed this issue 1 year ago.
Hi again @rchui,
Thanks for thinking through the obstacles that one might face when migrating from patsy to formulaic; this is not an unreasonable request. I'm also midway through improving the documentation, including an explicit guide on migrating from patsy. I'm definitely not a big fan of the Q('col name') syntax, though (and once added, I'd need to support it forever!), so I want to be thoughtful in how I approach this.
Technically, we'd implement this as a stateful transform and surface it just as patsy does (as a function that's available in the formula namespace). Thus, a formula like Q('wacky name!') would generate a term labelled Q('wacky name!'), and the Q transform would simply select the appropriate column from the data/context. No actual conversion to the new syntax would take place; we'd just support both.
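To make the lookup behaviour concrete, here is a minimal sketch in plain Python. It models only the part described above (Q is a function placed in the formula namespace that returns a column by its raw name, and the term keeps its original label); it does not model formulaic's stateful-transform machinery. `make_q`, `evaluate_term`, and the dict-of-lists data are hypothetical stand-ins, not formulaic APIs.

```python
from typing import Any, Callable, Mapping, Sequence

Data = Mapping[str, Sequence[Any]]


def make_q(data: Data) -> Callable[[str], Sequence[Any]]:
    """Build a patsy-style Q() bound to `data` (hypothetical helper)."""

    def Q(name: str) -> Sequence[Any]:
        # No rewriting of the formula: the raw column name is looked up as-is.
        if name not in data:
            raise KeyError(f"column {name!r} not found in data")
        return data[name]

    return Q


def evaluate_term(term: str, data: Data) -> Sequence[Any]:
    """Evaluate one term, e.g. "Q('wacky name!')", against `data`."""
    namespace = {"Q": make_q(data)}
    # Empty builtins: only names placed in the formula namespace resolve.
    return eval(term, {"__builtins__": {}}, namespace)


data = {"wacky name!": [1.0, 2.0, 3.0], "x": [4.0, 5.0, 6.0]}
term = "Q('wacky name!')"
# The term keeps its original label; only its values come from the lookup.
model_columns = {term: evaluate_term(term, data)}
print(model_columns)  # {"Q('wacky name!')": [1.0, 2.0, 3.0]}
```

Because nothing is rewritten, both spellings can coexist: a native name resolves as usual, and Q('...') resolves through the namespace function.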
I guess the question is how useful this would be, and whether it is worth supporting these older formula grammars forever. I think there are three different ways we could take this:
- …
- … Q method (and perhaps some other things) to the formula grammar.
- … dmatrix-like API... but this is likely not worth it.
What are your thoughts here?
At first glance, I'm not sure this would be the approach I would use. My inclination would be an API that extends the Formula class with a classmethod that applies the stateful transform under the covers, i.e.:
import formulaic as fm
fm.Formula.from_patsy(...).get_model_matrix(...)
This way it isn't in your "mainline" logic path, and it forces the user to be intentional about deriving a formulaic formula from a patsy formula. Using a from_* constructor is a well-used pattern in the Python space, and I think many would find it natural when coming from pandas, etc.
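A sketch of the `from_*` alternate-constructor pattern proposed above might look like the following. It assumes, for illustration, that `from_patsy` rewrites Q('...') into backtick quoting before calling the normal constructor, rather than applying a transform; the `Formula` class here is a hypothetical stand-in, not formulaic's real `Formula`, and the regex does not handle escaped quotes inside names.

```python
import re

# Matches Q('name') or Q("name"); escaped quotes inside the name are NOT handled.
_PATSY_Q = re.compile(r"""\bQ\(\s*(['"])(.*?)\1\s*\)""")


class Formula:
    """Hypothetical stand-in; formulaic's real Formula is not modeled here."""

    def __init__(self, spec: str) -> None:
        # The "mainline" constructor stores the formula untouched.
        self.spec = spec

    @classmethod
    def from_patsy(cls, spec: str) -> "Formula":
        # Opt-in path: rewrite patsy's Q('name') quoting into backtick quoting,
        # then hand the result to the normal constructor.
        native = _PATSY_Q.sub(lambda m: f"`{m.group(2)}`", spec)
        return cls(native)

    def __repr__(self) -> str:
        return f"Formula({self.spec!r})"


f = Formula.from_patsy("y ~ Q('wacky name!') + x")
print(f.spec)  # y ~ `wacky name!` + x
```

Keeping the rewrite inside the classmethod means plain `Formula(...)` never sees patsy-specific syntax, which matches the goal of keeping compatibility off the mainline path.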
I've updated the linked PR and brought it in as a standard patsy-compat transform. Rather than bifurcating formulaic into two different formula languages, it now just has explicit patsy shims, which people will likely migrate away from in time.
Thanks for reaching out about this! And let me know if this doesn't work for you once the PR is merged!
The formula grammars of patsy and formulaic are close but not 1:1. It would be great if terms like Q('...') could automatically be converted into `...`, etc. This would help ease the adoption curve when migrating existing formulae from patsy to formulaic.
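An automatic conversion of the kind requested could be sketched with Python's standard `tokenize` module, on the assumption that patsy formulas tokenize like Python expressions (patsy itself evaluates Q's argument as a Python string literal). `patsy_q_to_backticks` is a hypothetical helper, not part of patsy or formulaic; it handles single-line formulas only and refuses names that contain a backtick.

```python
import ast
import io
import tokenize


def patsy_q_to_backticks(formula: str) -> str:
    """Rewrite patsy-style Q('name') terms as backtick-quoted `name` terms.

    Hypothetical helper for illustration; single-line formulas only.
    """
    if "\n" in formula:
        raise ValueError("only single-line formulas are supported")
    tokens = list(tokenize.generate_tokens(io.StringIO(formula).readline))
    pieces = []
    last = 0  # index in `formula` up to which text has been copied
    i = 0
    while i < len(tokens):
        window = tokens[i:i + 4]
        if (
            len(window) == 4
            and window[0].type == tokenize.NAME and window[0].string == "Q"
            and window[1].type == tokenize.OP and window[1].string == "("
            and window[2].type == tokenize.STRING
            and window[3].type == tokenize.OP and window[3].string == ")"
        ):
            # literal_eval decodes either quote style and any escapes.
            name = ast.literal_eval(window[2].string)
            if not isinstance(name, str) or "`" in name:
                raise ValueError(f"cannot backtick-quote {window[2].string}")
            start, end = window[0].start[1], window[3].end[1]
            pieces.append(formula[last:start])
            pieces.append(f"`{name}`")
            last = end
            i += 4
        else:
            i += 1
    pieces.append(formula[last:])
    return "".join(pieces)


print(patsy_q_to_backticks("y ~ Q('wacky name!') + np.log(Q(\"a b\"))"))
# y ~ `wacky name!` + np.log(`a b`)
```

Working on tokens rather than raw text means string escapes such as Q('it\'s') decode correctly and other string arguments (for example inside C(...)) are left alone.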