pydata / patsy

Describing statistical models in Python using symbolic formulas

Zero inflation covariates. #205

Closed Bastien-mva closed 4 months ago

Bastien-mva commented 9 months ago

Hello,

Thanks for this great package. I wondered whether there is any way to specify zero-inflation covariates, as in the R formula "y ~ x1 + x2 | x1", where the x1 after the pipe is used as a covariate only for the zero-inflation part of the model. I have been trying this with patsy, but I get an error:

patsy.PatsyError: Error evaluating factor: TypeError: unsupported operand type(s) for |: 'str' and 'str'
counts ~1 + tree | tree

Thanks in advance,

Bastien

tomicapretto commented 9 months ago

Hi @Bastien-mva!

I think handling zero-inflation is more in the realm of the libraries built on top of design matrix builders. For example, if the developers of a library decide to support some custom syntax to specify the covariates used for certain parts of a model, it's up to those devs to implement what's needed to support those custom cases.

Model formulas are extremely flexible, which means the same operator does not mean the same thing in all cases. The pipe operator | is a great example. In lme4 (in R) and Bambi (in Python) it's used for specifying random effects. In the formula you mention, it is used to specify zero-inflation. And in other libraries I've seen other use cases.
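
As a rough illustration of how a downstream library might handle this itself, here is a minimal sketch that splits the formula on | before handing each part to patsy. The helper names split_zero_inflation_formula and build_design_matrices are hypothetical, not part of patsy or any other library, and falling back to an intercept-only zero-inflation part when no | is present is an assumption of this sketch. The split is deliberately naive: it would break if | appeared inside a term such as I(a | b).

```python
def split_zero_inflation_formula(formula):
    """Split 'y ~ count_terms | zi_terms' into two plain formulas.

    Hypothetical helper: returns (count_formula, zi_formula).
    Naive on purpose: it splits at the first '|' it finds.
    """
    lhs, tilde, rhs = formula.partition("~")
    if not tilde:
        raise ValueError("formula must contain '~'")
    count_rhs, bar, zi_rhs = rhs.partition("|")
    count_formula = f"{lhs.strip()} ~ {count_rhs.strip()}"
    # Assumption: without a '|', the zero-inflation part is intercept-only.
    zi_formula = zi_rhs.strip() if bar else "1"
    return count_formula, zi_formula


def build_design_matrices(formula, data):
    """Build the response and one design matrix per model part."""
    # Imported here so the splitter above works without patsy installed.
    from patsy import dmatrices, dmatrix

    count_formula, zi_formula = split_zero_inflation_formula(formula)
    y, X_count = dmatrices(count_formula, data)  # two-sided formula
    X_zi = dmatrix(zi_formula, data)             # one-sided formula
    return y, X_count, X_zi
```

With the formula from the original question, build_design_matrices("counts ~ 1 + tree | tree", data) would pass "counts ~ 1 + tree" and "tree" to patsy separately, so patsy never has to evaluate | itself.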

Oh, and finally, patsy is no longer under active development. I recommend you have a look at Formulaic.