szs8 opened this issue 11 years ago (Open)
I guess I can use ModelDesc etc. https://patsy.readthedocs.org/en/latest/expert-model-specification.html
But in any case it might make sense to fail gracefully here.
Huh, fair enough, the parse evaluator does recurse over the parse tree. It hadn't occurred to me that people would want to parse strings with hundreds of terms :-).
I'll think about how fixable that is. In the meantime, you may prefer to use the programmatic interface for constructing formulas, which bypasses the string parser entirely. See http://patsy.readthedocs.org/en/latest/expert-model-specification.html and in particular the paragraph starting "However, there is also a middle ground...".
In your case I'd do something like
from patsy import ModelDesc, Term, LookupFactor, dmatrix

my_formula = ModelDesc([], [Term([LookupFactor(c)]) for c in df.columns])
dmatrix(my_formula, df)
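If you also need a response on the left-hand side, here is a sketch along the same lines (assuming the frame has a column named "y"; the first argument to ModelDesc is the left-hand-side term list, and Term([]) is the intercept term):

from patsy import ModelDesc, Term, LookupFactor, dmatrices

lhs = [Term([LookupFactor("y")])]
rhs = [Term([])] + [Term([LookupFactor(c)]) for c in df.columns if c != "y"]
outcome, predictors = dmatrices(ModelDesc(lhs, rhs), df)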
Let me know how it goes; there might be other places where I didn't think through scaling far enough...
On Fri, Apr 12, 2013 at 2:04 PM, NaN notifications@github.com wrote:
I am working with a dataframe which has 7000 columns and it turns out that once you go beyond 485 terms, patsy throws a recursion error when going from a formula to a design matrix. Is there a better way of doing this?
Thanks!
In [282]: df = pd.DataFrame(dict(('a' + str(i), np.random.randn(5)) for i in xrange(500)))
In [283]: formula = " + ".join(df.columns)
In [284]: dmatrices(formula, df)
....
/Users/xxx/lib/python2.7/site-packages/patsy-0.1.0_dev-py2.7.egg/patsy/desc.pyc in eval(self, tree, require_evalexpr)
    452                              "'%s' operator" % (tree.type,),
    453                              tree.token)
--> 454         result = self._evaluators[key](self, tree)
    455         if require_evalexpr and not isinstance(result, IntermediateExpr):
    456             if isinstance(result, ModelDesc):
/Users/xxx/lib/python2.7/site-packages/patsy-0.1.0_dev-py2.7.egg/patsy/desc.pyc in _eval_binary_plus(evaluator, tree)
    283
    284 def _eval_binary_plus(evaluator, tree):
--> 285     left_expr = evaluator.eval(tree.args[0])
    286     if tree.args[1].type == "ZERO":
    287         return IntermediateExpr(False, None, True, left_expr.terms)
/Users/xxx/lib/python2.7/site-packages/patsy-0.1.0_dev-py2.7.egg/patsy/desc.pyc in eval(self, tree, require_evalexpr)
    452                              "'%s' operator" % (tree.type,),
    453                              tree.token)
--> 454         result = self._evaluators[key](self, tree)
    455         if require_evalexpr and not isinstance(result, IntermediateExpr):
    456             if isinstance(result, ModelDesc):
/Users/xxx/lib/python2.7/site-packages/patsy-0.1.0_dev-py2.7.egg/patsy/desc.pyc in _eval_binary_plus(evaluator, tree)
    283
    284 def _eval_binary_plus(evaluator, tree):
--> 285     left_expr = evaluator.eval(tree.args[0])
    286     if tree.args[1].type == "ZERO":
    287         return IntermediateExpr(False, None, True, left_expr.terms)
/Users/xxx/lib/python2.7/site-packages/patsy-0.1.0_dev-py2.7.egg/patsy/desc.pyc in eval(self, tree, require_evalexpr)
    448         assert isinstance(tree, ParseNode)
    449         key = (tree.type, len(tree.args))
--> 450         if key not in self._evaluators:
    451             raise PatsyError("I don't know how to evaluate this "
    452                              "'%s' operator" % (tree.type,),

RuntimeError: maximum recursion depth exceeded in cmp
Thanks, I was just about to do that and was trying to figure out how categorical columns would be handled, especially if they are integers.
I guess I should never have attempted to build such a huge formula anyway, but sometimes you are pig-headed and just want to plough forward!
@signalseeker, I recently ran into the same error using statsmodels to build a logistic regression with more than 485 predictors. The data I'm working with has a very large predictor space and, unfortunately, there is nothing to be done about it. Thanks for looking into this, @njsmith.
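For what it's worth, one blunt stopgap (only a sketch, and only worth trying when the programmatic interface isn't an option) is to raise Python's recursion limit before parsing the formula; for very large formulas it may just push the failure into a hard C-stack crash instead:

import sys
sys.setrecursionlimit(10000)  # default is usually 1000; pick a value with care
# ... then call dmatrices(formula, df) as before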
+1. Trying to run an interaction model '(a1 + a2 + ... + a360) * (b1 + ... + b40)' works, but '(a1 + a2 + ... + a500) * (b1 + ... + b40)' breaks :-(
Have to resort to sklearn.preprocessing.PolynomialFeatures
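For reference, a minimal sketch of that sklearn fallback (interaction-only expansion of a numeric array X holding the raw columns; it produces all pairwise products, without patsy-style term names):

from sklearn.preprocessing import PolynomialFeatures

poly = PolynomialFeatures(degree=2, interaction_only=True, include_bias=False)
X_interactions = poly.fit_transform(X)  # original columns plus all pairwise products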
@DSLituiev: so as noted upthread, you can use something like (untested)
from patsy import ModelDesc, Term, LookupFactor, dmatrix

terms = []
for i in range(1, 501):
    for j in range(1, 41):
        # Add an interaction between a{i} and b{j}, like a10:b12
        terms.append(Term((LookupFactor("a" + str(i)), LookupFactor("b" + str(j)))))
preparsed_formula = ModelDesc([], terms)
dmatrix(preparsed_formula, dataframe)
This gives you the same interaction structure as the patsy formulas you wrote above; it's just that instead of having to generate a big string and then have patsy parse it, you build patsy's high-level data structures directly.
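One nuance worth noting: * in a formula also expands to the main effects plus an intercept, so to reproduce '(a1 + ... + a500) * (b1 + ... + b40)' in full you would add those terms as well. A sketch, assuming the same a{i}/b{j} column names:

from patsy import ModelDesc, Term, LookupFactor, dmatrix

a_names = ["a" + str(i) for i in range(1, 501)]
b_names = ["b" + str(j) for j in range(1, 41)]

terms = [Term([])]                                   # the implicit intercept ("1")
terms += [Term([LookupFactor(n)]) for n in a_names]  # main effects a1..a500
terms += [Term([LookupFactor(n)]) for n in b_names]  # main effects b1..b40
terms += [Term((LookupFactor(a), LookupFactor(b)))   # interactions a{i}:b{j}
          for a in a_names for b in b_names]

full_formula = ModelDesc([], terms)
dmatrix(full_formula, dataframe)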
(And if you want to transform individual items before passing them in, you can replace LookupFactor(...) with something like EvalFactor("np.log(x)") or EvalFactor("C(a10)"), or you can even define a custom factor class -- mostly you just need to implement an eval method that takes a dataframe and returns your factor's values.)
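A small sketch of what that substitution looks like in practice (assuming columns a1 and b1 exist, and that numpy is imported in the calling namespace so np.log resolves):

import numpy as np
from patsy import ModelDesc, Term, LookupFactor, EvalFactor, dmatrix

desc = ModelDesc([], [
    Term([]),                          # intercept
    Term([LookupFactor("a1")]),        # plain column lookup
    Term([EvalFactor("np.log(b1)")]),  # arbitrary Python expression evaluated against the data
])
dmatrix(desc, dataframe)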
That said, I'm not likely to find the time to fix this soon, but it certainly is fixable by replacing the current recursive loop with an equivalent non-recursive loop, and I'd be happy to accept a patch if anyone wants to make one.
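For anyone considering that patch, the general shape of such a rewrite (a generic sketch of the technique, not patsy's actual evaluator) is post-order evaluation of the parse tree with an explicit stack:

def eval_tree_iteratively(root, evaluate_node):
    # Post-order traversal with an explicit stack, so formula depth is no
    # longer limited by Python's recursion limit.
    results = {}                 # id(node) -> evaluated result
    stack = [(root, False)]
    while stack:
        node, children_done = stack.pop()
        children = getattr(node, "args", [])
        if children_done:
            results[id(node)] = evaluate_node(node, [results[id(c)] for c in children])
        else:
            stack.append((node, True))
            for child in children:
                stack.append((child, False))
    return results[id(root)]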
Thank you! This looks like enough for my application; I'm afraid I'm not sufficiently equipped to tinker with the source for now.
Was this ever fixed?
@jolespin I don't think so.
I tried doing a mixed effects model with 4000 attributes and it kind of just got stuck, and my computer stopped sounding like it was computing anything. Is there a maximum number of attributes that can go in a linear model?
@jolespin This issue is about formulas with lots of terms, like "y ~ x1 + x2 + x3 + x4 + x5 + x6 + ........ + x3999 + x4000", and it causes crashes, not freezes. Your issue sounds like something you should report to the package you're using to do mixed effects models (maybe statsmodels?).
I got the same issue. Any update on this?
At this point, we are unlikely to fix this in patsy. The issue is resolved in Formulaic, however.
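For anyone landing here now, a minimal sketch of the Formulaic equivalent of the original example (assuming a pandas DataFrame df whose columns are the predictors):

from formulaic import model_matrix

X = model_matrix(" + ".join(df.columns), df)  # right-hand-side-only design matrix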