h2oai / h2o-3

H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
http://h2o.ai
Apache License 2.0
6.91k stars, 2k forks

maximum recursion depth error when using `isin` in h2o python #12046

Closed exalate-issue-sync[bot] closed 1 year ago

exalate-issue-sync[bot] commented 1 year ago

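{code:python}
import h2o

h2o.init()
x = h2o.import_file('/Users/nidhimehta/Downloads/dummyt.txt')

tt = x['rr'].unique()  # unique values of column 'rr'
gg = tt[:10000, 0]     # the first 10000 of them

# Keep the rows of x whose 'rr' value is NOT among those 10000 values
x[~x['rr'].isin(gg['C1'].ascharacter().as_data_frame()['C1'].tolist())]
{code}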

{code}
RecursionError                            Traceback (most recent call last)
/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/IPython/core/formatters.py in __call__(self, obj)
    670                 type_pprinters=self.type_printers,
    671                 deferred_pprinters=self.deferred_printers)
--> 672             printer.pretty(obj)
    673             printer.flush()
    674             return stream.getvalue()

/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/IPython/lib/pretty.py in pretty(self, obj)
    381                 if callable(meth):
    382                     return meth(obj, self, cycle)
--> 383             return _default_pprint(obj, self, cycle)
    384         finally:
    385             self.end_group()

/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/IPython/lib/pretty.py in _default_pprint(obj, p, cycle)
    501     if _safe_getattr(klass, '__repr__', None) not in _baseclass_reprs:
    502         # A user-provided repr. Find newlines and replace them with p.break_()
--> 503         _repr_pprint(obj, p, cycle)
    504         return
    505     p.begin_group(1, '<')

/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/IPython/lib/pretty.py in _repr_pprint(obj, p, cycle)
    699     """A pprint that just redirects to the normal repr function."""
    700     # Find newlines and replace them with p.break_()
--> 701     output = repr(obj)
    702     for idx,output_line in enumerate(output.splitlines()):
    703         if idx:

/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/h2o/frame.py in __repr__(self)
    405         stk = traceback.extract_stack()
    406         if not ("IPython" in stk[-2][0] and "info" == stk[-2][2]):
--> 407             self.show()
    408         return ""
    409

/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/h2o/frame.py in show(self, use_pandas, rows, cols)
    417             print("This H2OFrame has been removed.")
    418             return
--> 419         if not self._ex._cache.is_valid(): self._frame()._ex._cache.fill()
    420         if H2ODisplay._in_ipy():
    421             import IPython.display

/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/h2o/frame.py in _frame(self, rows, rows_offset, cols, cols_offset, fill_cache)
    480
    481     def _frame(self, rows=10, rows_offset=0, cols=-1, cols_offset=0, fill_cache=False):
--> 482         self._ex._eager_frame()
    483         if fill_cache:
    484             self._ex._cache.fill(rows=rows, rows_offset=rows_offset, cols=cols, cols_offset=cols_offset)

/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/h2o/expr.py in _eager_frame(self)
     92         if not self._cache.is_empty(): return
     93         if self._cache._id is not None: return  # Data already computed under ID, but not cached locally
---> 94         self._eval_driver(True)
     95
     96     def _eager_scalar(self):  # returns a scalar (or a list of scalars)

/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/h2o/expr.py in _eval_driver(self, top)
    105
    106     def _eval_driver(self, top):
--> 107         exec_str = self._get_ast_str(top)
    108         res = ExprNode.rapids(exec_str)
    109         if 'scalar' in res:

/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/h2o/expr.py in _get_ast_str(self, top)
    141             return self._cache._id  # Data already computed under ID, but not cached
    142         # assert isinstance(self._children,tuple)
--> 143         exec_str = "({} {})".format(self._op, " ".join([ExprNode._arg_to_expr(ast) for ast in self._children]))
    144         gc_ref_cnt = len(gc.get_referrers(self))
    145         if top or gc_ref_cnt >= ExprNode.MAGIC_REF_COUNT:

/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/h2o/expr.py in <listcomp>(.0)
    141             return self._cache._id  # Data already computed under ID, but not cached
    142         # assert isinstance(self._children,tuple)
--> 143         exec_str = "({} {})".format(self._op, " ".join([ExprNode._arg_to_expr(ast) for ast in self._children]))
    144         gc_ref_cnt = len(gc.get_referrers(self))
    145         if top or gc_ref_cnt >= ExprNode.MAGIC_REF_COUNT:

/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/h2o/expr.py in _arg_to_expr(arg)
    153             return "[]"  # empty list
    154         if isinstance(arg, ExprNode):
--> 155             return arg._get_ast_str(False)
    156         if isinstance(arg, ASTId):
    157             return str(arg)

... last 3 frames repeated, from the frame below ...

/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/h2o/expr.py in _get_ast_str(self, top)
    141             return self._cache._id  # Data already computed under ID, but not cached
    142         # assert isinstance(self._children,tuple)
--> 143         exec_str = "({} {})".format(self._op, " ".join([ExprNode._arg_to_expr(ast) for ast in self._children]))
    144         gc_ref_cnt = len(gc.get_referrers(self))
    145         if top or gc_ref_cnt >= ExprNode.MAGIC_REF_COUNT:

RecursionError: maximum recursion depth exceeded
{code}

exalate-issue-sync[bot] commented 1 year ago

Nidhi Mehta commented: This equivalent code works in R:

{code}
x = h2o.importFile("/Users/nidhimehta/Downloads/dummyt.txt")
tt = h2o.unique(x$rr)        # 499851 unique categories
gg = as.h2o(tt[1:10000, 1])  # pulling first 10000 into R

ww = x[!(x$rr %in% as.character(as.vector(gg$C1))), ]  # 20008 rows do not match
{code}

exalate-issue-sync[bot] commented 1 year ago

Michal Malohlava commented: @Nidhi, the problem is on our side. We need to make sure that both clients generate similar Rapids expressions:

R generates this expression: POST /99/Rapids, parms: {ast=(tmp= RTMP_sid_b137_6 (rows dummyt1.hex_sid_b137_2 (!! (match (cols dummyt1.hex_sid_b137_2 [0]) ["000Ne" "000TW" .... "00czA" "00d2V" "00d87" "00di5" "00dy4" "00e0g" "00eDl"] 0 NULL)))), session_id=_sid_b137}
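To make the shape of the R-generated request concrete, here is a small sketch that assembles a Rapids string of the same form from a Python list. The helper name `match_filter_ast` and its parameters are hypothetical, not part of the h2o API; the point it illustrates is that the expression stays a single flat `match` node no matter how many values are passed.

```python
def match_filter_ast(tmp_id, frame_id, col_index, values):
    """Hypothetical helper: build an R-style Rapids string that keeps the rows
    of `frame_id` whose column `col_index` is NOT in `values`."""
    # Quote each value into one flat list literal; its nesting depth does not
    # grow with len(values), unlike a chain of per-value comparisons.
    quoted = " ".join('"{}"'.format(v) for v in values)
    return ("(tmp= {tmp} (rows {fr} (!! (match (cols {fr} [{c}]) [{vals}] 0 NULL))))"
            .format(tmp=tmp_id, fr=frame_id, c=col_index, vals=quoted))


print(match_filter_ast("RTMP_sid_b137_6", "dummyt1.hex_sid_b137_2", 0, ["000Ne", "000TW"]))
```

With 10,000 values the string gets longer, but it still contains exactly five parenthesised nodes.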

but Python generates a much longer expression:

Solution: we need to transform {{isin}} in the same way as it is done in R.
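Why a longer expression turns into a `RecursionError`: the traceback shows the Python client serializing its expression tree recursively (`_get_ast_str` calls `_arg_to_expr`, which calls `_get_ast_str` on each child). The sketch below uses a hypothetical `Node` class, not h2o's real `ExprNode`, and assumes for illustration that the long Python expression is an OR-chain with one equality test per value (an assumption about the pre-fix `isin`, not something this thread confirms). Folding 10,000 comparisons with `functools.reduce` yields a tree roughly 10,000 levels deep, which a recursive serializer cannot walk under CPython's default recursion limit, while a single `match` node holding a flat list literal serializes at depth 1.

```python
import functools


class Node:
    """Hypothetical stand-in for an expression-tree node (not h2o's ExprNode)."""

    def __init__(self, op, *children):
        self.op = op
        self.children = children

    def __or__(self, other):
        return Node("|", self, other)

    def to_ast(self):
        # Recursive serialization: at least one Python stack frame per tree level.
        parts = [c.to_ast() if isinstance(c, Node) else c for c in self.children]
        return "({} {})".format(self.op, " ".join(parts))


def isin_or_chain(col, values):
    # Assumed pre-fix strategy: (col == v1) | (col == v2) | ..., folded left.
    return functools.reduce(Node.__or__, (Node("==", col, '"%s"' % v) for v in values))


def isin_match(col, values):
    # R-style strategy: one match node holding a flat list literal.
    literal = "[" + " ".join('"%s"' % v for v in values) + "]"
    return Node("match", col, literal, "0", "NULL")


values = [str(i) for i in range(10000)]
flat = isin_match("x", values).to_ast()  # serializes without deep recursion
try:
    isin_or_chain("x", values).to_ast()
except RecursionError:
    print("RecursionError: the OR-chain is about 10000 levels deep")
```

Raising the recursion limit would only hide the problem for a larger list, which is why emitting one flat `match` node, as R does, is the more robust design.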

exalate-issue-sync[bot] commented 1 year ago

Raymond Peck commented: Python 3.6 fails for me in the following way, which seems to be a new bug. [~accountid:557058:5bcbac08-75cf-4c6b-b4d2-294f7c0fe9b8], what do you think?

versionFromGradle='3.17.0',projectVersion='3.17.0.99999',branch='master',lastCommitHash='14ba71c5568d1d8a291167e746f37544c71cdf37',gitDescribe='jenkins-master-4151',compiledOn='2017-12-26 11:33:29',compiledBy='rpeck'
[2017-12-26 20:23:51] Connect to h2o on IP: localhost PORT: 54321

Parse progress: [#########################################################] 100%
Traceback (most recent call last):
  File "/Users/rpeck/Source/h2o-3/h2o-py/scripts/h2o-py-test-setup.py", line 146, in <module>
    h2o_test_setup(sys.argv)
  File "/Users/rpeck/Source/h2o-3/h2o-py/scripts/h2o-py-test-setup.py", line 141, in h2o_test_setup
    elif _ISPYUNIT: pyunit_utils.pyunit_exec(_TESTNAME)
  File "/Users/rpeck/Source/h2o-3/h2o-py/tests/pyunit_utils/utilsPY.py", line 444, in pyunit_exec
    exec(pyunit_c, {})
  File "/Users/rpeck/Source/h2o-3/h2o-py/tests/testdir_jira/pyunit_pubdev_5174_isin_efficiency.py", line 32, in <module>
    pubdev_5174()
  File "/Users/rpeck/Source/h2o-3/h2o-py/tests/testdir_jira/pyunit_pubdev_5174_isin_efficiency.py", line 18, in pubdev_5174
    x[~x['rr'].isin(gg['C1'].ascharacter().as_data_frame()['C1'].tolist())]
TypeError: list indices must be integers or slices, not str
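This `TypeError` is exactly what Python raises when a list is indexed with a string. One plausible cause, stated as an assumption because the thread does not confirm it: `H2OFrame.as_data_frame()` returns a pandas `DataFrame` only when pandas is available, and otherwise falls back to a list of rows (header row first), so `['C1']` ends up applied to a list. The sketch below reproduces the error and shows a hypothetical helper, `column_values`, that accepts either shape. It is illustrative only and not part of the h2o API.

```python
def column_values(frame_data, name):
    """Hypothetical helper: return column `name` as a Python list, whether
    `frame_data` is a pandas DataFrame or a list of rows with a header row."""
    if isinstance(frame_data, list):
        # pandas-free shape: the first row holds column names, the rest hold values
        header, *rows = frame_data
        idx = header.index(name)
        return [row[idx] for row in rows]
    # Otherwise assume a pandas-like object that supports frame[name].tolist()
    return frame_data[name].tolist()


rows = [["C1"], ["000Ne"], ["000TW"]]
try:
    rows["C1"]  # what ['C1'] does when as_data_frame() returned a list
except TypeError as e:
    print(e)  # list indices must be integers or slices, not str

print(column_values(rows, "C1"))  # ['000Ne', '000TW']
```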

exalate-issue-sync[bot] commented 1 year ago

Raymond Peck commented: Also, in R, if I look at the row counts ({{nrow}}) of {{x}}, {{tt}} and {{ww}}, I get these values:

{quote}
expect_equal(nrow(x), 1000000)
expect_equal(nrow(tt), 499851)
expect_equal(nrow(ww), 979992)
{quote}

I guess I'm misunderstanding something, because I don't see how {{nrow(ww)}} can be greater than 1000000 - 499851.

exalate-issue-sync[bot] commented 1 year ago

Nidhi Mehta commented: [~accountid:557058:3ae3c86a-e56a-4211-99d4-9a8cf5ab63f6] 499851 is the number of unique categories and has nothing to do with N (i.e., 1e6).
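Nidhi's point can be checked with a toy example. The number of rows the filter drops equals the number of rows whose value falls in the selected categories. That count depends on how often those categories repeat, not on how many unique categories the column has. In the R run, 1000000 - 979992 = 20008 rows were dropped for the 10000 selected categories. The data below is made up purely for illustration, and it uses first-seen order for "unique", which need not match the order h2o returns.

```python
def anti_filter(column, selected):
    """Keep the values of `column` that are NOT in `selected`
    (the semantics of x[!(x$rr %in% gg)] in R or x[~x['rr'].isin(gg)] in Python)."""
    selected = set(selected)
    return [v for v in column if v not in selected]


# Toy column: 10 rows, 8 unique categories; 'a' repeats (made-up data)
column = ["a", "a", "a", "b", "c", "d", "e", "f", "g", "h"]
unique = list(dict.fromkeys(column))   # first-seen order: a, b, ..., h
kept = anti_filter(column, unique[:2])  # drop rows whose value is 'a' or 'b'

print(len(column), len(unique), len(kept))  # 10 8 6
```

Here 6 rows survive even though 10 - 8 = 2, mirroring how 979992 can exceed 1000000 - 499851.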

exalate-issue-sync[bot] commented 1 year ago

Nidhi Mehta commented: [~accountid:557058:3ae3c86a-e56a-4211-99d4-9a8cf5ab63f6] I have Python 3.6.1 (v3.6.1:69c0db5050, Mar 21 2017, 01:21:04) and got the above error (i.e., the one in the original JIRA message) on H2O 3.16.0.2, I think.

hasithjp commented 1 year ago

JIRA Issue Migration Info

Jira Issue: PUBDEV-5174
Assignee: Raymond Peck
Reporter: Nidhi Mehta
State: Resolved
Fix Version: 3.18.0.1
Attachments: Available (Count: 1)
Development PRs: Available

Linked PRs from JIRA

https://github.com/h2oai/h2o-3/pull/2594
https://github.com/h2oai/h2o-3/pull/1872

Attachments From Jira

Attachment Name: dummyt.txt
Attached By: Nidhi Mehta
File Link: https://h2o-3-jira-github-migration.s3.amazonaws.com/PUBDEV-5174/dummyt.txt