Closed exalate-issue-sync[bot] closed 1 year ago
Nidhi Mehta commented: this equivalent code works in R -
{code:java}
x = h2o.importFile("/Users/nidhimehta/Downloads/dummyt.txt")
tt = h2o.unique(x$rr) # 499851 unique categories
gg = as.h2o(tt[1:10000,1]) # pulling first 10000 into R
ww = x[!(x$rr %in% as.character(as.vector(gg$C1))) , ] # 20008 rows do not match
{code}
Michal Malohlava commented: @Nidhi, it is a problem on our side. We need to make sure that both clients generate similar Rapids expressions:
R generates this expression:
POST /99/Rapids, parms: {ast=(tmp= RTMP_sid_b137_6 (rows dummyt1.hex_sid_b137_2 (!! (match (cols dummyt1.hex_sid_b137_2 [0]) ["000Ne" "000TW" .... "00czA" "00d2V" "00d87" "00di5" "00dy4" "00e0g" "00eDl"] 0 NULL)))), session_id=_sid_b137}
but Python generates a much longer expression:
Solution: we need to transform is.in in the same way as it is done in R.
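A minimal sketch of what "the same way as R" could look like on the Python side. This is not the actual h2o-py code: `rapids_str_list` and `flat_match_ast` are hypothetical helpers that only format a string shaped like the R-generated expression above, and they never contact a cluster. The point is that the whole value list goes into a single `match` call, so the expression depth stays constant no matter how many values are tested.

```python
def rapids_str_list(values):
    """Render Python strings as a Rapids-style string list: ["a" "b" ...].

    Illustration only: embedded quotes in values are not escaped.
    """
    return "[" + " ".join('"{}"'.format(v) for v in values) + "]"


def flat_match_ast(frame_id, col_index, values):
    """Build one flat (match ...) expression, mirroring the shape R emits.

    Hypothetical helper; frame_id and col_index are placeholders chosen
    for this example.
    """
    return "(match (cols {} [{}]) {} 0 NULL)".format(
        frame_id, col_index, rapids_str_list(values)
    )


ast = flat_match_ast("dummyt1.hex", 0, ["000Ne", "000TW", "00czA"])
print(ast)
```

However long the value list grows, the string contains only two nested levels, so rendering it needs no recursion over the values.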
Raymond Peck commented: Python 3.6 fails for me the following way, which seems to be a new bug. [~accountid:557058:5bcbac08-75cf-4c6b-b4d2-294f7c0fe9b8], what do you think?
versionFromGradle='3.17.0',projectVersion='3.17.0.99999',branch='master',lastCommitHash='14ba71c5568d1d8a291167e746f37544c71cdf37',gitDescribe='jenkins-master-4151',compiledOn='2017-12-26 11:33:29',compiledBy='rpeck'
[2017-12-26 20:23:51] Connect to h2o on IP: localhost PORT: 54321
Parse progress: [#########################################################] 100%
Traceback (most recent call last):
File "/Users/rpeck/Source/h2o-3/h2o-py/scripts/h2o-py-test-setup.py", line 146, in
Raymond Peck commented: Also, in R if I look at nrows of {{x}}, {{tt}} and {{ww}} I get these values:
{quote}
expect_equal(nrow(x), 1000000)
expect_equal(nrow(tt), 499851)
expect_equal(nrow(ww), 979992)
{quote}
I guess I'm misunderstanding something, because I don't see how {{nrow(ww)}} can be greater than 1000000 - 499851.
Nidhi Mehta commented: [~accountid:557058:3ae3c86a-e56a-4211-99d4-9a8cf5ab63f6] 499851 is the number of unique categories and has nothing to do with N (i.e., 1e6 rows).
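To make the row arithmetic concrete, here is a toy example with made-up data (not dummyt.txt). Filtering out the first k unique categories removes as many rows as those categories occur, which is unrelated to the total number of unique categories. With the numbers reported in this thread, 1000000 - 20008 = 979992.

```python
from collections import Counter

# Toy column: N = 7 rows, 4 unique categories (made-up data).
rows = ["a", "a", "a", "b", "c", "c", "d"]
counts = Counter(rows)

# Take the first 2 unique categories, analogous to tt[1:10000, 1] in R.
unique = sorted(counts)
drop = set(unique[:2])  # {"a", "b"}

# Keep only the rows whose category is not in the selected set.
kept = [r for r in rows if r not in drop]
removed = len(rows) - len(kept)

print("unique categories:", len(counts))
print("rows removed:", removed)
print("rows kept:", len(kept))
```

The rows removed equal the summed occurrence counts of the dropped categories (3 for "a" plus 1 for "b"), not the number of categories dropped.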
Nidhi Mehta commented: [~accountid:557058:3ae3c86a-e56a-4211-99d4-9a8cf5ab63f6] I have Python 3.6.1 (v3.6.1:69c0db5050, Mar 21 2017, 01:21:04) and got the above error (i.e., the original JIRA message) on (I think) 3.16.0.2.
JIRA Issue Migration Info
Jira Issue: PUBDEV-5174
Assignee: Raymond Peck
Reporter: Nidhi Mehta
State: Resolved
Fix Version: 3.18.0.1
Attachments: Available (Count: 1)
Development PRs: Available
Linked PRs from JIRA
https://github.com/h2oai/h2o-3/pull/2594
https://github.com/h2oai/h2o-3/pull/1872
Attachments From Jira
Attachment Name: dummyt.txt
Attached By: Nidhi Mehta
File Link: https://h2o-3-jira-github-migration.s3.amazonaws.com/PUBDEV-5174/dummyt.txt
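{code:java}
import h2o
h2o.init()
x = h2o.import_file('/Users/nidhimehta/Downloads/dummyt.txt')
tt = x['rr'].unique()  # unique categories of column rr
gg = tt[:10000, 0]  # first 10000 unique values
# keep the rows whose rr value is not among those 10000 values
x[~x['rr'].isin(gg['C1'].ascharacter().as_data_frame()['C1'].tolist())]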
{code}
{code:java}
RecursionError Traceback (most recent call last)
/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/IPython/core/formatters.py in __call__(self, obj)
670 type_pprinters=self.type_printers,
671 deferred_pprinters=self.deferred_printers)
--> 672 printer.pretty(obj)
673 printer.flush()
674 return stream.getvalue()
/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/IPython/lib/pretty.py in pretty(self, obj)
381 if callable(meth):
382 return meth(obj, self, cycle)
--> 383 return _default_pprint(obj, self, cycle)
384 finally:
385 self.end_group()
/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/IPython/lib/pretty.py in _default_pprint(obj, p, cycle)
501 if _safe_getattr(klass, '__repr__', None) not in _baseclass_reprs:
502 # A user-provided repr. Find newlines and replace them with p.break()
--> 503 _repr_pprint(obj, p, cycle)
504 return
505 p.begin_group(1, '<')
/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/IPython/lib/pretty.py in _repr_pprint(obj, p, cycle)
699 """A pprint that just redirects to the normal repr function."""
700 # Find newlines and replace them with p.break()
--> 701 output = repr(obj)
702 for idx,output_line in enumerate(output.splitlines()):
703 if idx:
/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/h2o/frame.py in __repr__(self)
405 stk = traceback.extract_stack()
406 if not ("IPython" in stk[-2][0] and "info" == stk[-2][2]):
--> 407 self.show()
408 return ""
409
/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/h2o/frame.py in show(self, use_pandas, rows, cols)
417 print("This H2OFrame has been removed.")
418 return
--> 419 if not self._ex._cache.is_valid(): self._frame()._ex._cache.fill()
420 if H2ODisplay._in_ipy():
421 import IPython.display
/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/h2o/frame.py in _frame(self, rows, rows_offset, cols, cols_offset, fill_cache)
480
481 def _frame(self, rows=10, rows_offset=0, cols=-1, cols_offset=0, fill_cache=False):
--> 482 self._ex._eager_frame()
483 if fill_cache:
484 self._ex._cache.fill(rows=rows, rows_offset=rows_offset, cols=cols, cols_offset=cols_offset)
/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/h2o/expr.py in _eager_frame(self)
92 if not self._cache.is_empty(): return
93 if self._cache._id is not None: return # Data already computed under ID, but not cached locally
---> 94 self._eval_driver(True)
95
96 def _eager_scalar(self): # returns a scalar (or a list of scalars)
/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/h2o/expr.py in _eval_driver(self, top)
105
106 def _eval_driver(self, top):
--> 107 exec_str = self._get_ast_str(top)
108 res = ExprNode.rapids(exec_str)
109 if 'scalar' in res:
/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/h2o/expr.py in _get_ast_str(self, top)
141 return self._cache._id # Data already computed under ID, but not cached
142 # assert isinstance(self._children,tuple)
--> 143 exec_str = "({} {})".format(self._op, " ".join([ExprNode._arg_to_expr(ast) for ast in self._children]))
144 gc_ref_cnt = len(gc.get_referrers(self))
145 if top or gc_ref_cnt >= ExprNode.MAGIC_REF_COUNT:
/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/h2o/expr.py in <listcomp>(.0)
141 return self._cache._id # Data already computed under ID, but not cached
142 # assert isinstance(self._children,tuple)
--> 143 exec_str = "({} {})".format(self._op, " ".join([ExprNode._arg_to_expr(ast) for ast in self._children]))
144 gc_ref_cnt = len(gc.get_referrers(self))
145 if top or gc_ref_cnt >= ExprNode.MAGIC_REF_COUNT:
/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/h2o/expr.py in _arg_to_expr(arg)
153 return "[]" # empty list
154 if isinstance(arg, ExprNode):
--> 155 return arg._get_ast_str(False)
156 if isinstance(arg, ASTId):
157 return str(arg)
... last 3 frames repeated, from the frame below ...
/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/h2o/expr.py in _get_ast_str(self, top)
141 return self._cache._id # Data already computed under ID, but not cached
142 # assert isinstance(self._children,tuple)
--> 143 exec_str = "({} {})".format(self._op, " ".join([ExprNode._arg_to_expr(ast) for ast in self._children]))
144 gc_ref_cnt = len(gc.get_referrers(self))
145 if top or gc_ref_cnt >= ExprNode.MAGIC_REF_COUNT:
RecursionError: maximum recursion depth exceeded
{code}
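The traceback shows the AST string being rendered by recursing through child nodes. Here is a self-contained toy reproduction of that failure mode. It rests on an assumption: that the Python client built the membership test as a chain with one nested node per value. `Node` and `chained_isin` are hypothetical stand-ins written for this sketch, not h2o's `ExprNode` or its `isin`.

```python
import sys


class Node:
    """Toy expression node (hypothetical; not h2o's ExprNode)."""

    def __init__(self, op, *children):
        self.op = op
        self.children = children

    def to_ast(self):
        # Recursive rendering: each nesting level adds at least one
        # Python stack frame.
        parts = [c.to_ast() if isinstance(c, Node) else str(c)
                 for c in self.children]
        return "({} {})".format(self.op, " ".join(parts))


def chained_isin(col, values):
    """Left-fold (== col v) nodes with '|'.

    The resulting tree depth grows linearly with len(values). This
    chaining is an assumption made for the illustration.
    """
    expr = Node("==", col, '"{}"'.format(values[0]))
    for v in values[1:]:
        expr = Node("|", expr, Node("==", col, '"{}"'.format(v)))
    return expr


# A short chain renders without trouble.
small = chained_isin("rr", ["a", "b"]).to_ast()
print(small)

# A chain deeper than the interpreter's recursion limit does not.
n = sys.getrecursionlimit() + 100
deep = chained_isin("rr", ["v{}".format(i) for i in range(n)])
try:
    deep.to_ast()
    overflowed = False
except RecursionError:
    overflowed = True
print("RecursionError raised:", overflowed)
```

Raising the recursion limit would only postpone the failure. A flat expression whose depth does not depend on the number of values avoids it.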