h2oai / h2o-3

H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
http://h2o.ai
Apache License 2.0

h2o.uploadFile fails for large files #11047

Open exalate-issue-sync[bot] opened 1 year ago

exalate-issue-sync[bot] commented 1 year ago

Import of a 15 GB file into a 3-node cluster (28 GB per node):

{noformat}
# And import them into H2O
import h2o
from pyspark import SparkFiles

# DATASET (the file name) is not shown in this report
prices_hf = h2o.upload_file(SparkFiles.get(DATASET))
{noformat}

produces:

{noformat}
HTTP 500 Server Error:

u'{"__meta":{"schema_version":3,"schema_name":"H2OErrorV3","schema_type":"H2OError"},"timestamp":1489182717414,"error_url":"/3/PostFile","msg":"\n\nERRORMESSAGE:\n\nearly EOF\n\n","dev_msg":"\n\nERRORMESSAGE:\n\nearly EOF\n\n","http_status":500,"values":{},"exception_type":"ai.h2o.org.eclipse.jetty.io.EofException","exception_msg":"\n\nERRORMESSAGE:\n\nearly EOF\n\n","stacktrace":["ai.h2o.org.eclipse.jetty.io.EofException: early EOF"," ai.h2o.org.eclipse.jetty.server.HttpInput.read(HttpInput.java:65)"," water.JettyHTTPD$InputStreamWrapper.readInternal(JettyHTTPD.java:607)"," water.JettyHTTPD$InputStreamWrapper.read(JettyHTTPD.java:580)"," water.fvec.UploadFileVec.readPut_impl(UploadFileVec.java:77)"," water.fvec.UploadFileVec.readPut(UploadFileVec.java:60)"," water.fvec.UploadFileVec.readPut(UploadFileVec.java:56)"," water.api.PostFileServlet.doPost(PostFileServlet.java:45)"," javax.servlet.http.HttpServlet.service(HttpServlet.java:707)"," javax.servlet.http.HttpServlet.service(HttpServlet.java:790)"," ai.h2o.org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:684)"," ai.h2o.org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:503)"," ai.h2o.org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)"," ai.h2o.org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)"," ai.h2o.org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)"," ai.h2o.org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1086)"," ai.h2o.org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:429)"," ai.h2o.org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)"," ai.h2o.org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1020)"," ai.h2o.org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)"," ai.h2o.org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)"," 
ai.h2o.org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)"," ai.h2o.org.eclipse.jetty.server.Server.handle(Server.java:370)"," ai.h2o.org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:494)"," ai.h2o.org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)"," ai.h2o.org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:982)"," ai.h2o.org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:1043)"," ai.h2o.org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:865)"," ai.h2o.org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:240)"," ai.h2o.org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)"," ai.h2o.org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)"," ai.h2o.org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)"," ai.h2o.org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)"," java.lang.Thread.run(Thread.java:745)"]}'

Traceback (most recent call last):
  File "/mnt/resource/hadoop/yarn/local/usercache/livy/appcache/application_1489177452073_0013/container_1489177452073_0013_01_000001/pySparkling-2.0.egg/h2o/h2o.py", line 335, in upload_file
    return H2OFrame()._upload_parse(path, destination_frame, header, sep, col_names, col_types, na_strings)
  File "/mnt/resource/hadoop/yarn/local/usercache/livy/appcache/application_1489177452073_0013/container_1489177452073_0013_01_000001/pySparkling-2.0.egg/h2o/frame.py", line 314, in _upload_parse
    ret = h2o.api("POST /3/PostFile", filename=path)
  File "/mnt/resource/hadoop/yarn/local/usercache/livy/appcache/application_1489177452073_0013/container_1489177452073_0013_01_000001/pySparkling-2.0.egg/h2o/h2o.py", line 97, in api
    return h2oconn.request(endpoint, data=data, json=json, filename=filename, save_to=save_to)
  File "/mnt/resource/hadoop/yarn/local/usercache/livy/appcache/application_1489177452073_0013/container_1489177452073_0013_01_000001/pySparkling-2.0.egg/h2o/backend/connection.py", line 405, in request
    return self._process_response(resp, save_to)
  File "/mnt/resource/hadoop/yarn/local/usercache/livy/appcache/application_1489177452073_0013/container_1489177452073_0013_01_000001/pySparkling-2.0.egg/h2o/backend/connection.py", line 731, in _process_response
    raise H2OServerError("HTTP %d %s:\n%r" % (status_code, response.reason, data))
H2OServerError: HTTP 500 Server Error:

u'{"__meta":{"schema_version":3,"schema_name":"H2OErrorV3","schema_type":"H2OError"},"timestamp":1489182717414,"error_url":"/3/PostFile","msg":"\n\nERRORMESSAGE:\n\nearly EOF\n\n","dev_msg":"\n\nERRORMESSAGE:\n\nearly EOF\n\n","http_status":500,"values":{},"exception_type":"ai.h2o.org.eclipse.jetty.io.EofException","exception_msg":"\n\nERRORMESSAGE:\n\nearly EOF\n\n","stacktrace":["ai.h2o.org.eclipse.jetty.io.EofException: early EOF"," ai.h2o.org.eclipse.jetty.server.HttpInput.read(HttpInput.java:65)"," water.JettyHTTPD$InputStreamWrapper.readInternal(JettyHTTPD.java:607)"," water.JettyHTTPD$InputStreamWrapper.read(JettyHTTPD.java:580)"," water.fvec.UploadFileVec.readPut_impl(UploadFileVec.java:77)"," water.fvec.UploadFileVec.readPut(UploadFileVec.java:60)"," water.fvec.UploadFileVec.readPut(UploadFileVec.java:56)"," water.api.PostFileServlet.doPost(PostFileServlet.java:45)"," javax.servlet.http.HttpServlet.service(HttpServlet.java:707)"," javax.servlet.http.HttpServlet.service(HttpServlet.java:790)"," ai.h2o.org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:684)"," ai.h2o.org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:503)"," ai.h2o.org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)"," ai.h2o.org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)"," ai.h2o.org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)"," ai.h2o.org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1086)"," ai.h2o.org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:429)"," ai.h2o.org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)"," ai.h2o.org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1020)"," ai.h2o.org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)"," ai.h2o.org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)"," 
ai.h2o.org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)"," ai.h2o.org.eclipse.jetty.server.Server.handle(Server.java:370)"," ai.h2o.org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:494)"," ai.h2o.org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)"," ai.h2o.org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:982)"," ai.h2o.org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:1043)"," ai.h2o.org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:865)"," ai.h2o.org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:240)"," ai.h2o.org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)"," ai.h2o.org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)"," ai.h2o.org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)"," ai.h2o.org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)"," java.lang.Thread.run(Thread.java:745)"]}' {noformat}
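For anyone triaging the same failure, here is a minimal sketch of pulling the key fields out of the H2OErrorV3 payload above. The function name `summarize_h2o_error` and its `max_frames` parameter are hypothetical helpers, not part of the h2o client, and the sample is an abbreviated copy of the payload in this report with most stack frames omitted.

```python
import json


def summarize_h2o_error(payload, max_frames=3):
    """Return a short summary of an H2OErrorV3 JSON payload (hypothetical helper)."""
    err = json.loads(payload)

    # The "msg" field is padded with blank lines; keep the last non-empty line.
    lines = [ln.strip() for ln in err.get("msg", "").splitlines() if ln.strip()]
    message = lines[-1] if lines else ""

    # Stack-trace entries carry leading whitespace. Keep only frames from
    # H2O's own "water." packages to see where the upload was being read.
    frames = [f.strip() for f in err.get("stacktrace", [])]
    h2o_frames = [f for f in frames if f.startswith("water.")][:max_frames]

    return {
        "http_status": err.get("http_status"),
        "error_url": err.get("error_url"),
        "exception_type": err.get("exception_type"),
        "message": message,
        "h2o_frames": h2o_frames,
    }


# Abbreviated copy of the payload from this report.
sample = json.dumps({
    "http_status": 500,
    "error_url": "/3/PostFile",
    "msg": "\n\nERRORMESSAGE:\n\nearly EOF\n\n",
    "exception_type": "ai.h2o.org.eclipse.jetty.io.EofException",
    "stacktrace": [
        "ai.h2o.org.eclipse.jetty.io.EofException: early EOF",
        "    ai.h2o.org.eclipse.jetty.server.HttpInput.read(HttpInput.java:65)",
        "    water.JettyHTTPD$InputStreamWrapper.readInternal(JettyHTTPD.java:607)",
        "    water.fvec.UploadFileVec.readPut_impl(UploadFileVec.java:77)",
        "    water.api.PostFileServlet.doPost(PostFileServlet.java:45)",
    ],
})

summary = summarize_h2o_error(sample)
print(summary)
```

In this report the summary points at `/3/PostFile` and the `water.fvec.UploadFileVec` read path, that is, the server hit end-of-stream while still reading the request body of the upload.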

exalate-issue-sync[bot] commented 1 year ago

Michael Jules commented: Any updates on this? I'm encountering a similar issue with large files.

h2o-ops commented 1 year ago

JIRA Issue Migration Info

Jira Issue: PUBDEV-4158
Assignee: New H2O Bugs
Reporter: Michal Malohlava
State: Open
Fix Version: N/A
Attachments: N/A
Development PRs: N/A