h2oai / h2o-3

H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
http://h2o.ai
Apache License 2.0

support streaming output for endpoints which generate large non-JSON results #14494

Status: Open. exalate-issue-sync[bot] opened this issue 1 year ago.

exalate-issue-sync[bot] commented 1 year ago

The point of this ticket is to support streaming large results to the HTTP client, and then add new endpoints for DownloadDataset and model export which use this feature.

The clients then need to be ported to use streaming.

Note that the old behavior should continue to work; a /bin suffix should trigger streaming binary (non-JSON) output.
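Here is a minimal sketch of the /bin convention described above. It assumes two things: the suffix is simply stripped to find the ordinary route, and the result is available as an InputStream. `BinSuffixRouting`, its methods, the 64 KiB buffer size and the example path are all hypothetical and are not existing H2O code. The result is copied to the client in fixed-size chunks, so the whole dataset never has to be held in memory as one JSON string.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical helper illustrating the "/bin suffix => streamed binary output" idea.
final class BinSuffixRouting {
    static final String BIN_SUFFIX = "/bin";

    /** True when the request path asks for streamed (non-JSON) output. */
    static boolean isStreamingRequest(String path) {
        return path.endsWith(BIN_SUFFIX);
    }

    /** Strips the /bin suffix so the request can be matched against the ordinary route. */
    static String baseRoute(String path) {
        return isStreamingRequest(path)
                ? path.substring(0, path.length() - BIN_SUFFIX.length())
                : path;
    }

    /** Copies the result to the client in fixed-size chunks instead of buffering it whole. */
    static long streamTo(InputStream result, OutputStream client) throws IOException {
        byte[] buffer = new byte[64 * 1024];
        long total = 0;
        int n;
        while ((n = result.read(buffer)) != -1) {
            client.write(buffer, 0, n);
            total += n;
        }
        client.flush();
        return total;
    }

    public static void main(String[] args) throws IOException {
        String path = "/3/DownloadDataset/bin";       // illustrative path only
        System.out.println(isStreamingRequest(path)); // true
        System.out.println(baseRoute(path));          // /3/DownloadDataset
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        System.out.println(streamTo(new ByteArrayInputStream(new byte[200_000]), sink)); // 200000
    }
}
```

Requests without the suffix fall through unchanged, which keeps the old JSON behavior working.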

exalate-issue-sync[bot] commented 1 year ago

Raymond Peck commented: In RequestServer you’ll find these lines:

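   Schema s = handle(type, route, version, parms);  // run the handler and get its output schema
   PojoUtils.filterFields(s, (String)parms.get("_include_fields"), (String)parms.get("_exclude_fields"));  // apply _include_fields / _exclude_fields
   Response r = wrap(s, type);  // build the HTTP response for the requested type
   return r;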

The handle() call in turn runs this:

   case html: // These request-types only dictate the response-type;
   case java: // the normal action is always done.
   case json:
   case xml: {
     Handler h = route._handler;
     return h.handle(version, route, parms); // Can throw any Exception the handler throws
   }

There might be something slightly cleaner, but my initial thought is:

Somehow (tm) you’ll need to get the socket from Jetty passed to serve(). handle() should check whether the output schema of the handler method implements StreamingOutput and, if so, call h.handle() in the handler, passing in the socket as a fourth, nullable argument.

Inside the handler, h.handle() would then do:

   if (null != socket) {
     result = (Schema)route._handler_method.invoke(this, version, schema, socket);
   } else {
     result = (Schema)route._handler_method.invoke(this, version, schema);
   }

Back up in serve(), if you’ve passed down a null socket, keep doing what it does now (calling wrap()); if not, it should instead close the socket and return.

That’s pretty close.
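The proposal above (check for StreamingOutput, invoke with or without the extra stream argument, then either wrap() or close and return) can be sketched end to end with plain reflection. The sketch assumes the socket has already been reduced to an OutputStream. Every type and method in it (`StreamingDispatch`, `Schema`, `Handler`, `summary`, `download`, and the string returned in place of wrap()) is a hypothetical stand-in, not the real H2O RequestServer or Handler code.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: every type and method here is a stand-in, not the real H2O code.
final class StreamingDispatch {
    /** Marker: an output schema that writes its own bytes instead of being serialized to JSON. */
    interface StreamingOutput {}

    static class Schema {}

    static class TextResult extends Schema {
        final String text;
        TextResult(String text) { this.text = text; }
    }

    static class BinaryResult extends Schema implements StreamingOutput {}

    public static class Handler {
        public TextResult summary(int version, Schema parms) {
            return new TextResult("ok");
        }

        // Streaming handler method: receives the client stream as an extra argument.
        public BinaryResult download(int version, Schema parms, OutputStream out) throws IOException {
            out.write("col1,col2\n1,2\n".getBytes(StandardCharsets.UTF_8));
            return new BinaryResult();
        }
    }

    /** True when the handler method's declared output schema implements StreamingOutput. */
    static boolean isStreaming(Method handlerMethod) {
        return StreamingOutput.class.isAssignableFrom(handlerMethod.getReturnType());
    }

    /** Like the proposed handle(): pass the stream only when it is non-null. */
    static Schema handle(Handler handler, Method m, int version, Schema parms, OutputStream out)
            throws IllegalAccessException, InvocationTargetException {
        return (out != null)
                ? (Schema) m.invoke(handler, version, parms, out)
                : (Schema) m.invoke(handler, version, parms);
    }

    /** Like the proposed serve(): wrap a normal result; for a streamed one, close and return null. */
    static String serve(Handler handler, Method m, int version, Schema parms, OutputStream client)
            throws Exception {
        OutputStream out = isStreaming(m) ? client : null;
        Schema s = handle(handler, m, version, parms, out);
        if (out == null) {
            return "wrapped:" + ((TextResult) s).text; // stands in for wrap(s, type)
        }
        out.close(); // the handler has already written the body
        return null;
    }

    public static void main(String[] args) throws Exception {
        Handler h = new Handler();
        Method summary = Handler.class.getMethod("summary", int.class, Schema.class);
        Method download = Handler.class.getMethod("download", int.class, Schema.class, OutputStream.class);
        ByteArrayOutputStream client = new ByteArrayOutputStream();
        System.out.println(serve(h, summary, 3, new Schema(), client));  // wrapped:ok
        System.out.println(serve(h, download, 3, new Schema(), client)); // null
        System.out.print(new String(client.toByteArray(), StandardCharsets.UTF_8));
    }
}
```

Because the decision comes from the declared return type, existing non-streaming handlers need no changes. Only handlers whose output schema carries the marker get the extra argument.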

The piece I don’t know is how to pass through the output stream / socket. You’ll need to look at the caller of RequestServer.serve().

You or I will also need to add an is_streaming flag to the metadata.
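One way the is_streaming flag could be populated is to derive it from the route's output schema, so it cannot drift out of sync with the handler. This sketch assumes that approach. `EndpointMetadata`, the stand-in schema classes, the map-based entry and the keys other than is_streaming are all hypothetical and illustrative, not the actual H2O metadata schema.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of an endpoint metadata entry carrying an is_streaming flag.
final class EndpointMetadata {
    interface StreamingOutput {}                     // stand-in marker interface
    static class Schema {}
    static class JsonResult extends Schema {}
    static class BinaryResult extends Schema implements StreamingOutput {}

    /** Builds one metadata entry; is_streaming is derived from the output schema, not set by hand. */
    static Map<String, Object> describe(String httpMethod, String urlPattern,
                                        Class<? extends Schema> output) {
        Map<String, Object> entry = new LinkedHashMap<>();
        entry.put("http_method", httpMethod);
        entry.put("url_pattern", urlPattern);
        entry.put("output_schema", output.getSimpleName());
        entry.put("is_streaming", StreamingOutput.class.isAssignableFrom(output));
        return entry;
    }

    public static void main(String[] args) {
        // {http_method=GET, url_pattern=/3/Example, output_schema=JsonResult, is_streaming=false}
        System.out.println(describe("GET", "/3/Example", JsonResult.class));
        System.out.println(describe("GET", "/3/Example/bin", BinaryResult.class));
    }
}
```

A ported client could read this flag to decide whether to request the /bin variant and consume the response as a byte stream rather than parse it as JSON.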

DinukaH2O commented 1 year ago

JIRA Issue Migration Info

Jira Issue: PUBDEV-1525
Assignee: Nishant
Reporter: Raymond Peck
State: Open
Fix Version: N/A
Attachments: N/A
Development PRs: N/A