Is your feature request related to a problem? Please describe.
When you export data (e.g., Parquet files) from Spark, the output directory will generally also contain a _SUCCESS file and .crc (checksum) files. h2o does not skip these files on import, and parse setup fails on them. h2o should, like Spark does, ignore them when importing a directory. Otherwise this error is thrown:
water.exceptions.H2OIllegalArgumentException
[1] "water.exceptions.H2OIllegalArgumentException: Column separator mismatch. One file seems to use \",\" and the other uses \"|\"."
[2] " water.api.ParseSetupHandler.guessSetup(ParseSetupHandler.java:50)"
[3] " sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)"
[4] " sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)"
[5] " sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)"
[6] " java.lang.reflect.Method.invoke(Method.java:498)"
[7] " water.api.Handler.handle(Handler.java:60)"
[8] " water.api.RequestServer.serve(RequestServer.java:472)"
[9] " water.api.RequestServer.doGeneric(RequestServer.java:303)"
[10] " water.api.RequestServer.doPost(RequestServer.java:227)"
[11] " javax.servlet.http.HttpServlet.service(HttpServlet.java:707)"
[12] " javax.servlet.http.HttpServlet.service(HttpServlet.java:790)"
[13] " org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:799)"
[14] " org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:554)"
[15] " org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233)"
[16] " org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1440)"
[17] " org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188)"
[18] " org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:505)"
[19] " org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186)"
[20] " org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1355)"
[21] " org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)"
[22] " org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:146)"
[23] " org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)"
[24] " water.webserver.jetty9.Jetty9ServerAdapter$LoginHandler.handle(Jetty9ServerAdapter.java:130)"
[25] " org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:146)"
[26] " org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)"
[27] " org.eclipse.jetty.server.Server.handle(Server.java:516)"
[28] " org.eclipse.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:487)"
[29] " org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:732)"
[30] " org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:479)"
[31] " org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:277)"
[32] " org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311)"
[33] " org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:105)"
[34] " org.eclipse.jetty.io.ChannelEndPoint$1.run(ChannelEndPoint.java:104)"
[35] " org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:338)"
[36] " org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:315)"
[37] " org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:173)"
[38] " org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:131)"
[39] " org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:409)"
[40] " org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:883)"
[41] " org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1034)"
[42] " java.lang.Thread.run(Thread.java:748)"
Error in .h2o.doSafeREST(h2oRestApiVersion = h2oRestApiVersion, urlSuffix = page, :
ERROR MESSAGE:
Column separator mismatch. One file seems to use "," and the other uses "|".
In addition: Warning message:
In append(newtodo, x) :
Error in .h2o.doSafeREST(h2oRestApiVersion = h2oRestApiVersion, urlSuffix = page, :
ERROR MESSAGE:
Column separator mismatch. One file seems to use "," and the other uses "|".
Describe the solution you'd like
h2o.importFolder(...) runs without error even if the directory contains a _SUCCESS file or .crc files. Note that even h2o.exportFile by default writes .crc if format = "parquet".
Describe alternatives you've considered
One option is to use the pattern argument, but that's somewhat of a nuisance.
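For anyone who needs the workaround in the meantime, here is a minimal sketch of the pattern approach, written against the Python client rather than R. The regex, the helper names (keep_data_files, import_parquet_dir) and the sample file names are illustrative assumptions, not anything h2o ships. It also assumes that h2o.import_file accepts a pattern regex in the same way h2o.importFolder does in R. Exactly how h2o applies the pattern (file name vs. full path, full match vs. search) should be checked against your h2o version.

```python
import re

# Hypothetical regex: keep only files whose name ends in ".parquet".
# This rejects Spark's "_SUCCESS" marker and ".crc" checksum files
# such as ".part-00000.snappy.parquet.crc".
PARQUET_ONLY = r".*\.parquet$"


def keep_data_files(names, pattern=PARQUET_ONLY):
    """Return the file names that the regex fully matches."""
    rx = re.compile(pattern)
    return [n for n in names if rx.fullmatch(n)]


def import_parquet_dir(path, pattern=PARQUET_ONLY):
    """Import a directory through h2o, skipping non-data files.

    Assumes a running H2O cluster and that `pattern` filters the files
    found under `path`; the matching semantics are not verified here.
    """
    import h2o  # imported lazily so the regex helper works without h2o

    return h2o.import_file(path=path, pattern=pattern)


if __name__ == "__main__":
    # Sample listing of a Spark output directory (made-up names).
    listing = [
        "_SUCCESS",
        "._SUCCESS.crc",
        "part-00000.snappy.parquet",
        ".part-00000.snappy.parquet.crc",
    ]
    print(keep_data_files(listing))
```

Checking the regex locally against a directory listing first makes it easier to see which files would be dropped before handing the same pattern to h2o.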