uwescience / myria

Myria is a scalable Analytics-as-a-Service platform based on relational algebra.
myria.cs.washington.edu

Boolean Column Types Don't Work When Using MyriaConnection.upload_file #908

Open orzikhd opened 7 years ago

orzikhd commented 7 years ago

I ran the following Python cell after creating a MyriaConnection:

from raco.types import LONG_TYPE, BOOLEAN_TYPE
name = {'userName': 'public', 'programName': 'adhoc', 'relationName': 'boolData'}
schema = {"columnNames" : ["num1", "num2", "flag", "num3"],
          "columnTypes" : [LONG_TYPE, LONG_TYPE, BOOLEAN_TYPE, LONG_TYPE]}
data = """1,1,True,1
2,2,False,2"""

res = connection.upload_file(name, schema, data, delimiter=',', overwrite=True)

This leads to the following stacktrace:

MyriaError                                Traceback (most recent call last)
<ipython-input-44-e3c956dcf632> in <module>()
      6 2,2,False,2"""
      7 
----> 8 res = connection.upload_file(name, schema, data, delimiter=',', overwrite=True)
/usr/local/lib/python2.7/dist-packages/myria_python-1.3.2-py2.7.egg/myria/connection.pyc in upload_file(self, relation_key, schema, data, overwrite, delimiter, binary, is_little_endian)
    515         if r.status_code not in (200, 201):
    516             raise MyriaError('Error %d: %s'
--> 517                              % (r.status_code, r.text))
    518         return r.json()
MyriaError: Error 500: edu.washington.escience.myria.DbException: Error executing query
    at edu.washington.escience.myria.parallel.Server.ingestDataset(Server.java:878)
    at edu.washington.escience.myria.api.DatasetResource.doIngest(DatasetResource.java:577)
    at edu.washington.escience.myria.api.DatasetResource.newDatasetMultipart(DatasetResource.java:516)
    at sun.reflect.GeneratedMethodAccessor22.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at myriadeps.org.glassfish.jersey.server.model.internal.ResourceMethodInvocationHandlerFactory$1.invoke(ResourceMethodInvocationHandlerFactory.java:81)
    at myriadeps.org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher$1.run(AbstractJavaResourceMethodDispatcher.java:151)
    at myriadeps.org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.invoke(AbstractJavaResourceMethodDispatcher.java:172)
    at myriadeps.org.glassfish.jersey.server.model.internal.JavaResourceMethodDispatcherProvider$ResponseOutInvoker.doDispatch(JavaResourceMethodDispatcherProvider.java:152)
    at myriadeps.org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.dispatch(AbstractJavaResourceMethodDispatcher.java:104)
    at myriadeps.org.glassfish.jersey.server.model.ResourceMethodInvoker.invoke(ResourceMethodInvoker.java:384)
    at myriadeps.org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:342)
    at myriadeps.org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:101)
    at myriadeps.org.glassfish.jersey.server.ServerRuntime$1.run(ServerRuntime.java:271)
    at myriadeps.org.glassfish.jersey.internal.Errors$1.call(Errors.java:271)
    at myriadeps.org.glassfish.jersey.internal.Errors$1.call(Errors.java:267)
    at myriadeps.org.glassfish.jersey.internal.Errors.process(Errors.java:315)
    at myriadeps.org.glassfish.jersey.internal.Errors.process(Errors.java:297)
    at myriadeps.org.glassfish.jersey.internal.Errors.process(Errors.java:267)
    at myriadeps.org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:297)
    at myriadeps.org.glassfish.jersey.server.ServerRuntime.process(ServerRuntime.java:254)
    at myriadeps.org.glassfish.jersey.server.ApplicationHandler.handle(ApplicationHandler.java:1030)
    at myriadeps.org.glassfish.jersey.grizzly2.httpserver.GrizzlyHttpContainer.service(GrizzlyHttpContainer.java:378)
    at org.glassfish.grizzly.http.server.HttpHandler$1.run(HttpHandler.java:219)
    at org.glassfish.grizzly.threadpool.AbstractThreadPool$Worker.doWork(AbstractThreadPool.java:565)
    at org.glassfish.grizzly.threadpool.AbstractThreadPool$Worker.run(AbstractThreadPool.java:545)
    at java.lang.Thread.run(Thread.java:748)
Caused by: edu.washington.escience.myria.DbException: Query #328.0 failed: Cannot finish a batch with partially-completed tuples
    at edu.washington.escience.myria.parallel.MasterSubQuery$WorkerExecutionInfo$2.operationComplete(MasterSubQuery.java:119)
    at edu.washington.escience.myria.parallel.LocalSubQueryFutureListener.operationComplete(LocalSubQueryFutureListener.java:45)
    at edu.washington.escience.myria.util.concurrent.OperationFutureBase.notifyListener(OperationFutureBase.java:582)
    at edu.washington.escience.myria.util.concurrent.OperationFutureBase.notifyListeners(OperationFutureBase.java:541)
    at edu.washington.escience.myria.util.concurrent.OperationFutureBase.wakeupWaitersAndNotifyListeners(OperationFutureBase.java:151)
    at edu.washington.escience.myria.util.concurrent.OperationFutureBase.setFailure0(OperationFutureBase.java:505)
    at edu.washington.escience.myria.parallel.LocalSubQueryFuture.setFailure(LocalSubQueryFuture.java:69)
    at edu.washington.escience.myria.parallel.MasterSubQuery.workerFail(MasterSubQuery.java:349)
    at edu.washington.escience.myria.parallel.QueryManager.workerFailed(QueryManager.java:318)
    at edu.washington.escience.myria.parallel.Server$MessageProcessor.run(Server.java:221)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at edu.washington.escience.myria.util.concurrent.RenamingThreadFactory$1.run(RenamingThreadFactory.java:33)
    Suppressed: edu.washington.escience.myria.DbException: Worker #0 failed: Cannot finish a batch with partially-completed tuples
        at myriadeps.com.google.common.base.Preconditions.checkState(Preconditions.java:173)
        at edu.washington.escience.myria.storage.TupleBatchBuffer.finishBatch(TupleBatchBuffer.java:135)
        at edu.washington.escience.myria.storage.TupleBatchBuffer.popAny(TupleBatchBuffer.java:256)
        at edu.washington.escience.myria.CsvTupleReader.readTuples(CsvTupleReader.java:182)
        at edu.washington.escience.myria.operator.TupleSource.fetchNextReady(TupleSource.java:47)
        at edu.washington.escience.myria.operator.Operator.nextReady(Operator.java:362)
        at edu.washington.escience.myria.operator.RootOperator.fetchNextReady(RootOperator.java:118)
        at edu.washington.escience.myria.operator.Operator.nextReady(Operator.java:362)
        at edu.washington.escience.myria.parallel.LocalFragment.executeActually(LocalFragment.java:483)
        at edu.washington.escience.myria.parallel.LocalFragment.access$400(LocalFragment.java:48)
        at edu.washington.escience.myria.parallel.LocalFragment$1.call(LocalFragment.java:206)
        at edu.washington.escience.myria.parallel.LocalFragment$1.call(LocalFragment.java:186)
        ... 4 more
    Caused by: java.lang.IllegalStateException: Cannot finish a batch with partially-completed tuples
        ... 16 more

The upload works fine when the schema contains only the LONG_TYPE columns. The same error occurs whether I write the values as true or True. As a separate but probably related issue, if I populate a boolean column with numbers like so:

schema = {"columnNames" : ["flag"],
          "columnTypes" : [BOOLEAN_TYPE]}
data = """0
1
2"""

The result becomes:

flag
--
False
False
False

senderista commented 7 years ago

Since this is apparently an issue in MyriaX, it could be assigned to @senderista or @jingjingwang.
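
Until the server-side parsing is fixed, one possible client-side workaround is to store the flag as a LONG (0/1) instead of a BOOLEAN, since the report above says uploads with only LONG_TYPE columns succeed. The sketch below is not part of myria-python: `booleans_to_longs` and the token sets are hypothetical names introduced here, and the idea that 0/1 longs are an acceptable stand-in for the boolean column is an assumption about the use case.

```python
import csv
import io

# Hypothetical token sets; adjust to whatever spellings the data uses.
TRUE_TOKENS = {"true", "1"}
FALSE_TOKENS = {"false", "0"}


def booleans_to_longs(data, bool_columns, delimiter=","):
    """Rewrite boolean-like cells in the given column indexes as 0 or 1."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter=delimiter, lineterminator="\n")
    for row in csv.reader(io.StringIO(data), delimiter=delimiter):
        if not row:
            continue  # skip blank lines
        for i in bool_columns:
            token = row[i].strip().lower()
            if token in TRUE_TOKENS:
                row[i] = "1"
            elif token in FALSE_TOKENS:
                row[i] = "0"
            else:
                # Fail loudly rather than silently coercing, e.g. for "2".
                raise ValueError("not a boolean value: %r" % row[i])
        writer.writerow(row)
    return out.getvalue().rstrip("\n")


data = """1,1,True,1
2,2,False,2"""
converted = booleans_to_longs(data, bool_columns=[2])
print(converted)

# With the third entry of "columnTypes" changed from BOOLEAN_TYPE to
# LONG_TYPE, the call from the report would then be:
# res = connection.upload_file(name, schema, converted,
#                              delimiter=',', overwrite=True)
```

The helper rejects values such as 2 instead of guessing, so a bad cell surfaces on the client rather than as a silently wrong row. Whether the upload then succeeds still depends on the server, so treat this as a stopgap to try rather than a confirmed fix.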