uwescience / myria

Myria is a scalable Analytics-as-a-Service platform based on relational algebra.
myria.cs.washington.edu

stopped downloads lead to zombie queries #509

Closed dhalperi closed 10 years ago

dhalperi commented 10 years ago

There seems to be a bug in the handling of stopped downloads that can lead to queries never being removed from the activeQueries list.

To reproduce:

One way to check the active queries list is to exploit a bug(?) in the system code by requesting the query list with a small max value, e.g., https://demo.myria.cs.washington.edu/queries?max=1. Any query in the response with an ID greater than max is an active query.

dhalperi commented 10 years ago

Seems to be an InterruptedException in the MasterCatalog while the Server is trying to update the query status to killed. Hmm. Maybe the server is killing itself after it's already killed the query?

ERROR 2014-05-02 12:17:29,702 [Master query executor#18] QuerySubTreeTask - Unexpected exception occur at operator excution. Operator: edu.washington.escience.myria.operator.DataOutput@65e80afe
edu.washington.escience.myria.DbException: java.io.IOException: Pipe closed
        at edu.washington.escience.myria.operator.DataOutput.consumeTuples(DataOutput.java:55)
        at edu.washington.escience.myria.operator.RootOperator.fetchNextReady(RootOperator.java:59)
        at edu.washington.escience.myria.operator.Operator.nextReady(Operator.java:320)
        at edu.washington.escience.myria.parallel.QuerySubTreeTask.executeActually(QuerySubTreeTask.java:411)
        at edu.washington.escience.myria.parallel.QuerySubTreeTask.access$200(QuerySubTreeTask.java:33)
        at edu.washington.escience.myria.parallel.QuerySubTreeTask$1.call(QuerySubTreeTask.java:162)
        at edu.washington.escience.myria.parallel.QuerySubTreeTask$1.call(QuerySubTreeTask.java:153)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at edu.washington.escience.myria.util.concurrent.RenamingThreadFactory$1.run(RenamingThreadFactory.java:33)
Caused by: java.io.IOException: Pipe closed
        at java.io.PipedInputStream.checkStateForReceive(PipedInputStream.java:261)
        at java.io.PipedInputStream.receive(PipedInputStream.java:227)
        at java.io.PipedOutputStream.write(PipedOutputStream.java:149)
        at sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:221)
        at sun.nio.cs.StreamEncoder.implWrite(StreamEncoder.java:282)
        at sun.nio.cs.StreamEncoder.write(StreamEncoder.java:125)
        at java.io.OutputStreamWriter.write(OutputStreamWriter.java:207)
        at java.io.BufferedWriter.write(BufferedWriter.java:188)
        at java.io.BufferedWriter.flushBuffer(BufferedWriter.java:129)
        at java.io.BufferedWriter.write(BufferedWriter.java:230)
        at java.io.Writer.write(Writer.java:157)
        at org.supercsv.io.AbstractCsvWriter.writeRow(AbstractCsvWriter.java:196)
        at org.supercsv.io.CsvListWriter.write(CsvListWriter.java:87)
        at edu.washington.escience.myria.CsvTupleWriter.writeTuples(CsvTupleWriter.java:74)
        at edu.washington.escience.myria.operator.DataOutput.consumeTuples(DataOutput.java:53)
        ... 10 more
WARN  2014-05-02 12:17:29,705 [Master query executor#18] OperationFutureBase - An exception was thrown by OperationFutureListener.
edu.washington.escience.myria.coordinator.catalog.CatalogException: java.lang.InterruptedException
        at edu.washington.escience.myria.coordinator.catalog.MasterCatalog.queryFinished(MasterCatalog.java:1393)
        at edu.washington.escience.myria.parallel.Server$2.operationComplete(Server.java:1121)
        at edu.washington.escience.myria.parallel.QueryFutureListener.operationComplete(QueryFutureListener.java:43)
        at edu.washington.escience.myria.util.concurrent.OperationFutureBase.notifyListener(OperationFutureBase.java:606)                                                                        
        at edu.washington.escience.myria.util.concurrent.OperationFutureBase.notifyListeners(OperationFutureBase.java:565)                                                                       
        at edu.washington.escience.myria.util.concurrent.OperationFutureBase.wakeupWaitersAndNotifyListeners(OperationFutureBase.java:158)                                                       
        at edu.washington.escience.myria.util.concurrent.OperationFutureBase.setFailure0(OperationFutureBase.java:529)                                                                           
        at edu.washington.escience.myria.parallel.DefaultQueryFuture.setFailure(DefaultQueryFuture.java:67)                                                                                      
        at edu.washington.escience.myria.parallel.MasterQueryPartition$WorkerExecutionInfo$2.operationComplete(MasterQueryPartition.java:125)                                                    
        at edu.washington.escience.myria.parallel.QueryFutureListener.operationComplete(QueryFutureListener.java:43)                                                                             
        at edu.washington.escience.myria.util.concurrent.OperationFutureBase.notifyListener(OperationFutureBase.java:606)                                                                        
        at edu.washington.escience.myria.util.concurrent.OperationFutureBase.notifyListeners(OperationFutureBase.java:565)                                                                       
        at edu.washington.escience.myria.util.concurrent.OperationFutureBase.wakeupWaitersAndNotifyListeners(OperationFutureBase.java:158)                                                       
        at edu.washington.escience.myria.util.concurrent.OperationFutureBase.setFailure0(OperationFutureBase.java:529)                                                                           
        at edu.washington.escience.myria.parallel.DefaultQueryFuture.setFailure(DefaultQueryFuture.java:67)                                                                                      
        at edu.washington.escience.myria.parallel.MasterQueryPartition$WorkerExecutionInfo$2.operationComplete(MasterQueryPartition.java:125)                                                    
        at edu.washington.escience.myria.parallel.QueryFutureListener.operationComplete(QueryFutureListener.java:43)                                                                             
        at edu.washington.escience.myria.util.concurrent.OperationFutureBase.notifyListener(OperationFutureBase.java:606)                                                                        
        at edu.washington.escience.myria.util.concurrent.OperationFutureBase.notifyListeners(OperationFutureBase.java:565)                                                                       
        at edu.washington.escience.myria.util.concurrent.OperationFutureBase.wakeupWaitersAndNotifyListeners(OperationFutureBase.java:158)                                                       
        at edu.washington.escience.myria.util.concurrent.OperationFutureBase.setFailure0(OperationFutureBase.java:529)                                                                           
        at edu.washington.escience.myria.parallel.DefaultQueryFuture.setFailure(DefaultQueryFuture.java:67)                                                                                      
        at edu.washington.escience.myria.parallel.MasterQueryPartition$1.operationComplete(MasterQueryPartition.java:256)                                                                        
        at edu.washington.escience.myria.parallel.TaskFutureListener.operationComplete(TaskFutureListener.java:43)                                                                               
        at edu.washington.escience.myria.util.concurrent.OperationFutureBase.notifyListener(OperationFutureBase.java:606)                                                                        
        at edu.washington.escience.myria.util.concurrent.OperationFutureBase.notifyListeners(OperationFutureBase.java:565)                                                                       
        at edu.washington.escience.myria.util.concurrent.OperationFutureBase.wakeupWaitersAndNotifyListeners(OperationFutureBase.java:158)                                                       
        at edu.washington.escience.myria.util.concurrent.OperationFutureBase.setFailure0(OperationFutureBase.java:529)                                                                           
        at edu.washington.escience.myria.parallel.DefaultTaskFuture.setFailure(DefaultTaskFuture.java:67)
        at edu.washington.escience.myria.parallel.QuerySubTreeTask.executeActually(QuerySubTreeTask.java:462)
        at edu.washington.escience.myria.parallel.QuerySubTreeTask.access$200(QuerySubTreeTask.java:33)
        at edu.washington.escience.myria.parallel.QuerySubTreeTask$1.call(QuerySubTreeTask.java:162)
        at edu.washington.escience.myria.parallel.QuerySubTreeTask$1.call(QuerySubTreeTask.java:153)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at edu.washington.escience.myria.util.concurrent.RenamingThreadFactory$1.run(RenamingThreadFactory.java:33)
Caused by: java.lang.InterruptedException
        at com.almworks.sqlite4java.SQLiteJob.get(SQLiteJob.java:322)
        at com.almworks.sqlite4java.SQLiteJob.get(SQLiteJob.java:283)
        at edu.washington.escience.myria.coordinator.catalog.MasterCatalog.queryFinished(MasterCatalog.java:1367)
        ... 29 more
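
The second trace shows the completion listener running on the same "Master query executor" thread that just failed, and the blocking SQLiteJob.get() call inside MasterCatalog.queryFinished throwing InterruptedException. That is consistent with the thread's interrupt flag already being set when the listener runs, although the trace alone does not confirm where the interrupt came from. The sketch below reproduces that failure mode in isolation and shows one possible workaround: waiting uninterruptibly and restoring the flag afterwards. It is not Myria's code or its actual fix; the class and method names and the "KILLED" status string are hypothetical, and a plain CompletableFuture stands in for the catalog job.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;

public class InterruptedCleanupSketch {

  /**
   * Waits for the job even if the calling thread is (or becomes) interrupted,
   * then restores the interrupt flag so callers further up can still see it.
   */
  public static <T> T getUninterruptibly(Future<T> job) throws ExecutionException {
    // Thread.interrupted() clears a pending interrupt and tells us whether one was set.
    boolean interrupted = Thread.interrupted();
    try {
      while (true) {
        try {
          return job.get();
        } catch (InterruptedException e) {
          interrupted = true; // interrupted again while waiting: remember it and retry
        }
      }
    } finally {
      if (interrupted) {
        Thread.currentThread().interrupt();
      }
    }
  }

  /** Failure mode: a blocking get() on an already-interrupted thread fails immediately. */
  public static boolean plainGetThrowsWhenInterrupted() {
    // Hypothetical stand-in for a catalog job that has not finished yet.
    CompletableFuture<String> pendingJob = new CompletableFuture<>();
    Thread.currentThread().interrupt(); // simulate the interrupt left over from the kill
    try {
      pendingJob.get();
      return false;
    } catch (InterruptedException e) {
      return true; // the status update would be lost here
    } catch (ExecutionException e) {
      return false;
    } finally {
      Thread.interrupted(); // leave the flag clear for whoever calls this demo
    }
  }

  /** Workaround: the same situation, but the status update is allowed to complete. */
  public static String uninterruptibleGetCompletes() {
    CompletableFuture<String> job = new CompletableFuture<>();
    // Hypothetical stand-in for the thread that actually executes the catalog job.
    Thread catalogThread = new Thread(() -> job.complete("KILLED"));
    Thread.currentThread().interrupt();
    catalogThread.start();
    try {
      String status = getUninterruptibly(job);
      if (!Thread.currentThread().isInterrupted()) {
        throw new IllegalStateException("interrupt flag was not restored");
      }
      return status;
    } catch (ExecutionException e) {
      throw new IllegalStateException(e);
    } finally {
      Thread.interrupted();
    }
  }

  public static void main(String[] args) {
    System.out.println("plain get interrupted: " + plainGetThrowsWhenInterrupted());
    System.out.println("uninterruptible get status: " + uninterruptibleGetCompletes());
  }
}
```

Whether deferring the interrupt like this is appropriate depends on why the thread was interrupted in the first place. An alternative with the same effect would be to hand the status update to a thread that is not itself a target of the kill.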