hazelcast / hazelcast-jet

Distributed Stream and Batch Processing
https://jet-start.sh

JobNotFoundException thrown for existing job #2258

Open neilstevenson opened 4 years ago

neilstevenson commented 4 years ago

On Jet 4.1, the following exception appeared in the client logs for a job that had been started the same day. I will post more information if I can get it to recur.

Caused by: com.hazelcast.jet.core.JobNotFoundException: Job with id 045b-faed-47c2-0004 not found
    at com.hazelcast.jet.impl.JetService.getJobConfig(JetService.java:235) ~[hazelcast-jet-4.1.jar!/:4.1]
    at com.hazelcast.jet.impl.operation.GetJobConfigOperation.run(GetJobConfigOperation.java:38) ~[hazelcast-jet-4.1.jar!/:4.1]
    at com.hazelcast.spi.impl.operationservice.Operation.call(Operation.java:184) ~[hazelcast-jet-4.1.jar!/:4.1]
    at com.hazelcast.spi.impl.operationservice.impl.OperationRunnerImpl.call(OperationRunnerImpl.java:228) ~[hazelcast-jet-4.1.jar!/:4.1]
    at com.hazelcast.spi.impl.operationservice.impl.OperationRunnerImpl.run(OperationRunnerImpl.java:217) ~[hazelcast-jet-4.1.jar!/:4.1]
    at com.hazelcast.spi.impl.operationexecutor.impl.OperationExecutorImpl.run(OperationExecutorImpl.java:406) ~[hazelcast-jet-4.1.jar!/:4.1]
    at com.hazelcast.spi.impl.operationexecutor.impl.OperationExecutorImpl.runOrExecute(OperationExecutorImpl.java:433) ~[hazelcast-jet-4.1.jar!/:4.1]
    at com.hazelcast.spi.impl.operationservice.impl.Invocation.doInvokeLocal(Invocation.java:590) ~[hazelcast-jet-4.1.jar!/:4.1]
    at com.hazelcast.spi.impl.operationservice.impl.Invocation.doInvoke(Invocation.java:575) ~[hazelcast-jet-4.1.jar!/:4.1]
    at com.hazelcast.spi.impl.operationservice.impl.Invocation.invoke0(Invocation.java:534) ~[hazelcast-jet-4.1.jar!/:4.1]
    at com.hazelcast.spi.impl.operationservice.impl.Invocation.invoke(Invocation.java:236) ~[hazelcast-jet-4.1.jar!/:4.1]
    at com.hazelcast.spi.impl.operationservice.impl.InvocationBuilderImpl.invoke(InvocationBuilderImpl.java:59) ~[hazelcast-jet-4.1.jar!/:4.1]
    at com.hazelcast.client.impl.protocol.task.AbstractInvocationMessageTask.processInternal(AbstractInvocationMessageTask.java:38) ~[hazelcast-jet-4.1.jar!/:4.1]
    at com.hazelcast.client.impl.protocol.task.AbstractAsyncMessageTask.processMessage(AbstractAsyncMessageTask.java:71) ~[hazelcast-jet-4.1.jar!/:4.1]
    at com.hazelcast.client.impl.protocol.task.AbstractMessageTask.initializeAndProcessMessage(AbstractMessageTask.java:152) ~[hazelcast-jet-4.1.jar!/:4.1]
    at com.hazelcast.client.impl.protocol.task.AbstractMessageTask.run(AbstractMessageTask.java:115) ~[hazelcast-jet-4.1.jar!/:4.1]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[?:?]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[?:?]
    at java.lang.Thread.run(Thread.java:834) ~[?:?]
    at com.hazelcast.internal.util.executor.HazelcastManagedThread.executeRun(HazelcastManagedThread.java:64) ~[hazelcast-jet-4.1.jar!/:4.1]
    at com.hazelcast.internal.util.executor.HazelcastManagedThread.run(HazelcastManagedThread.java:80) ~[hazelcast-jet-4.1.jar!/:4.1]

neilstevenson commented 4 years ago

I believe the code that triggered it may be the following, though I am not 100% sure:

Map<Long, JobStatus> currentState = this.jetInstance.getJobs()
    .stream()
    .collect(Collectors.toMap(Job::getId, Job::getStatus));

for (Entry<Long, JobStatus> entry : currentState.entrySet()) {
    Job job = this.jetInstance.getJob(entry.getKey());
    // other stuff
}

neilstevenson commented 4 years ago

The servers were also on 4.1, and the cluster size may have been increasing while the client was running this operation.
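
While the cause is unconfirmed, a client that lists jobs and then looks each one up by id could guard against a lookup failing for a listed job. The following is a minimal, self-contained sketch of that idea, not the Jet API: `JobSnapshot`, its nested `JobNotFoundException`, and the `statusLookup` function are hypothetical stand-ins (the real exception is `com.hazelcast.jet.core.JobNotFoundException`), and statuses are plain strings for illustration.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

public class JobSnapshot {

    /** Hypothetical stand-in for com.hazelcast.jet.core.JobNotFoundException. */
    public static class JobNotFoundException extends RuntimeException {
        public JobNotFoundException(String message) {
            super(message);
        }
    }

    /**
     * Looks up the status of each job id. An id whose lookup fails with
     * JobNotFoundException is skipped, so one failure does not abort the pass.
     */
    public static Map<Long, String> snapshot(List<Long> jobIds,
                                             Function<Long, String> statusLookup) {
        Map<Long, String> result = new LinkedHashMap<>();
        for (Long id : jobIds) {
            try {
                result.put(id, statusLookup.apply(id));
            } catch (JobNotFoundException e) {
                // The job was listed but could not be resolved; skip it for this pass.
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical lookup that fails for job id 2, imitating the reported exception.
        Function<Long, String> lookup = id -> {
            if (id == 2L) {
                throw new JobNotFoundException("Job with id " + id + " not found");
            }
            return "RUNNING";
        };
        System.out.println(snapshot(List.of(1L, 2L, 3L), lookup));
    }
}
```

In real client code the lookup would be whatever call raised the exception, and skipped ids would probably be worth logging so the failure can still be investigated.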