terascope / teraslice

Scalable data processing pipelines in JavaScript
https://terascope.github.io/teraslice/
Apache License 2.0
50 stars 13 forks source link

fix execution controller shutting down before worker - e2e test failues #3450

Closed sotojn closed 7 months ago

sotojn commented 1 year ago

Right now there is an issue with a test suite in e2e tests. When running the following test, my terminal outputs this:

command: SKIP_DOCKER_BUILD_IN_E2E='true' TEST_ELASTICSEARCH='true' ELASTICSEARCH_VERSION='7.9.3' TEST_KAFKA='true' ts-scripts test --keep-open --suite e2e -- --silent=false --testPathPattern reindex-spec

 FAIL   e2e  test/cases/data/reindex-spec.js (36.666 s)
  reindex
    ✓ should work for simple case (4912 ms)
    ✓ should work when no data is returned with lucene query (2632 ms)
    ✓ should collect cluster level stats (10 ms)
    ✓ should support idempotency (23657 ms)
    ✕ should be able to recover and continue (3625 ms)

  ● reindex › should be able to recover and continue

    TSError: Execution cannot reach the target status, "stopped", because it is in the terminal state, "failed"

      178 |             // status won't change further.
      179 |             if (terminal[result]) {
    > 180 |                 throw new TSError(
          |                       ^
      181 |                     `Execution cannot reach the target status, "${target}", because it is in the terminal state, "${result}"`,
      182 |                     { context: { lastStatus: result } }
      183 |                 );

      at checkStatus (../packages/teraslice-client-js/src/ex.ts:180:23)
      at TerasliceHarness.waitForExStatus (test/teraslice-harness.js:60:28)
      at TerasliceHarness.testJobLifeCycle (test/teraslice-harness.js:234:13)
      at Object.<anonymous> (test/cases/data/reindex-spec.js:91:9)

Test Suites: 1 failed, 1 total
Tests:       1 failed, 4 passed, 5 total
Snapshots:   0 total
Time:        36.785 s
Ran all test suites matching /reindex-spec/i.
◼  test suite "e2e" Timer run for: 58.12s
ℹ  info      keeping the tests open so the services don't shutdown, use ctrl-c to exit.
⬤  debug     Running test/global.teardown.js
✖  error     TSError: Failed to teardown test for "e2e", caused by Error: Could not create a client with config {"node":"http://192.168.12.65:49200"} 
    at globalTeardown (/Users/jsoto/Workspace/teraslice/packages/scripts/dist/src/helpers/test-runner/utils.js:269:41)
    at async /Users/jsoto/Workspace/teraslice/packages/scripts/dist/src/helpers/test-runner/index.js:247:13
    at async TestTracker.runCleanupByKey (/Users/jsoto/Workspace/teraslice/packages/scripts/dist/src/helpers/test-runner/tracker.js:52:13)
    at async TestTracker.end (/Users/jsoto/Workspace/teraslice/packages/scripts/dist/src/helpers/test-runner/tracker.js:84:17)

✖  fatal     Command failed with exit code 1: jest --forceExit --coverage false --color --silent --runInBand --bail --silent=false --testPathPattern reindex-spec

error Command failed with exit code 1.
info Visit https://yarnpkg.com/en/docs/cli/run for documentation about this command.

This fails because inside teraslice/e2e/test/cases/data/reindex-spec.js it runs testJobLifeCycle on line 91 which is a method of the teraslice harness.

83:     it('should be able to recover and continue', async () => {
84:         const jobSpec = terasliceHarness.newJob('reindex');
85:         jobSpec.name = 'reindex (with recovery)';
86:         const index = terasliceHarness.newSpecIndex('reindex');
87:         // Job needs to be able to run long enough to cycle
88:         jobSpec.operations[0].index = terasliceHarness.getExampleIndex(1000);
89:         jobSpec.operations[1].index = index;
90: 
-> 91:         await terasliceHarness.testJobLifeCycle(jobSpec);
92: 
93:         const stats = await terasliceHarness.indexStats(index);
94:         expect(stats.count).toBe(1000);
95:     });
96: });

This method fails because on line 225 it waits for p to return correct but it never does because execution controller is in a failed state.

File: /teraslice/e2e/test/teraslice-harness.js

203:     async testJobLifeCycle(jobSpec, delay = 3000) {
204:         let ex;
205:         const waitForStatus = async (status) => this.waitForExStatus(ex, status, 50, 0);
206:         if (delay) {
207:             this.injectDelay(jobSpec, delay);
208:         }
209:         ex = await this.teraslice.executions.submit(jobSpec);
210: 
211:         await waitForStatus('running');
212:  
213:         let p = waitForStatus('paused');
214:         ex.pause();
215:         await p;
216: 
217:         p = waitForStatus('running');
218:         ex.resume();
219:         await p;
220: 
221:         p = waitForStatus('stopped');
222:         ex.stop();
223: 
-> 224:         try {
-> 225:             await p;
226:         } catch (err) {
227:             const errStr = toString(err);
228:             if (errStr.includes('"stopped"') && errStr.includes('"completed"')) {
229:                 signale.warn(
230:                     `${errStr} - however since this can be race condition, we don't want to fail the test`
231:                 );
232:                 return ex;
233:             }
234: 
235:             throw err;
236:         }
237: 
238:         ex = await ex.recover();
239:         await waitForStatus('completed');
240:         return ex;
241:     }

This is a race condition that happens because we stop the execution before the worker has told the execution controller that it had completed its slice. So when the execution controller shuts down , it sees that the slice is stuck in a "started" state at the time of shutdown. This then marks the execution as "failed". which can no longer be set to a "stopped" state.

Here are the teraslice logs that reference the execution controller from start to finish:

[2023-11-01T17:56:59.654Z] DEBUG: teraslice/84 on d176357dbc71: assets already loaded... (assignment=execution_controller, module=asset_loader, worker_id=oAXnwKtC, ex_id=76facbef-2259-41f0-baba-76b0cc8c081f, job_id=d7130b8e-b4ce-4cfe-a351-07da76544f8e)
[2023-11-01T17:56:59.667Z] DEBUG: teraslice/84 on 6655209e1c3e: assets already loaded... (assignment=worker, module=asset_loader, worker_id=kA_Arwt_, ex_id=76facbef-2259-41f0-baba-76b0cc8c081f, job_id=d7130b8e-b4ce-4cfe-a351-07da76544f8e)
[2023-11-01T17:57:00.077Z]  INFO: teraslice/84 on d176357dbc71: execution storage initialized (assignment=execution_controller, module=ex_storage, worker_id=oAXnwKtC, ex_id=76facbef-2259-41f0-baba-76b0cc8c081f, job_id=d7130b8e-b4ce-4cfe-a351-07da76544f8e)
[2023-11-01T17:57:00.077Z] DEBUG: teraslice/84 on d176357dbc71: attempting to connect to ClusterMaster at http://192.168.12.65:45678 (assignment=execution_controller, module=messaging:client, worker_id=oAXnwKtC, ex_id=76facbef-2259-41f0-baba-76b0cc8c081f, job_id=d7130b8e-b4ce-4cfe-a351-07da76544f8e)
[2023-11-01T17:57:00.110Z] DEBUG: teraslice/84 on d176357dbc71: client 76facbef-2259-41f0-baba-76b0cc8c081f connect (assignment=execution_controller, module=messaging:client, worker_id=oAXnwKtC, ex_id=76facbef-2259-41f0-baba-76b0cc8c081f, job_id=d7130b8e-b4ce-4cfe-a351-07da76544f8e)
[2023-11-01T17:57:00.114Z]  INFO: teraslice/84 on 6655209e1c3e: state storage initialized (assignment=worker, module=state_storage, worker_id=kA_Arwt_, ex_id=76facbef-2259-41f0-baba-76b0cc8c081f, job_id=d7130b8e-b4ce-4cfe-a351-07da76544f8e)
[2023-11-01T17:57:00.119Z]  INFO: teraslice/84 on 6655209e1c3e: analytics storage initialized (assignment=worker, module=analytics_storage, worker_id=kA_Arwt_, ex_id=76facbef-2259-41f0-baba-76b0cc8c081f, job_id=d7130b8e-b4ce-4cfe-a351-07da76544f8e)
[2023-11-01T17:57:00.120Z]  INFO: teraslice/84 on d176357dbc71: state storage initialized (assignment=execution_controller, module=state_storage, worker_id=oAXnwKtC, ex_id=76facbef-2259-41f0-baba-76b0cc8c081f, job_id=d7130b8e-b4ce-4cfe-a351-07da76544f8e)
[2023-11-01T17:57:00.123Z]  INFO: teraslice/84 on d176357dbc71: [START] "elasticsearch_reader_api:elasticsearch_reader-0" api instance initialize (assignment=execution_controller, module=slicer_context, worker_id=oAXnwKtC, ex_id=76facbef-2259-41f0-baba-76b0cc8c081f, job_id=d7130b8e-b4ce-4cfe-a351-07da76544f8e)
[2023-11-01T17:57:00.123Z]  INFO: teraslice/84 on d176357dbc71: [START] "elasticsearch_sender_api" api instance initialize (assignment=execution_controller, module=slicer_context, worker_id=oAXnwKtC, ex_id=76facbef-2259-41f0-baba-76b0cc8c081f, job_id=d7130b8e-b4ce-4cfe-a351-07da76544f8e)
[2023-11-01T17:57:00.123Z]  INFO: teraslice/84 on d176357dbc71: [FINISH] "elasticsearch_reader_api:elasticsearch_reader-0" api instance initialize, took 0ms (assignment=execution_controller, module=slicer_context, worker_id=oAXnwKtC, ex_id=76facbef-2259-41f0-baba-76b0cc8c081f, job_id=d7130b8e-b4ce-4cfe-a351-07da76544f8e)
[2023-11-01T17:57:00.123Z]  INFO: teraslice/84 on d176357dbc71: [FINISH] "elasticsearch_sender_api" api instance initialize, took 0ms (assignment=execution_controller, module=slicer_context, worker_id=oAXnwKtC, ex_id=76facbef-2259-41f0-baba-76b0cc8c081f, job_id=d7130b8e-b4ce-4cfe-a351-07da76544f8e)
[2023-11-01T17:57:00.124Z]  INFO: teraslice/84 on d176357dbc71: [START] "elasticsearch_reader_api:elasticsearch_reader-0" operation initialize (assignment=execution_controller, module=slicer_context, worker_id=oAXnwKtC, ex_id=76facbef-2259-41f0-baba-76b0cc8c081f, job_id=d7130b8e-b4ce-4cfe-a351-07da76544f8e)
[2023-11-01T17:57:00.124Z]  INFO: teraslice/84 on d176357dbc71: [START] "elasticsearch_sender_api" operation initialize (assignment=execution_controller, module=slicer_context, worker_id=oAXnwKtC, ex_id=76facbef-2259-41f0-baba-76b0cc8c081f, job_id=d7130b8e-b4ce-4cfe-a351-07da76544f8e)
[2023-11-01T17:57:00.124Z]  INFO: teraslice/84 on d176357dbc71: [START] "elasticsearch_reader" operation initialize (assignment=execution_controller, module=slicer_context, worker_id=oAXnwKtC, ex_id=76facbef-2259-41f0-baba-76b0cc8c081f, job_id=d7130b8e-b4ce-4cfe-a351-07da76544f8e)
[2023-11-01T17:57:00.124Z]  INFO: teraslice/84 on d176357dbc71: [FINISH] "elasticsearch_reader_api:elasticsearch_reader-0" operation initialize, took 0ms (assignment=execution_controller, module=slicer_context, worker_id=oAXnwKtC, ex_id=76facbef-2259-41f0-baba-76b0cc8c081f, job_id=d7130b8e-b4ce-4cfe-a351-07da76544f8e)
[2023-11-01T17:57:00.124Z]  INFO: teraslice/84 on d176357dbc71: [FINISH] "elasticsearch_sender_api" operation initialize, took 0ms (assignment=execution_controller, module=slicer_context, worker_id=oAXnwKtC, ex_id=76facbef-2259-41f0-baba-76b0cc8c081f, job_id=d7130b8e-b4ce-4cfe-a351-07da76544f8e)
[2023-11-01T17:57:00.119Z] DEBUG: teraslice/84 on 6655209e1c3e: attempting to connect to ExecutionController at http://172.25.0.4:46141 (assignment=worker, module=messaging:client, worker_id=kA_Arwt_, ex_id=76facbef-2259-41f0-baba-76b0cc8c081f, job_id=d7130b8e-b4ce-4cfe-a351-07da76544f8e)
[2023-11-01T17:57:00.127Z]  WARN: teraslice/84 on d176357dbc71: max_result_window for index: ts_test_example-1000 is set at 10000. On very large indices it is possible that a slice can not be divided to stay below this limit. If that occurs an error will be thrown by Elasticsearch and the slice can not be processed. Increasing max_result_window in the Elasticsearch index settings will resolve the problem. (assignment=execution_controller, module=operation-api, worker_id=oAXnwKtC, ex_id=76facbef-2259-41f0-baba-76b0cc8c081f, job_id=d7130b8e-b4ce-4cfe-a351-07da76544f8e, apiName=elasticsearch_reader_api:elasticsearch_reader-0)
[2023-11-01T17:57:00.141Z]  WARN: teraslice/84 on d176357dbc71: (assignment=execution_controller, module=execution_context_api, worker_id=oAXnwKtC, ex_id=76facbef-2259-41f0-baba-76b0cc8c081f, job_id=d7130b8e-b4ce-4cfe-a351-07da76544f8e)
    updating execution metadata {
      elasticsearch_reader: {
        '0': {
          interval: [Array],
          count: 1000,
          start: '2023-11-01T17:56:23.000Z',
          end: '2023-11-01T17:56:24.000Z'
        }
      }
    }
[2023-11-01T17:57:00.145Z] DEBUG: teraslice/84 on 6655209e1c3e: client 172.25.0.3__kA_Arwt_ connect (assignment=worker, module=messaging:client, worker_id=kA_Arwt_, ex_id=76facbef-2259-41f0-baba-76b0cc8c081f, job_id=d7130b8e-b4ce-4cfe-a351-07da76544f8e)
[2023-11-01T17:57:00.145Z]  INFO: teraslice/84 on 6655209e1c3e: [START] "elasticsearch_reader_api:elasticsearch_reader-0" api instance initialize (assignment=worker, module=worker_context, worker_id=kA_Arwt_, ex_id=76facbef-2259-41f0-baba-76b0cc8c081f, job_id=d7130b8e-b4ce-4cfe-a351-07da76544f8e)
[2023-11-01T17:57:00.145Z]  INFO: teraslice/84 on 6655209e1c3e: [START] "elasticsearch_sender_api" api instance initialize (assignment=worker, module=worker_context, worker_id=kA_Arwt_, ex_id=76facbef-2259-41f0-baba-76b0cc8c081f, job_id=d7130b8e-b4ce-4cfe-a351-07da76544f8e)
[2023-11-01T17:57:00.145Z]  INFO: teraslice/84 on 6655209e1c3e: [FINISH] "elasticsearch_reader_api:elasticsearch_reader-0" api instance initialize, took 0ms (assignment=worker, module=worker_context, worker_id=kA_Arwt_, ex_id=76facbef-2259-41f0-baba-76b0cc8c081f, job_id=d7130b8e-b4ce-4cfe-a351-07da76544f8e)
[2023-11-01T17:57:00.145Z]  INFO: teraslice/84 on 6655209e1c3e: [FINISH] "elasticsearch_sender_api" api instance initialize, took 0ms (assignment=worker, module=worker_context, worker_id=kA_Arwt_, ex_id=76facbef-2259-41f0-baba-76b0cc8c081f, job_id=d7130b8e-b4ce-4cfe-a351-07da76544f8e)
[2023-11-01T17:57:00.145Z]  INFO: teraslice/84 on 6655209e1c3e: [START] "job-observer" operation initialize (assignment=worker, module=worker_context, worker_id=kA_Arwt_, ex_id=76facbef-2259-41f0-baba-76b0cc8c081f, job_id=d7130b8e-b4ce-4cfe-a351-07da76544f8e)
[2023-11-01T17:57:00.145Z]  INFO: teraslice/84 on 6655209e1c3e: [START] "elasticsearch_reader_api:elasticsearch_reader-0" operation initialize (assignment=worker, module=worker_context, worker_id=kA_Arwt_, ex_id=76facbef-2259-41f0-baba-76b0cc8c081f, job_id=d7130b8e-b4ce-4cfe-a351-07da76544f8e)
[2023-11-01T17:57:00.145Z]  INFO: teraslice/84 on 6655209e1c3e: [START] "elasticsearch_sender_api" operation initialize (assignment=worker, module=worker_context, worker_id=kA_Arwt_, ex_id=76facbef-2259-41f0-baba-76b0cc8c081f, job_id=d7130b8e-b4ce-4cfe-a351-07da76544f8e)
[2023-11-01T17:57:00.145Z]  INFO: teraslice/84 on 6655209e1c3e: [START] "elasticsearch_reader" operation initialize (assignment=worker, module=worker_context, worker_id=kA_Arwt_, ex_id=76facbef-2259-41f0-baba-76b0cc8c081f, job_id=d7130b8e-b4ce-4cfe-a351-07da76544f8e)
[2023-11-01T17:57:00.145Z]  INFO: teraslice/84 on 6655209e1c3e: [START] "delay" operation initialize (assignment=worker, module=worker_context, worker_id=kA_Arwt_, ex_id=76facbef-2259-41f0-baba-76b0cc8c081f, job_id=d7130b8e-b4ce-4cfe-a351-07da76544f8e)
[2023-11-01T17:57:00.145Z]  INFO: teraslice/84 on 6655209e1c3e: [START] "elasticsearch_bulk" operation initialize (assignment=worker, module=worker_context, worker_id=kA_Arwt_, ex_id=76facbef-2259-41f0-baba-76b0cc8c081f, job_id=d7130b8e-b4ce-4cfe-a351-07da76544f8e)
[2023-11-01T17:57:00.145Z]  INFO: teraslice/84 on 6655209e1c3e: [FINISH] "job-observer" operation initialize, took 0ms (assignment=worker, module=worker_context, worker_id=kA_Arwt_, ex_id=76facbef-2259-41f0-baba-76b0cc8c081f, job_id=d7130b8e-b4ce-4cfe-a351-07da76544f8e)
[2023-11-01T17:57:00.145Z]  INFO: teraslice/84 on 6655209e1c3e: [FINISH] "elasticsearch_reader_api:elasticsearch_reader-0" operation initialize, took 0ms (assignment=worker, module=worker_context, worker_id=kA_Arwt_, ex_id=76facbef-2259-41f0-baba-76b0cc8c081f, job_id=d7130b8e-b4ce-4cfe-a351-07da76544f8e)
[2023-11-01T17:57:00.145Z]  INFO: teraslice/84 on 6655209e1c3e: [FINISH] "elasticsearch_sender_api" operation initialize, took 0ms (assignment=worker, module=worker_context, worker_id=kA_Arwt_, ex_id=76facbef-2259-41f0-baba-76b0cc8c081f, job_id=d7130b8e-b4ce-4cfe-a351-07da76544f8e)
[2023-11-01T17:57:00.146Z]  INFO: teraslice/84 on 6655209e1c3e: [FINISH] "delay" operation initialize, took 1ms (assignment=worker, module=worker_context, worker_id=kA_Arwt_, ex_id=76facbef-2259-41f0-baba-76b0cc8c081f, job_id=d7130b8e-b4ce-4cfe-a351-07da76544f8e)
[2023-11-01T17:57:00.146Z]  INFO: teraslice/84 on 6655209e1c3e: [FINISH] "elasticsearch_reader" operation initialize, took 1ms (assignment=worker, module=worker_context, worker_id=kA_Arwt_, ex_id=76facbef-2259-41f0-baba-76b0cc8c081f, job_id=d7130b8e-b4ce-4cfe-a351-07da76544f8e)
[2023-11-01T17:57:00.146Z]  INFO: teraslice/84 on 6655209e1c3e: [FINISH] "elasticsearch_bulk" operation initialize, took 1ms (assignment=worker, module=worker_context, worker_id=kA_Arwt_, ex_id=76facbef-2259-41f0-baba-76b0cc8c081f, job_id=d7130b8e-b4ce-4cfe-a351-07da76544f8e)
[2023-11-01T17:57:00.146Z]  INFO: teraslice/84 on 6655209e1c3e: execution: 76facbef-2259-41f0-baba-76b0cc8c081f initialized worker (assignment=worker, module=worker, worker_id=kA_Arwt_, ex_id=76facbef-2259-41f0-baba-76b0cc8c081f, job_id=d7130b8e-b4ce-4cfe-a351-07da76544f8e)
[2023-11-01T17:57:00.153Z] DEBUG: teraslice/84 on d176357dbc71: (assignment=execution_controller, module=operation-api, worker_id=oAXnwKtC, ex_id=76facbef-2259-41f0-baba-76b0cc8c081f, job_id=d7130b8e-b4ce-4cfe-a351-07da76544f8e, apiName=elasticsearch_reader_api:elasticsearch_reader-0)
    all date configurations for date slicer {
      size: 100,
      interval: [ 1, 's' ],
      start: Moment<2023-11-01T17:56:23Z>,
      end: Moment<2023-11-01T17:56:24Z>,
      holes: [],
      limit: Moment<2023-11-01T17:56:24Z>
    }
[2023-11-01T17:57:00.153Z]  INFO: teraslice/84 on d176357dbc71: [FINISH] "elasticsearch_reader" operation initialize, took 29ms (assignment=execution_controller, module=slicer_context, worker_id=oAXnwKtC, ex_id=76facbef-2259-41f0-baba-76b0cc8c081f, job_id=d7130b8e-b4ce-4cfe-a351-07da76544f8e)
[2023-11-01T17:57:00.153Z]  INFO: teraslice/84 on d176357dbc71: execution: 76facbef-2259-41f0-baba-76b0cc8c081f initialized execution_controller (assignment=execution_controller, module=execution_controller, worker_id=oAXnwKtC, ex_id=76facbef-2259-41f0-baba-76b0cc8c081f, job_id=d7130b8e-b4ce-4cfe-a351-07da76544f8e)
[2023-11-01T17:57:00.155Z]  INFO: teraslice/84 on d176357dbc71: starting execution 76facbef-2259-41f0-baba-76b0cc8c081f... (assignment=execution_controller, module=execution_controller, worker_id=oAXnwKtC, ex_id=76facbef-2259-41f0-baba-76b0cc8c081f, job_id=d7130b8e-b4ce-4cfe-a351-07da76544f8e)
[2023-11-01T17:57:00.156Z] DEBUG: teraslice/84 on d176357dbc71: dispatching slices... (assignment=execution_controller, module=execution_controller, worker_id=oAXnwKtC, ex_id=76facbef-2259-41f0-baba-76b0cc8c081f, job_id=d7130b8e-b4ce-4cfe-a351-07da76544f8e)
[2023-11-01T17:57:00.170Z]  INFO: teraslice/84 on d176357dbc71: slicer 0 has completed its range (assignment=execution_controller, module=slicer, worker_id=oAXnwKtC, ex_id=76facbef-2259-41f0-baba-76b0cc8c081f, job_id=d7130b8e-b4ce-4cfe-a351-07da76544f8e, opName=elasticsearch_reader)
[2023-11-01T17:57:00.170Z]  INFO: teraslice/84 on d176357dbc71: all slicers for execution: 76facbef-2259-41f0-baba-76b0cc8c081f have been completed (assignment=execution_controller, module=execution_scheduler, worker_id=oAXnwKtC, ex_id=76facbef-2259-41f0-baba-76b0cc8c081f, job_id=d7130b8e-b4ce-4cfe-a351-07da76544f8e)
[2023-11-01T17:57:00.170Z] DEBUG: teraslice/84 on d176357dbc71: execution 76facbef-2259-41f0-baba-76b0cc8c081f is finished scheduling, 1 remaining slices in the queue (assignment=execution_controller, module=execution_scheduler, worker_id=oAXnwKtC, ex_id=76facbef-2259-41f0-baba-76b0cc8c081f, job_id=d7130b8e-b4ce-4cfe-a351-07da76544f8e)
[2023-11-01T17:57:00.221Z]  INFO: teraslice/84 on d176357dbc71: execution 76facbef-2259-41f0-baba-76b0cc8c081f is pausing... (assignment=execution_controller, module=execution_controller, worker_id=oAXnwKtC, ex_id=76facbef-2259-41f0-baba-76b0cc8c081f, job_id=d7130b8e-b4ce-4cfe-a351-07da76544f8e)
[2023-11-01T17:57:00.392Z]  INFO: teraslice/84 on d176357dbc71: execution 76facbef-2259-41f0-baba-76b0cc8c081f is resuming... (assignment=execution_controller, module=execution_controller, worker_id=oAXnwKtC, ex_id=76facbef-2259-41f0-baba-76b0cc8c081f, job_id=d7130b8e-b4ce-4cfe-a351-07da76544f8e)
[2023-11-01T17:57:00.398Z] DEBUG: teraslice/84 on d176357dbc71: dispatched slice 87041616-9bd8-4c48-956f-d1976e7203c2 to worker 172.25.0.3__kA_Arwt_ (assignment=execution_controller, module=execution_controller, worker_id=oAXnwKtC, ex_id=76facbef-2259-41f0-baba-76b0cc8c081f, job_id=d7130b8e-b4ce-4cfe-a351-07da76544f8e)
[2023-11-01T17:57:00.402Z] DEBUG: teraslice/84 on d176357dbc71: waiting for 1 slices to finish (assignment=execution_controller, module=execution_controller, worker_id=oAXnwKtC, ex_id=76facbef-2259-41f0-baba-76b0cc8c081f, job_id=d7130b8e-b4ce-4cfe-a351-07da76544f8e)
[2023-11-01T17:57:00.516Z]  INFO: teraslice/84 on 6655209e1c3e: worker received process:SIGTERM, exiting in 30s... (assignment=worker, module=worker:shutdown_handler, worker_id=kA_Arwt_, ex_id=76facbef-2259-41f0-baba-76b0cc8c081f, job_id=d7130b8e-b4ce-4cfe-a351-07da76544f8e)
[2023-11-01T17:57:00.517Z]  INFO: teraslice/84 on d176357dbc71: execution_controller received process:SIGTERM, exiting in 30s... (assignment=execution_controller, module=execution_controller:shutdown_handler, worker_id=oAXnwKtC, ex_id=76facbef-2259-41f0-baba-76b0cc8c081f, job_id=d7130b8e-b4ce-4cfe-a351-07da76544f8e)
[2023-11-01T17:57:00.617Z] DEBUG: teraslice/84 on d176357dbc71: execution shutdown was called for ex 76facbef-2259-41f0-baba-76b0cc8c081f (assignment=execution_controller, module=execution_controller, worker_id=oAXnwKtC, ex_id=76facbef-2259-41f0-baba-76b0cc8c081f, job_id=d7130b8e-b4ce-4cfe-a351-07da76544f8e)
[2023-11-01T17:57:00.617Z]  WARN: teraslice/84 on 6655209e1c3e: worker shutdown was called for execution 76facbef-2259-41f0-baba-76b0cc8c081f due to event: SIGTERM (assignment=worker, module=worker, worker_id=kA_Arwt_, ex_id=76facbef-2259-41f0-baba-76b0cc8c081f, job_id=d7130b8e-b4ce-4cfe-a351-07da76544f8e)
[2023-11-01T17:57:00.617Z] DEBUG: teraslice/84 on d176357dbc71: stopping scheduler... (assignment=execution_controller, module=execution_scheduler, worker_id=oAXnwKtC, ex_id=76facbef-2259-41f0-baba-76b0cc8c081f, job_id=d7130b8e-b4ce-4cfe-a351-07da76544f8e)
[2023-11-01T17:57:00.617Z] DEBUG: teraslice/84 on d176357dbc71: shutdown is waiting for execution to finish... (assignment=execution_controller, module=execution_controller, worker_id=oAXnwKtC, ex_id=76facbef-2259-41f0-baba-76b0cc8c081f, job_id=d7130b8e-b4ce-4cfe-a351-07da76544f8e)
[2023-11-01T17:57:00.618Z]  INFO: teraslice/84 on 6655209e1c3e: waiting until slice 87041616-9bd8-4c48-956f-d1976e7203c2 is finished before flushing { position: 1 } (assignment=worker, module=worker_context, worker_id=kA_Arwt_, ex_id=76facbef-2259-41f0-baba-76b0cc8c081f, job_id=d7130b8e-b4ce-4cfe-a351-07da76544f8e)
[2023-11-01T17:57:00.703Z] DEBUG: teraslice/84 on d176357dbc71: execution 76facbef-2259-41f0-baba-76b0cc8c081f is done processing slices (assignment=execution_controller, module=execution_controller, worker_id=oAXnwKtC, ex_id=76facbef-2259-41f0-baba-76b0cc8c081f, job_id=d7130b8e-b4ce-4cfe-a351-07da76544f8e)
[2023-11-01T17:57:00.703Z]  INFO: teraslice/84 on d176357dbc71: [START] "elasticsearch_reader_api:elasticsearch_reader-0" operation shutdown (assignment=execution_controller, module=slicer_context, worker_id=oAXnwKtC, ex_id=76facbef-2259-41f0-baba-76b0cc8c081f, job_id=d7130b8e-b4ce-4cfe-a351-07da76544f8e)
[2023-11-01T17:57:00.703Z]  INFO: teraslice/84 on d176357dbc71: [START] "elasticsearch_sender_api" operation shutdown (assignment=execution_controller, module=slicer_context, worker_id=oAXnwKtC, ex_id=76facbef-2259-41f0-baba-76b0cc8c081f, job_id=d7130b8e-b4ce-4cfe-a351-07da76544f8e)
[2023-11-01T17:57:00.703Z]  INFO: teraslice/84 on d176357dbc71: [START] "elasticsearch_reader" operation shutdown (assignment=execution_controller, module=slicer_context, worker_id=oAXnwKtC, ex_id=76facbef-2259-41f0-baba-76b0cc8c081f, job_id=d7130b8e-b4ce-4cfe-a351-07da76544f8e)
[2023-11-01T17:57:00.704Z]  INFO: teraslice/84 on d176357dbc71: [FINISH] "elasticsearch_reader_api:elasticsearch_reader-0" operation shutdown, took 1ms (assignment=execution_controller, module=slicer_context, worker_id=oAXnwKtC, ex_id=76facbef-2259-41f0-baba-76b0cc8c081f, job_id=d7130b8e-b4ce-4cfe-a351-07da76544f8e)
[2023-11-01T17:57:00.704Z]  INFO: teraslice/84 on d176357dbc71: [FINISH] "elasticsearch_sender_api" operation shutdown, took 1ms (assignment=execution_controller, module=slicer_context, worker_id=oAXnwKtC, ex_id=76facbef-2259-41f0-baba-76b0cc8c081f, job_id=d7130b8e-b4ce-4cfe-a351-07da76544f8e)
[2023-11-01T17:57:00.704Z]  INFO: teraslice/84 on d176357dbc71: [FINISH] "elasticsearch_reader" operation shutdown, took 1ms (assignment=execution_controller, module=slicer_context, worker_id=oAXnwKtC, ex_id=76facbef-2259-41f0-baba-76b0cc8c081f, job_id=d7130b8e-b4ce-4cfe-a351-07da76544f8e)
[2023-11-01T17:57:00.704Z]  INFO: teraslice/84 on d176357dbc71: calculating statistics (assignment=execution_controller, module=slice_analytics, worker_id=oAXnwKtC, ex_id=76facbef-2259-41f0-baba-76b0cc8c081f, job_id=d7130b8e-b4ce-4cfe-a351-07da76544f8e)
[2023-11-01T17:57:00.704Z]  INFO: teraslice/84 on d176357dbc71: (assignment=execution_controller, module=slice_analytics, worker_id=oAXnwKtC, ex_id=76facbef-2259-41f0-baba-76b0cc8c081f, job_id=d7130b8e-b4ce-4cfe-a351-07da76544f8e)

    operation elasticsearch_reader
    average completion time of: 0 ms, min: 0 ms, and max: 0 ms
    average size: 0, min: 0, and max: 0
    average memory: 0, min: 0, and max: 0

[2023-11-01T17:57:00.704Z]  INFO: teraslice/84 on d176357dbc71: (assignment=execution_controller, module=slice_analytics, worker_id=oAXnwKtC, ex_id=76facbef-2259-41f0-baba-76b0cc8c081f, job_id=d7130b8e-b4ce-4cfe-a351-07da76544f8e)

    operation delay
    average completion time of: 0 ms, min: 0 ms, and max: 0 ms
    average size: 0, min: 0, and max: 0
    average memory: 0, min: 0, and max: 0

[2023-11-01T17:57:00.704Z]  INFO: teraslice/84 on d176357dbc71: (assignment=execution_controller, module=slice_analytics, worker_id=oAXnwKtC, ex_id=76facbef-2259-41f0-baba-76b0cc8c081f, job_id=d7130b8e-b4ce-4cfe-a351-07da76544f8e)

    operation elasticsearch_bulk
    average completion time of: 0 ms, min: 0 ms, and max: 0 ms
    average size: 0, min: 0, and max: 0
    average memory: 0, min: 0, and max: 0

[2023-11-01T17:57:00.704Z]  INFO: teraslice/84 on d176357dbc71: execution 76facbef-2259-41f0-baba-76b0cc8c081f has finished in 1 seconds (assignment=execution_controller, module=execution_controller, worker_id=oAXnwKtC, ex_id=76facbef-2259-41f0-baba-76b0cc8c081f, job_id=d7130b8e-b4ce-4cfe-a351-07da76544f8e)
[2023-11-01T17:57:00.718Z] ERROR: teraslice/84 on d176357dbc71: execution: 76facbef-2259-41f0-baba-76b0cc8c081f had 1 slice stuck in started during processing (assignment=execution_controller, module=execution_controller, worker_id=oAXnwKtC, ex_id=76facbef-2259-41f0-baba-76b0cc8c081f, job_id=d7130b8e-b4ce-4cfe-a351-07da76544f8e)
[2023-11-01T17:57:00.738Z] DEBUG: teraslice/84 on d176357dbc71: execution 76facbef-2259-41f0-baba-76b0cc8c081f is done (assignment=execution_controller, module=execution_controller, worker_id=oAXnwKtC, ex_id=76facbef-2259-41f0-baba-76b0cc8c081f, job_id=d7130b8e-b4ce-4cfe-a351-07da76544f8e)
[2023-11-01T17:57:00.739Z]  INFO: teraslice/84 on 6655209e1c3e: worker received process:SIGTERM, already shutting down, remaining 30s (assignment=worker, module=worker:shutdown_handler, worker_id=kA_Arwt_, ex_id=76facbef-2259-41f0-baba-76b0cc8c081f, job_id=d7130b8e-b4ce-4cfe-a351-07da76544f8e)
[2023-11-01T17:57:00.739Z]  INFO: teraslice/84 on d176357dbc71: execution_controller received process:SIGTERM, already shutting down, remaining 30s (assignment=execution_controller, module=execution_controller:shutdown_handler, worker_id=oAXnwKtC, ex_id=76facbef-2259-41f0-baba-76b0cc8c081f, job_id=d7130b8e-b4ce-4cfe-a351-07da76544f8e)
[2023-11-01T17:57:00.818Z]  INFO: teraslice/84 on d176357dbc71: shutting down. (assignment=execution_controller, module=ex_storage, worker_id=oAXnwKtC, ex_id=76facbef-2259-41f0-baba-76b0cc8c081f, job_id=d7130b8e-b4ce-4cfe-a351-07da76544f8e)
[2023-11-01T17:57:00.819Z] DEBUG: teraslice/84 on 6655209e1c3e: server ExecutionController shutdown (assignment=worker, module=messaging:client, worker_id=kA_Arwt_, ex_id=76facbef-2259-41f0-baba-76b0cc8c081f, job_id=d7130b8e-b4ce-4cfe-a351-07da76544f8e)
[2023-11-01T17:57:00.819Z]  INFO: teraslice/84 on d176357dbc71: shutting down (assignment=execution_controller, module=state_storage, worker_id=oAXnwKtC, ex_id=76facbef-2259-41f0-baba-76b0cc8c081f, job_id=d7130b8e-b4ce-4cfe-a351-07da76544f8e)
[2023-11-01T17:57:00.819Z]  WARN: teraslice/84 on 6655209e1c3e: Execution Controller shutdown, exiting... (assignment=worker, module=worker, worker_id=kA_Arwt_, ex_id=76facbef-2259-41f0-baba-76b0cc8c081f, job_id=d7130b8e-b4ce-4cfe-a351-07da76544f8e)
[2023-11-01T17:57:00.820Z]  INFO: teraslice/84 on 6655209e1c3e: client 172.25.0.3__kA_Arwt_ disconnected { reason: 'io client disconnect' } (assignment=worker, module=messaging:client, worker_id=kA_Arwt_, ex_id=76facbef-2259-41f0-baba-76b0cc8c081f, job_id=d7130b8e-b4ce-4cfe-a351-07da76544f8e)
[2023-11-01T17:57:00.820Z]  INFO: teraslice/84 on d176357dbc71: client 172.25.0.3__kA_Arwt_ disconnected { reason: 'client namespace disconnect' } (assignment=execution_controller, module=messaging:server, worker_id=oAXnwKtC, ex_id=76facbef-2259-41f0-baba-76b0cc8c081f, job_id=d7130b8e-b4ce-4cfe-a351-07da76544f8e)
[2023-11-01T17:57:00.822Z]  INFO: teraslice/84 on d176357dbc71: client 76facbef-2259-41f0-baba-76b0cc8c081f disconnected { reason: 'io client disconnect' } (assignment=execution_controller, module=messaging:client, worker_id=oAXnwKtC, ex_id=76facbef-2259-41f0-baba-76b0cc8c081f, job_id=d7130b8e-b4ce-4cfe-a351-07da76544f8e)
[2023-11-01T17:57:01.403Z] DEBUG: teraslice/84 on d176357dbc71: waiting for 1 slices to finish (assignment=execution_controller, module=execution_controller, worker_id=oAXnwKtC, ex_id=76facbef-2259-41f0-baba-76b0cc8c081f, job_id=d7130b8e-b4ce-4cfe-a351-07da76544f8e)
[2023-11-01T17:57:01.622Z] DEBUG: teraslice/84 on d176357dbc71: shutdown is waiting for execution to finish... (assignment=execution_controller, module=execution_controller, worker_id=oAXnwKtC, ex_id=76facbef-2259-41f0-baba-76b0cc8c081f, job_id=d7130b8e-b4ce-4cfe-a351-07da76544f8e)
[2023-11-01T17:57:03.573Z]  INFO: teraslice/84 on 6655209e1c3e: analytics for slice: slice_id: "87041616-9bd8-4c48-956f-d1976e7203c2", slicer_id: 0, slicer_order: 1, _created: "2023-11-01T17:57:00.169Z", time: [30, 3002, 134], memory: [5370352, 418808, 6268440], size: [1000, 1000, 1000] (assignment=worker, module=slice, worker_id=kA_Arwt_, ex_id=76facbef-2259-41f0-baba-76b0cc8c081f, job_id=d7130b8e-b4ce-4cfe-a351-07da76544f8e, slice_id=87041616-9bd8-4c48-956f-d1976e7203c2)
[2023-11-01T17:57:03.574Z]  INFO: teraslice/84 on 6655209e1c3e: slice 87041616-9bd8-4c48-956f-d1976e7203c2 completed (assignment=worker, module=worker, worker_id=kA_Arwt_, ex_id=76facbef-2259-41f0-baba-76b0cc8c081f, job_id=d7130b8e-b4ce-4cfe-a351-07da76544f8e)
[2023-11-01T17:57:03.574Z]  INFO: teraslice/84 on 6655209e1c3e: flushing slice 87041616-9bd8-4c48-956f-d1976e7203c2 (assignment=worker, module=worker_context, worker_id=kA_Arwt_, ex_id=76facbef-2259-41f0-baba-76b0cc8c081f, job_id=d7130b8e-b4ce-4cfe-a351-07da76544f8e)
[2023-11-01T17:57:05.827Z]  WARN: teraslice/84 on d176357dbc71: execution controller 76facbef-2259-41f0-baba-76b0cc8c081f is shutdown (assignment=execution_controller, module=execution_controller, worker_id=oAXnwKtC, ex_id=76facbef-2259-41f0-baba-76b0cc8c081f, job_id=d7130b8e-b4ce-4cfe-a351-07da76544f8e)
[2023-11-01T17:57:06.591Z]  INFO: teraslice/84 on 6655209e1c3e: analytics for slice: slice_id: "87041616-9bd8-4c48-956f-d1976e7203c2", slicer_id: 0, slicer_order: 1, _created: "2023-11-01T17:57:00.169Z", time: [0, 3001, 0], memory: [752, 8504, 1824], size: [0, 0, 0] (assignment=worker, module=slice, worker_id=kA_Arwt_, ex_id=76facbef-2259-41f0-baba-76b0cc8c081f, job_id=d7130b8e-b4ce-4cfe-a351-07da76544f8e, slice_id=87041616-9bd8-4c48-956f-d1976e7203c2)
[2023-11-01T17:57:06.592Z]  INFO: teraslice/84 on 6655209e1c3e: [START] "job-observer" operation shutdown (assignment=worker, module=worker_context, worker_id=kA_Arwt_, ex_id=76facbef-2259-41f0-baba-76b0cc8c081f, job_id=d7130b8e-b4ce-4cfe-a351-07da76544f8e)
[2023-11-01T17:57:06.592Z]  INFO: teraslice/84 on 6655209e1c3e: [START] "elasticsearch_reader_api:elasticsearch_reader-0" operation shutdown (assignment=worker, module=worker_context, worker_id=kA_Arwt_, ex_id=76facbef-2259-41f0-baba-76b0cc8c081f, job_id=d7130b8e-b4ce-4cfe-a351-07da76544f8e)
[2023-11-01T17:57:06.592Z]  INFO: teraslice/84 on 6655209e1c3e: [START] "elasticsearch_sender_api" operation shutdown (assignment=worker, module=worker_context, worker_id=kA_Arwt_, ex_id=76facbef-2259-41f0-baba-76b0cc8c081f, job_id=d7130b8e-b4ce-4cfe-a351-07da76544f8e)
[2023-11-01T17:57:06.592Z]  INFO: teraslice/84 on 6655209e1c3e: [START] "elasticsearch_reader" operation shutdown (assignment=worker, module=worker_context, worker_id=kA_Arwt_, ex_id=76facbef-2259-41f0-baba-76b0cc8c081f, job_id=d7130b8e-b4ce-4cfe-a351-07da76544f8e)
[2023-11-01T17:57:06.592Z]  INFO: teraslice/84 on 6655209e1c3e: [START] "delay" operation shutdown (assignment=worker, module=worker_context, worker_id=kA_Arwt_, ex_id=76facbef-2259-41f0-baba-76b0cc8c081f, job_id=d7130b8e-b4ce-4cfe-a351-07da76544f8e)
[2023-11-01T17:57:06.592Z]  INFO: teraslice/84 on 6655209e1c3e: [START] "elasticsearch_bulk" operation shutdown (assignment=worker, module=worker_context, worker_id=kA_Arwt_, ex_id=76facbef-2259-41f0-baba-76b0cc8c081f, job_id=d7130b8e-b4ce-4cfe-a351-07da76544f8e)
[2023-11-01T17:57:06.592Z]  INFO: teraslice/84 on 6655209e1c3e: [FINISH] "job-observer" operation shutdown, took 0ms (assignment=worker, module=worker_context, worker_id=kA_Arwt_, ex_id=76facbef-2259-41f0-baba-76b0cc8c081f, job_id=d7130b8e-b4ce-4cfe-a351-07da76544f8e)
[2023-11-01T17:57:06.593Z]  INFO: teraslice/84 on 6655209e1c3e: [FINISH] "elasticsearch_reader_api:elasticsearch_reader-0" operation shutdown, took 1ms (assignment=worker, module=worker_context, worker_id=kA_Arwt_, ex_id=76facbef-2259-41f0-baba-76b0cc8c081f, job_id=d7130b8e-b4ce-4cfe-a351-07da76544f8e)
[2023-11-01T17:57:06.593Z]  INFO: teraslice/84 on 6655209e1c3e: [FINISH] "elasticsearch_sender_api" operation shutdown, took 1ms (assignment=worker, module=worker_context, worker_id=kA_Arwt_, ex_id=76facbef-2259-41f0-baba-76b0cc8c081f, job_id=d7130b8e-b4ce-4cfe-a351-07da76544f8e)
[2023-11-01T17:57:06.593Z]  INFO: teraslice/84 on 6655209e1c3e: [FINISH] "elasticsearch_reader" operation shutdown, took 1ms (assignment=worker, module=worker_context, worker_id=kA_Arwt_, ex_id=76facbef-2259-41f0-baba-76b0cc8c081f, job_id=d7130b8e-b4ce-4cfe-a351-07da76544f8e)
[2023-11-01T17:57:06.593Z]  INFO: teraslice/84 on 6655209e1c3e: [FINISH] "delay" operation shutdown, took 1ms (assignment=worker, module=worker_context, worker_id=kA_Arwt_, ex_id=76facbef-2259-41f0-baba-76b0cc8c081f, job_id=d7130b8e-b4ce-4cfe-a351-07da76544f8e)
[2023-11-01T17:57:06.593Z]  INFO: teraslice/84 on 6655209e1c3e: [FINISH] "elasticsearch_bulk" operation shutdown, took 1ms (assignment=worker, module=worker_context, worker_id=kA_Arwt_, ex_id=76facbef-2259-41f0-baba-76b0cc8c081f, job_id=d7130b8e-b4ce-4cfe-a351-07da76544f8e)
[2023-11-01T17:57:06.593Z]  INFO: teraslice/84 on 6655209e1c3e: shutting down (assignment=worker, module=state_storage, worker_id=kA_Arwt_, ex_id=76facbef-2259-41f0-baba-76b0cc8c081f, job_id=d7130b8e-b4ce-4cfe-a351-07da76544f8e)
[2023-11-01T17:57:06.594Z]  INFO: teraslice/84 on 6655209e1c3e: shutting down. (assignment=worker, module=analytics_storage, worker_id=kA_Arwt_, ex_id=76facbef-2259-41f0-baba-76b0cc8c081f, job_id=d7130b8e-b4ce-4cfe-a351-07da76544f8e)
[2023-11-01T17:57:06.604Z] DEBUG: teraslice/84 on 6655209e1c3e: flushed 6, on shutdown records to index ts_test_teracluster-ok__analytics* (assignment=worker, module=elasticsearch_backend, worker_id=kA_Arwt_, ex_id=76facbef-2259-41f0-baba-76b0cc8c081f, job_id=d7130b8e-b4ce-4cfe-a351-07da76544f8e, storageName=analytics)
[2023-11-01T17:57:06.604Z]  WARN: teraslice/84 on 6655209e1c3e: worker 172.25.0.3__kA_Arwt_ is shutdown for execution 76facbef-2259-41f0-baba-76b0cc8c081f, processed 1 slices (assignment=worker, module=worker, worker_id=kA_Arwt_, ex_id=76facbef-2259-41f0-baba-76b0cc8c081f, job_id=d7130b8e-b4ce-4cfe-a351-07da76544f8e)
[2023-11-01T17:57:06.831Z] DEBUG: teraslice/84 on d176357dbc71: flushed logs successfully, will exit with code 0 (assignment=execution_controller, module=execution_controller:shutdown_handler, worker_id=oAXnwKtC, ex_id=76facbef-2259-41f0-baba-76b0cc8c081f, job_id=d7130b8e-b4ce-4cfe-a351-07da76544f8e)
[2023-11-01T17:57:07.606Z] DEBUG: teraslice/84 on 6655209e1c3e: flushed logs successfully, will exit with code 0 (assignment=worker, module=worker:shutdown_handler, worker_id=kA_Arwt_, ex_id=76facbef-2259-41f0-baba-76b0cc8c081f, job_id=d7130b8e-b4ce-4cfe-a351-07da76544f8e)

This test fails about 1 out of every 8 times. When triggering an execution to stop shouldn't it wait for a worker to stop?

busma13 commented 1 year ago

This error is happening outside of the k8s-e2e branch. See here: https://github.com/terascope/teraslice/actions/runs/6708159047/job/18228309248#step:6:427

godber commented 7 months ago

The PR above resolved this.