Closed sotojn closed 2 months ago
Ran a job that blows up with a JSON parsing error after one minute. This is in kubernetesV2
using the kafka reader. Here are the results.
Running curl localhost:5678
{
"arch": "arm64",
"clustering_type": "kubernetesV2",
"name": "ts-dev1",
"node_version": "v18.20.4",
"platform": "linux",
"teraslice_version": "v2.3.0"
}
Execution logs when it dies the first time (Exits with exit code 1):
[2024-09-17T16:18:06.118Z] DEBUG: teraslice/10 on ts-exc-kafka-to-es-json-error-5f3a4bd7-1a9a-lrpwp: dispatched slice f76f0174-0431-4036-bd85-86e20a6f4361 to worker 10.244.0.10__ZZM7zWA7 (assignment=execution_controller, module=execution_controller, worker_id=wLdBBtyp, ex_id=4650a336-107b-45cc-b74f-00025f43b486, job_id=5f3a4bd7-1a9a-4133-bef8-b5d9a129cd79)
[2024-09-17T16:18:27.921Z] ERROR: teraslice/10 on ts-exc-kafka-to-es-json-error-5f3a4bd7-1a9a-lrpwp: execution_controller received an uncaughtException, exiting in 30s... (assignment=execution_controller, module=execution_controller:shutdown_handler, worker_id=wLdBBtyp, ex_id=4650a336-107b-45cc-b74f-00025f43b486, job_id=5f3a4bd7-1a9a-4133-bef8-b5d9a129cd79)
Error: Unable to parse JSON: Unexpected token n in JSON at position 2
at Timeout._onTimeout (file:///app/assets/8a13b6aabe2c651a21008d37f487057a754c5648/index.js:22948:15)
at listOnTimeout (node:internal/timers:569:17)
at process.processTimers (node:internal/timers:512:7)
[2024-09-17T16:18:28.022Z] DEBUG: teraslice/10 on ts-exc-kafka-to-es-json-error-5f3a4bd7-1a9a-lrpwp: execution shutdown was called for ex 4650a336-107b-45cc-b74f-00025f43b486 (assignment=execution_controller, module=execution_controller, worker_id=wLdBBtyp, ex_id=4650a336-107b-45cc-b74f-00025f43b486, job_id=5f3a4bd7-1a9a-4133-bef8-b5d9a129cd79)
[2024-09-17T16:18:28.023Z] DEBUG: teraslice/10 on ts-exc-kafka-to-es-json-error-5f3a4bd7-1a9a-lrpwp: stopping scheduler... (assignment=execution_controller, module=execution_scheduler, worker_id=wLdBBtyp, ex_id=4650a336-107b-45cc-b74f-00025f43b486, job_id=5f3a4bd7-1a9a-4133-bef8-b5d9a129cd79)
[2024-09-17T16:18:28.023Z] DEBUG: teraslice/10 on ts-exc-kafka-to-es-json-error-5f3a4bd7-1a9a-lrpwp: execution 4650a336-107b-45cc-b74f-00025f43b486 is finished scheduling, 3 remaining slices in the queue (assignment=execution_controller, module=execution_scheduler, worker_id=wLdBBtyp, ex_id=4650a336-107b-45cc-b74f-00025f43b486, job_id=5f3a4bd7-1a9a-4133-bef8-b5d9a129cd79)
[2024-09-17T16:18:28.024Z] DEBUG: teraslice/10 on ts-exc-kafka-to-es-json-error-5f3a4bd7-1a9a-lrpwp: shutdown is waiting for execution to finish... (assignment=execution_controller, module=execution_controller, worker_id=wLdBBtyp, ex_id=4650a336-107b-45cc-b74f-00025f43b486, job_id=5f3a4bd7-1a9a-4133-bef8-b5d9a129cd79)
[2024-09-17T16:18:28.028Z] DEBUG: teraslice/10 on ts-exc-kafka-to-es-json-error-5f3a4bd7-1a9a-lrpwp: execution 4650a336-107b-45cc-b74f-00025f43b486 did not finish (assignment=execution_controller, module=execution_controller, worker_id=wLdBBtyp, ex_id=4650a336-107b-45cc-b74f-00025f43b486, job_id=5f3a4bd7-1a9a-4133-bef8-b5d9a129cd79)
[2024-09-17T16:18:28.028Z] INFO: teraslice/10 on ts-exc-kafka-to-es-json-error-5f3a4bd7-1a9a-lrpwp: [START] "elasticsearch_sender_api" operation shutdown (assignment=execution_controller, module=slicer_context, worker_id=wLdBBtyp, ex_id=4650a336-107b-45cc-b74f-00025f43b486, job_id=5f3a4bd7-1a9a-4133-bef8-b5d9a129cd79)
[2024-09-17T16:18:28.028Z] INFO: teraslice/10 on ts-exc-kafka-to-es-json-error-5f3a4bd7-1a9a-lrpwp: [START] "kafka_reader_api:kafka_reader-0" operation shutdown (assignment=execution_controller, module=slicer_context, worker_id=wLdBBtyp, ex_id=4650a336-107b-45cc-b74f-00025f43b486, job_id=5f3a4bd7-1a9a-4133-bef8-b5d9a129cd79)
[2024-09-17T16:18:28.028Z] INFO: teraslice/10 on ts-exc-kafka-to-es-json-error-5f3a4bd7-1a9a-lrpwp: [START] "kafka_reader" operation shutdown (assignment=execution_controller, module=slicer_context, worker_id=wLdBBtyp, ex_id=4650a336-107b-45cc-b74f-00025f43b486, job_id=5f3a4bd7-1a9a-4133-bef8-b5d9a129cd79)
[2024-09-17T16:18:28.028Z] INFO: teraslice/10 on ts-exc-kafka-to-es-json-error-5f3a4bd7-1a9a-lrpwp: [FINISH] "elasticsearch_sender_api" operation shutdown, took 0ms (assignment=execution_controller, module=slicer_context, worker_id=wLdBBtyp, ex_id=4650a336-107b-45cc-b74f-00025f43b486, job_id=5f3a4bd7-1a9a-4133-bef8-b5d9a129cd79)
[2024-09-17T16:18:28.028Z] INFO: teraslice/10 on ts-exc-kafka-to-es-json-error-5f3a4bd7-1a9a-lrpwp: [FINISH] "kafka_reader" operation shutdown, took 0ms (assignment=execution_controller, module=slicer_context, worker_id=wLdBBtyp, ex_id=4650a336-107b-45cc-b74f-00025f43b486, job_id=5f3a4bd7-1a9a-4133-bef8-b5d9a129cd79)
[2024-09-17T16:18:28.029Z] INFO: teraslice/10 on ts-exc-kafka-to-es-json-error-5f3a4bd7-1a9a-lrpwp: [FINISH] "kafka_reader_api:kafka_reader-0" operation shutdown, took 0ms (assignment=execution_controller, module=slicer_context, worker_id=wLdBBtyp, ex_id=4650a336-107b-45cc-b74f-00025f43b486, job_id=5f3a4bd7-1a9a-4133-bef8-b5d9a129cd79)
[2024-09-17T16:18:28.029Z] INFO: teraslice/10 on ts-exc-kafka-to-es-json-error-5f3a4bd7-1a9a-lrpwp: calculating statistics (assignment=execution_controller, module=slice_analytics, worker_id=wLdBBtyp, ex_id=4650a336-107b-45cc-b74f-00025f43b486, job_id=5f3a4bd7-1a9a-4133-bef8-b5d9a129cd79)
[2024-09-17T16:18:28.029Z] INFO: teraslice/10 on ts-exc-kafka-to-es-json-error-5f3a4bd7-1a9a-lrpwp: (assignment=execution_controller, module=slice_analytics, worker_id=wLdBBtyp, ex_id=4650a336-107b-45cc-b74f-00025f43b486, job_id=5f3a4bd7-1a9a-4133-bef8-b5d9a129cd79)
operation kafka_reader
average completion time of: 30055 ms, min: 30055 ms, and max: 30055 ms
average size: 0, min: 0, and max: 0
average memory: -3266240, min: -3266240, and max: -3266240
[2024-09-17T16:18:28.029Z] INFO: teraslice/10 on ts-exc-kafka-to-es-json-error-5f3a4bd7-1a9a-lrpwp: (assignment=execution_controller, module=slice_analytics, worker_id=wLdBBtyp, ex_id=4650a336-107b-45cc-b74f-00025f43b486, job_id=5f3a4bd7-1a9a-4133-bef8-b5d9a129cd79)
operation elasticsearch_bulk
average completion time of: 0 ms, min: 0 ms, and max: 0 ms
average size: 0, min: 0, and max: 0
average memory: 1688, min: 1688, and max: 1688
[2024-09-17T16:18:28.029Z] INFO: teraslice/10 on ts-exc-kafka-to-es-json-error-5f3a4bd7-1a9a-lrpwp: execution 4650a336-107b-45cc-b74f-00025f43b486 has finished in 60 seconds (assignment=execution_controller, module=execution_controller, worker_id=wLdBBtyp, ex_id=4650a336-107b-45cc-b74f-00025f43b486, job_id=5f3a4bd7-1a9a-4133-bef8-b5d9a129cd79)
[2024-09-17T16:18:28.051Z] ERROR: teraslice/10 on ts-exc-kafka-to-es-json-error-5f3a4bd7-1a9a-lrpwp: execution 4650a336-107b-45cc-b74f-00025f43b486 received shutdown before the slicer could complete, setting status to "terminated" (assignment=execution_controller, module=execution_controller, worker_id=wLdBBtyp, ex_id=4650a336-107b-45cc-b74f-00025f43b486, job_id=5f3a4bd7-1a9a-4133-bef8-b5d9a129cd79)
[2024-09-17T16:18:28.126Z] INFO: teraslice/10 on ts-exc-kafka-to-es-json-error-5f3a4bd7-1a9a-lrpwp: shutting down (assignment=execution_controller, module=state_storage, worker_id=wLdBBtyp, ex_id=4650a336-107b-45cc-b74f-00025f43b486, job_id=5f3a4bd7-1a9a-4133-bef8-b5d9a129cd79)
[2024-09-17T16:18:28.126Z] INFO: teraslice/10 on ts-exc-kafka-to-es-json-error-5f3a4bd7-1a9a-lrpwp: shutting down. (assignment=execution_controller, module=ex_storage, worker_id=wLdBBtyp, ex_id=4650a336-107b-45cc-b74f-00025f43b486, job_id=5f3a4bd7-1a9a-4133-bef8-b5d9a129cd79)
[2024-09-17T16:18:28.127Z] INFO: teraslice/10 on ts-exc-kafka-to-es-json-error-5f3a4bd7-1a9a-lrpwp: client 10.244.0.10__ZZM7zWA7 disconnected { reason: 'client namespace disconnect' } (assignment=execution_controller, module=messaging:server, worker_id=wLdBBtyp, ex_id=4650a336-107b-45cc-b74f-00025f43b486, job_id=5f3a4bd7-1a9a-4133-bef8-b5d9a129cd79)
[2024-09-17T16:18:28.129Z] INFO: teraslice/10 on ts-exc-kafka-to-es-json-error-5f3a4bd7-1a9a-lrpwp: client 4650a336-107b-45cc-b74f-00025f43b486 disconnected { reason: 'io client disconnect' } (assignment=execution_controller, module=messaging:client, worker_id=wLdBBtyp, ex_id=4650a336-107b-45cc-b74f-00025f43b486, job_id=5f3a4bd7-1a9a-4133-bef8-b5d9a129cd79)
[2024-09-17T16:18:28.171Z] DEBUG: teraslice/10 on ts-exc-kafka-to-es-json-error-5f3a4bd7-1a9a-lrpwp: execution 4650a336-107b-45cc-b74f-00025f43b486 is done (assignment=execution_controller, module=execution_controller, worker_id=wLdBBtyp, ex_id=4650a336-107b-45cc-b74f-00025f43b486, job_id=5f3a4bd7-1a9a-4133-bef8-b5d9a129cd79)
[2024-09-17T16:18:33.131Z] WARN: teraslice/10 on ts-exc-kafka-to-es-json-error-5f3a4bd7-1a9a-lrpwp: execution controller 4650a336-107b-45cc-b74f-00025f43b486 is shutdown (assignment=execution_controller, module=execution_controller, worker_id=wLdBBtyp, ex_id=4650a336-107b-45cc-b74f-00025f43b486, job_id=5f3a4bd7-1a9a-4133-bef8-b5d9a129cd79)
[2024-09-17T16:18:33.132Z] INFO: teraslice/10 on ts-exc-kafka-to-es-json-error-5f3a4bd7-1a9a-lrpwp: execution_controller shutdown, already shutting down, remaining 25s (assignment=execution_controller, module=execution_controller:shutdown_handler, worker_id=wLdBBtyp, ex_id=4650a336-107b-45cc-b74f-00025f43b486, job_id=5f3a4bd7-1a9a-4133-bef8-b5d9a129cd79)
[2024-09-17T16:18:34.135Z] DEBUG: teraslice/10 on ts-exc-kafka-to-es-json-error-5f3a4bd7-1a9a-lrpwp: flushed logs successfully, will exit with code 1 (assignment=execution_controller, module=execution_controller:shutdown_handler, worker_id=wLdBBtyp, ex_id=4650a336-107b-45cc-b74f-00025f43b486, job_id=5f3a4bd7-1a9a-4133-bef8-b5d9a129cd79)
[2024-09-17T16:18:34.135Z] INFO: teraslice/10 on ts-exc-kafka-to-es-json-error-5f3a4bd7-1a9a-lrpwp: execution_controller shutdown took 6s, exit with 1 status code (assignment=execution_controller, module=execution_controller:shutdown_handler, worker_id=wLdBBtyp, ex_id=4650a336-107b-45cc-b74f-00025f43b486, job_id=5f3a4bd7-1a9a-4133-bef8-b5d9a129cd79)
It will come up a second time but in a terminal
status so it will exit and tell cluster master to cleanup resources regardless if it's restartable or not. Here are the logs for the ex coming back up:
[2024-09-17T16:18:52.655Z] INFO: teraslice/10 on ts-exc-kafka-to-es-json-error-5f3a4bd7-1a9a-54dlg: execution storage initialized (assignment=execution_controller, module=ex_storage, worker_id=_XN3y8UG, ex_id=4650a336-107b-45cc-b74f-00025f43b486, job_id=5f3a4bd7-1a9a-4133-bef8-b5d9a129cd79)
[2024-09-17T16:18:52.757Z] ERROR: teraslice/10 on ts-exc-kafka-to-es-json-error-5f3a4bd7-1a9a-54dlg: Unable to verify execution on initialization (assignment=execution_controller, module=execution_controller, worker_id=_XN3y8UG, ex_id=4650a336-107b-45cc-b74f-00025f43b486, job_id=5f3a4bd7-1a9a-4133-bef8-b5d9a129cd79)
Error: Execution 4650a336-107b-45cc-b74f-00025f43b486 was starting in terminal status, sending execution:finished event to cluster master
at ExecutionController._verifyExecution (file:///app/source/packages/teraslice/dist/src/lib/workers/execution-controller/index.js:776:21)
at async ExecutionController.initialize (file:///app/source/packages/teraslice/dist/src/lib/workers/execution-controller/index.js:150:24)
at async Service.initialize (file:///app/source/packages/teraslice/worker-service.js:40:9)
at async main (file:///app/source/packages/teraslice/worker-service.js:80:9)
[2024-09-17T16:18:52.757Z] INFO: teraslice/10 on ts-exc-kafka-to-es-json-error-5f3a4bd7-1a9a-54dlg: shutting down. (assignment=execution_controller, module=ex_storage, worker_id=_XN3y8UG, ex_id=4650a336-107b-45cc-b74f-00025f43b486, job_id=5f3a4bd7-1a9a-4133-bef8-b5d9a129cd79)
[2024-09-17T16:18:52.757Z] INFO: teraslice/10 on ts-exc-kafka-to-es-json-error-5f3a4bd7-1a9a-54dlg: shutting down (assignment=execution_controller, module=state_storage, worker_id=_XN3y8UG, ex_id=4650a336-107b-45cc-b74f-00025f43b486, job_id=5f3a4bd7-1a9a-4133-bef8-b5d9a129cd79)
[2024-09-17T16:18:52.758Z] INFO: teraslice/10 on ts-exc-kafka-to-es-json-error-5f3a4bd7-1a9a-54dlg: client 4650a336-107b-45cc-b74f-00025f43b486 disconnected { reason: 'io client disconnect' } (assignment=execution_controller, module=messaging:client, worker_id=_XN3y8UG, ex_id=4650a336-107b-45cc-b74f-00025f43b486, job_id=5f3a4bd7-1a9a-4133-bef8-b5d9a129cd79)
[2024-09-17T16:18:52.789Z] INFO: teraslice/10 on ts-exc-kafka-to-es-json-error-5f3a4bd7-1a9a-54dlg: execution_controller received process:SIGTERM, already shutting down, remaining 30s (assignment=execution_controller, module=execution_controller:shutdown_handler, worker_id=_XN3y8UG, ex_id=4650a336-107b-45cc-b74f-00025f43b486, job_id=5f3a4bd7-1a9a-4133-bef8-b5d9a129cd79)
[2024-09-17T16:18:53.859Z] DEBUG: teraslice/10 on ts-exc-kafka-to-es-json-error-5f3a4bd7-1a9a-54dlg: flushed logs successfully, will exit with code 0 (assignment=execution_controller, module=execution_controller:shutdown_handler, worker_id=_XN3y8UG, ex_id=4650a336-107b-45cc-b74f-00025f43b486, job_id=5f3a4bd7-1a9a-4133-bef8-b5d9a129cd79)
[2024-09-17T16:18:53.859Z] INFO: teraslice/10 on ts-exc-kafka-to-es-json-error-5f3a4bd7-1a9a-54dlg: execution_controller shutdown took 1s, exit with 0 status code (assignment=execution_controller, module=execution_controller:shutdown_handler, worker_id=_XN3y8UG, ex_id=4650a336-107b-45cc-b74f-00025f43b486, job_id=5f3a4bd7-1a9a-4133-bef8-b5d9a129cd79)
Running kubectl -n ts-dev1 get all
after second ex pod exits:
NAME READY STATUS RESTARTS AGE
pod/teraslice-master-54c7bfddfd-ntmv2 1/1 Running 0 12m
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/teraslice-master NodePort 10.96.166.169 <none> 5678:30678/TCP 12m
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/teraslice-master 1/1 1 1 12m
NAME DESIRED CURRENT READY AGE
replicaset.apps/teraslice-master-54c7bfddfd 1 1 1 12m
Running curl localhost:5678/txt/ex
after job exit:
name lifecycle slicers workers _status ex_id job_id _created _updated
---------------------- ---------- ------- ------- ---------- ------------------------------------ ------------------------------------ ------------------------ ------------------------
kafka-to-es-json-error persistent 1 1 terminated 4650a336-107b-45cc-b74f-00025f43b486 5f3a4bd7-1a9a-4133-bef8-b5d9a129cd79 2024-09-17T16:17:18.591Z 2024-09-17T16:18:28.053Z
This PR makes the following changes:
Bug Fixes
kubernetesV2
and to NOT allow non-zero exit codes when inkubernetes
v2.3.0
where the kubernetes V1 execution controller exit process is unintentionally changed. This PR restores it's original functionality while allowingkubernetesV2
to gain its intended functionality.3755