terascope / teraslice

Scalable data processing pipelines in JavaScript
https://terascope.github.io/teraslice/
Apache License 2.0

Large asset Ex OOM fix when in s3 asset mode #3598

Closed sotojn closed 5 months ago

sotojn commented 5 months ago

This PR makes the following changes:

Ref to issue #3595

godber commented 5 months ago

Using this branch with ES-backed asset storage, I started the cluster with this command:

yarn k8s:minio

I uploaded the asset fine:

earl assets deploy local -f autoload/common_processors-v0.13.1-node-18-bundle.zip
Asset posted to local: eabe46a623bc55886e1e81f3eefe74754a903fd1

When running in ES mode, registering the job fails with the following error:

earl tjm register local examples/jobs/data_generator.json
Error Failure to get assets, caused by TSError: index_not_found_exception
    at _errorHandlerFn (/app/source/packages/elasticsearch-api/index.js:840:21)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
Caused by: ResponseError: index_not_found_exception
    at IncomingMessage.<anonymous> (/app/source/node_modules/elasticsearch6/lib/Transport.js:310:25)
    at IncomingMessage.emit (node:events:529:35)
    at endReadableNT (node:internal/streams/readable:1400:12)
    at process.processTicksAndRejections (node:internal/process/task_queues:82:21)
If running out of memory, try consider increasing the memory allocation for the process by adding/modifying the "memory_execution_controller" or "resources_limits_memory" (for workers) field in the job file.
registering job Data Generator on http://localhost:5678