openaddresses / batch

OpenAddresses/Machine based AWS Batch based ETL Processing
https://batch.openaddresses.io/
MIT License

Use the S3 JavaScript SDK's upload function instead of putObject #332

Closed iandees closed 1 year ago

iandees commented 1 year ago

Fixes https://github.com/openaddresses/openaddresses/issues/6376

Large objects (like the collections) were failing to upload with errors like:

EntityTooLarge: Your proposed upload exceeds the maximum allowed object size.
--
at Request.extractError (/usr/local/src/batch/node_modules/aws-sdk/lib/services/s3.js:711:35)
at Request.callListeners (/usr/local/src/batch/node_modules/aws-sdk/lib/sequential_executor.js:106:20)
at Request.emit (/usr/local/src/batch/node_modules/aws-sdk/lib/sequential_executor.js:78:10)
at Request.emit (/usr/local/src/batch/node_modules/aws-sdk/lib/request.js:686:14)
at Request.transition (/usr/local/src/batch/node_modules/aws-sdk/lib/request.js:22:10)
at AcceptorStateMachine.runTo (/usr/local/src/batch/node_modules/aws-sdk/lib/state_machine.js:14:12)
at /usr/local/src/batch/node_modules/aws-sdk/lib/state_machine.js:26:10
at Request.<anonymous> (/usr/local/src/batch/node_modules/aws-sdk/lib/request.js:38:9)
at Request.<anonymous> (/usr/local/src/batch/node_modules/aws-sdk/lib/request.js:688:12)
at Request.callListeners (/usr/local/src/batch/node_modules/aws-sdk/lib/sequential_executor.js:116:18) {
code: 'EntityTooLarge',
region: null,
time: 2023-04-16T16:13:44.479Z,
requestId: null,
extendedRequestId: undefined,
cfId: undefined,
statusCode: 400,
retryable: false,
retryDelay: 13.173899584670723
}

Instead of calling putObject(), we should use the upload() API, which splits the stream into parts and sends them as a multipart upload. That avoids the 5 GB size limit on a single PutObject request.
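
As a rough illustration of the change described above (not the actual diff in this PR), the switch could look something like the sketch below. It assumes an AWS SDK for JavaScript v2 S3 client is passed in; the helper names `choosePartSize` and `uploadLargeObject`, and the `expectedBytes` parameter, are hypothetical and only there for the example. The part-size helper reflects S3's multipart limits (at most 10,000 parts, each at least 5 MiB except the last), which matter when streaming a very large body.

```javascript
// Sketch only: `s3` is expected to be an AWS SDK for JavaScript v2 client,
// e.g. `new (require('aws-sdk')).S3()`. All helper names here are hypothetical.

const MIB = 1024 * 1024;
const MIN_PART_SIZE = 5 * MIB; // S3 minimum size for every part except the last
const MAX_PARTS = 10000;       // S3 maximum number of parts in one multipart upload

// Pick a part size large enough that the whole object fits in MAX_PARTS parts.
function choosePartSize(expectedBytes) {
    return Math.max(MIN_PART_SIZE, Math.ceil(expectedBytes / MAX_PARTS));
}

// Upload a Buffer or readable stream through the SDK's managed uploader,
// which issues a multipart upload when the body spans more than one part.
function uploadLargeObject(s3, bucket, key, body, expectedBytes = 0) {
    const params = { Bucket: bucket, Key: key, Body: body };
    const options = {
        partSize: choosePartSize(expectedBytes),
        queueSize: 4 // number of parts uploaded concurrently (the SDK default)
    };
    // upload() returns a ManagedUpload; promise() resolves when all parts are done.
    return s3.upload(params, options).promise();
}

module.exports = { choosePartSize, uploadLargeObject };
```

A call might then look like `uploadLargeObject(s3, 'example-bucket', 'example-key.zip', fs.createReadStream(file), fs.statSync(file).size)`, where the bucket, key, and file path are placeholders. Passing the expected size is optional; without it the sketch falls back to 5 MiB parts, which caps a streamed upload at roughly 48.8 GiB.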