huridocs / uwazi

Uwazi is a web-based, open-source solution for building and sharing document collections
http://www.uwazi.io
MIT License

Handle file not found errors on PDFSegmentation #6827

Closed: txau closed this issue 5 months ago

txau commented 5 months ago

When, for whatever reason, a file is missing from the filesystem but still has an entry in the files database, PDFSegmentation fails without handling the error.

cc @gabriel-piles @fnocetti

2:38 PM: 2024-05-27T14:38:30.951Z [localhost] NoSuchKey: UnknownError
    at de_NoSuchKeyRes (/opt/uwazi/cores/core-1.169.0-rc1/node_modules/@aws-sdk/client-s3/dist-cjs/index.js:4809:21)
    at de_CommandError (/opt/uwazi/cores/core-1.169.0-rc1/node_modules/@aws-sdk/client-s3/dist-cjs/index.js:4747:19)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
    at async /opt/uwazi/cores/core-1.169.0-rc1/node_modules/@smithy/middleware-serde/dist-cjs/index.js:35:20
    at async /opt/uwazi/cores/core-1.169.0-rc1/node_modules/@aws-sdk/middleware-signing/dist-cjs/index.js:225:18
    at async /opt/uwazi/cores/core-1.169.0-rc1/node_modules/@smithy/middleware-retry/dist-cjs/index.js:320:38
    at async /opt/uwazi/cores/core-1.169.0-rc1/node_modules/@aws-sdk/middleware-flexible-checksums/dist-cjs/index.js:173:18
    at async /opt/uwazi/cores/core-1.169.0-rc1/node_modules/@aws-sdk/middleware-sdk-s3/dist-cjs/index.js:97:20
    at async /opt/uwazi/cores/core-1.169.0-rc1/node_modules/@aws-sdk/middleware-sdk-s3/dist-cjs/index.js:120:14
    at async /opt/uwazi/cores/core-1.169.0-rc1/node_modules/@aws-sdk/middleware-logger/dist-cjs/index.js:33:22
    at async S3Storage.get (/opt/uwazi/cores/core-1.169.0-rc1/app/api/files/S3Storage.js:37:22)
    at async readFromS3 (/opt/uwazi/cores/core-1.169.0-rc1/app/api/files/storage.js:58:20)
    at async Object.fileContents (/opt/uwazi/cores/core-1.169.0-rc1/app/api/files/storage.js:70:27)
    at async PDFSegmentation.segmentOnePdf (/opt/uwazi/cores/core-1.169.0-rc1/app/api/services/pdfsegmentation/PDFSegmentation.js:43:29)
    at async /opt/uwazi/cores/core-1.169.0-rc1/app/api/services/pdfsegmentation/PDFSegmentation.js:114:17
    at async /opt/uwazi/cores/core-1.169.0-rc1/app/api/services/pdfsegmentation/PDFSegmentation.js:103:13
original error: {
  "name": "NoSuchKey",
  "$fault": "client",
  "$metadata": {
    "httpStatusCode": 404,
    "requestId": "tx00000e285947cd59c5bf9-0066549ae6-5957e6d-default",
    "attempts": 1,
    "totalRetryDelay": 0
  },
  "Code": "NoSuchKey",
  "BucketName": "uwazi-staging",
  "RequestId": "tx00000e285947cd59c5bf9-0066549ae6-5957e6d-default",
  "HostId": "5957e6d-default-default",
  "message": "UnknownError"
}
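
For readers hitting the same trace, here is a minimal sketch of one way to handle this case. It is not the fix that was actually merged, which this thread does not show. It treats an error named `NoSuchKey` (as in the trace above) or one with code `ENOENT` (assumed here for local-disk storage) as "file missing" and skips that file instead of letting the error propagate. The names `isFileNotFoundError`, `segmentIfPresent`, `readFile` and `send` are hypothetical stand-ins, not Uwazi's API.

```typescript
// Hypothetical shape of the errors treated as "file not found".
interface MaybeStorageError {
  name?: string; // "NoSuchKey" in the S3 error shown in the trace
  code?: string; // "ENOENT" is assumed for a missing file on local disk
}

// True when the error looks like a missing S3 object or a missing local file.
function isFileNotFoundError(error: unknown): boolean {
  if (typeof error !== 'object' || error === null) return false;
  const e = error as MaybeStorageError;
  return e.name === 'NoSuchKey' || e.code === 'ENOENT';
}

type SegmentationOutcome =
  | { status: 'sent'; filename: string }
  | { status: 'missing'; filename: string };

// Hypothetical wrapper: `readFile` stands in for the storage read and
// `send` for the request to the segmentation service.
async function segmentIfPresent(
  filename: string,
  readFile: (filename: string) => Promise<Uint8Array>,
  send: (filename: string, contents: Uint8Array) => Promise<void>
): Promise<SegmentationOutcome> {
  // Only "not found" errors are swallowed; anything else is rethrown.
  const contents = await readFile(filename).catch((error: unknown) => {
    if (isFileNotFoundError(error)) return undefined;
    throw error;
  });

  if (contents === undefined) {
    // The caller can log this or flag the database entry.
    return { status: 'missing', filename };
  }

  await send(filename, contents);
  return { status: 'sent', filename };
}
```

Returning a `missing` outcome rather than throwing lets a batch loop continue with the remaining files and leaves the decision about the orphaned database entry to the caller.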

txau commented 5 months ago

fixed