harshankur / officeParser

A Node.js library to parse text out of any office file. Currently supports docx, pptx, xlsx, odt, odp, and ods.
MIT License
123 stars 17 forks

Errors with bulk processing #33

Open b-hexsoul opened 3 months ago

b-hexsoul commented 3 months ago

I am trying to process hundreds of documents.

Getting the following errors:

ERROR parseDocument [OfficeParser]: Error: Refusing to create a directory outside the output path.
ERROR parseDocument [OfficeParser]: Error: ENOENT: no such file or directory, lstat '/src/officeParserTemp/171639676809200209.docx'
ERROR parseDocument [OfficeParser]: Error: EEXIST: file already exists, mkdir '/src/officeParserTemp/171639676571100139.docx'

[UnhandledPromiseRejection: This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). The promise rejected with the reason "UnknownErrorException".] {

MiNickel commented 2 months ago

Did you find out what the issue was? I'm currently experiencing the same problem.

MiNickel commented 2 months ago

Setting the flag to preserve the temporary files fixed it for me!
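For anyone trying this workaround, here is a minimal sketch. It assumes "the flag" is officeParser's `preserveTempFiles` config option and that the parser is called as `parseOfficeAsync(file, config)`; check both against the version you have installed. The `parseSequentially` helper and its injected `parse` argument are hypothetical names used only for illustration. Catching each failure per file also avoids the unhandled promise rejection from the original report.

```javascript
// Assumption: `preserveTempFiles` is the config key the comment refers to.
const bulkConfig = { preserveTempFiles: true };

// `parse` is injected so the sketch does not hard-code the library call;
// in practice it would be something like officeParser.parseOfficeAsync.
async function parseSequentially(files, parse) {
  const results = [];
  for (const file of files) {
    try {
      // One file at a time, so parses never race on the temp directory.
      results.push({ file, text: await parse(file, bulkConfig) });
    } catch (error) {
      // Record the failure instead of letting the rejection go unhandled.
      results.push({ file, error });
    }
  }
  return results;
}
```

With temp files preserved, the temp directory keeps growing during a bulk run, so it probably makes sense to delete it once after the whole batch has finished.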

WarrenMfg commented 3 weeks ago

Similar fix for me. I replaced this line:

rimraf(internalConfig.tempFilesLocation, rimrafErr => consoleError(rimrafErr, internalConfig.outputErrorToConsole));

with this line:

fs.rmSync(filepath);

This leaves the officeParserTemp directory and its nested tempfiles directory in place, though that can also be changed.
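A sketch of why the per-file removal is likely safer: deleting the whole shared temp directory can presumably pull files out from under other parses that are still running, which would match the ENOENT error above. The `removeTempFile` helper and the throwaway directory below are illustrative assumptions, not part of officeParser.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical helper: delete only the temp copy belonging to one parse call.
function removeTempFile(filepath) {
  // force: true makes a missing file a no-op instead of throwing ENOENT.
  fs.rmSync(filepath, { force: true });
}

// Demo in a throwaway directory standing in for officeParserTemp.
const tempDir = fs.mkdtempSync(path.join(os.tmpdir(), 'officeParserTemp-'));
const first = path.join(tempDir, 'first.docx');
const second = path.join(tempDir, 'second.docx');
fs.writeFileSync(first, '');
fs.writeFileSync(second, '');

removeTempFile(first); // removes only this file
removeTempFile(first); // a repeated call is harmless
```

After this runs, `second.docx` is still present, which is the point: a file that another in-flight parse depends on is left alone.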

WarrenMfg commented 3 weeks ago

@b-hexsoul are you using worker_threads by chance? I had to comment out getNewFileName due to duplicate file names, then prefix the file names with guids myself.
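The GUID prefix could look roughly like this. It is only a sketch: `uniqueTempName` is a hypothetical helper, and how it would be wired into officeParser in place of `getNewFileName` depends on the library version. It relies on Node's built-in `crypto.randomUUID()`, which returns a different value on every call regardless of which worker thread makes it.

```javascript
const crypto = require('crypto');
const path = require('path');

// Hypothetical replacement for a timestamp-based temp name:
// a random UUID prefix keeps names distinct across worker threads.
function uniqueTempName(originalPath) {
  return `${crypto.randomUUID()}-${path.basename(originalPath)}`;
}

const nameA = uniqueTempName('/docs/report.docx');
const nameB = uniqueTempName('/docs/report.docx');
console.log(nameA !== nameB); // two calls for the same input give different names
```

Timestamp-style names such as the ones in the error log can presumably collide when several workers start within the same instant, which would explain the EEXIST error; a random prefix sidesteps that.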