Closed itsthemoon closed 5 months ago
🤖
Hey there @itsthemoon! 👋
I'm Dosu, your friendly neighborhood bot, skilled at helping with bugs, answering questions, and guiding you through the maze of contribution. I'm here to lend you a digital hand while we wait for a human maintainer to join us. Ready to dive into the code together? 🏊♂️💻
Let's rock this! 🎸
Based on the information provided, it seems like the parseOfficeAsync function from the officeparser module is trying to create temporary files in a directory that doesn't exist in the AWS Lambda environment. Unfortunately, the officeparser module doesn't provide an option to specify the directory where temporary files are created.
However, there might be a workaround for this issue. You could try to modify the PPTXLoader class to change the current working directory to /tmp before calling the parseOfficeAsync function, and then change it back after the function call. Here's an example of how you could do this:
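import { parseOfficeAsync } from "officeparser";
import { chdir, cwd } from "process";
// The two import paths below assume the public langchain package entrypoints.
import { Document } from "langchain/document";
import { BufferLoader } from "langchain/document_loaders/fs/buffer";

export class PPTXLoader extends BufferLoader {
  public async parse(
    raw: Buffer,
    metadata: Document["metadata"]
  ): Promise<Document[]> {
    // Switch to the writable /tmp directory for the duration of the call.
    // Note: the working directory is process-wide, so concurrent parses share it.
    const originalCwd = cwd();
    chdir("/tmp");
    try {
      const pptx = await parseOfficeAsync(raw, { outputErrorToConsole: true });
      if (!pptx) return [];
      return [
        new Document({
          pageContent: pptx,
          metadata,
        }),
      ];
    } finally {
      // Restore the original working directory even if parsing throws.
      chdir(originalCwd);
    }
  }
}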
Please note that this is just a workaround and might not work in all cases. It's also important to handle any errors that might occur when changing the current working directory. If this doesn't solve your issue, please provide more details about your AWS Lambda setup and how you're using the PPTXLoader class.
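As a rough illustration of the error-handling point, the directory switch can be wrapped in a small helper that always restores the original working directory, even when the wrapped call throws. This is only a sketch: withWorkingDirectory is a hypothetical name, not part of langchain or officeparser, and it assumes the target directory exists and that only one such call runs at a time, since the working directory is shared by the whole process.

```typescript
import { chdir, cwd } from "node:process";

// Hypothetical helper (not provided by langchain or officeparser).
// Runs `task` with `dir` as the working directory, then restores the
// previous working directory whether `task` resolves or rejects.
async function withWorkingDirectory<T>(
  dir: string,
  task: () => Promise<T>
): Promise<T> {
  const originalCwd = cwd();
  chdir(dir); // throws if `dir` does not exist or is not accessible
  try {
    return await task();
  } finally {
    chdir(originalCwd);
  }
}
```

Because the helper wraps any async call, it could in principle be used around an existing loader without modifying the loader class, for example `await withWorkingDirectory("/tmp", () => loader.load())`, where loader is an already constructed PPTXLoader instance.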
I hope this helps! If you have any other questions, feel free to ask.
Best regards, Dosu
This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.
Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.
If you want to continue the conversation, start your reply with @dosu-bot.
@dosu-bot I don't think I can edit the PPTX loader because it is part of the langchain class that I am using.
🤖
Based on the information provided, it appears that the PPTXLoader in the langchainjs codebase does not have a method or function that allows for the modification of the directory where temporary files are written. This is confirmed by the source code in langchain/src/document_loaders/fs/pptx.ts.
However, there might be a workaround. You could potentially set the TMPDIR environment variable in your AWS Lambda function to point to the /tmp directory. This is because Node.js uses the os.tmpdir() method to get the default directory for temporary files, and on POSIX systems this method returns the value of the TMPDIR environment variable when it is set.
Here's an example of how you could set this environment variable in your Lambda function:
process.env['TMPDIR'] = '/tmp';
Please note that this is a workaround and might not work in all cases. It would be best if the library provided a way to configure the temporary directory, but as per the current state of the codebase, it does not seem to be possible.
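Before relying on this, it may be worth checking whether TMPDIR has any effect on the failing path. The error reported in this thread names a relative path (officeParserTemp/tempfiles), which suggests the folder is resolved against the working directory rather than os.tmpdir(); if that is the case, setting TMPDIR alone would not change where the folder is created. The diagnostic below is a sketch under that assumption, and describeTempLocations is a hypothetical helper, not an API of langchain or officeparser.

```typescript
import { tmpdir } from "node:os";
import { resolve } from "node:path";
import { cwd } from "node:process";

// Hypothetical diagnostic helper: reports the directory Node.js considers
// the temp directory, the current working directory, and the absolute
// location a relative path would resolve to.
function describeTempLocations(relativeTarget: string) {
  return {
    osTmpdir: tmpdir(),
    workingDirectory: cwd(),
    resolvedTarget: resolve(relativeTarget),
  };
}

process.env.TMPDIR = "/tmp";
// The relative path is taken from the error message in this thread.
console.log(describeTempLocations("officeParserTemp/tempfiles"));
```

If resolvedTarget points somewhere under the working directory instead of /tmp, the TMPDIR setting is probably not what the library consults for this folder.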
If this workaround does not solve your issue, please provide more details about your implementation and I'll do my best to assist you further.
@itsthemoon I had no choice but to fork the library to solve the problem. Here is my code: https://github.com/junonya/officeParser/blob/6d37667ec1de7b41323334d977e5a4e3d17dbe31/officeParser.js#L22
I am trying to use the PPTX loader (JS version) inside an AWS Lambda function that I've created. I am getting the following errors:
2023-12-16T15:31:51.084Z 1e4c8769-cbe8-44d2-a3f9-100dc59cb705 ERROR [OfficeParser]: Error: ENOENT: no such file or directory, mkdir 'officeParserTemp/tempfiles'
2023-12-16T15:31:51.084Z 1e4c8769-cbe8-44d2-a3f9-100dc59cb705 ERROR Error processing file: [OfficeParser]: Error: ENOENT: no such file or directory, mkdir 'officeParserTemp/tempfiles'
This is because in Lambda you can only write to the /tmp folder, which works fine for every other loader I am using (docx, txt, pdf). However, officeparser, which the PPTX loader uses, is trying to write to a different location that I don't have access to. Does anyone know a solution to this?