mcgear opened this issue 7 months ago
🤖
The error you're encountering is likely because Deno does not support Node.js built-in modules such as 'fs', 'path', and 'url'. Deno has its own standard library and does not use Node.js's 'require' system.
To resolve this issue, you should use Deno's built-in functions to replace the Node.js modules. Here's how you can modify your code:
```typescript
import { test, expect } from "@jest/globals";
import { PDFLoader } from "../fs/pdf.js";
import { dirname, fromFileUrl, resolve } from "https://deno.land/std/path/mod.ts";

test("Test PDF loader from blob", async () => {
  // Build an absolute path to the sample PDF that sits next to this test file.
  const filePath = resolve(
    dirname(fromFileUrl(import.meta.url)),
    "./example_data/1706.03762.pdf"
  );
  // Deno.readFileSync is a Deno built-in, so no fs import is needed;
  // the bytes are wrapped in a Blob, which PDFLoader accepts as input.
  const loader = new PDFLoader(
    new Blob([Deno.readFileSync(filePath)], {
      type: "application/pdf",
    })
  );
  const docs = await loader.load();
  expect(docs.length).toBe(15);
  expect(docs[0].pageContent).toContain("Attention Is All You Need");
  expect(docs[0].metadata).toMatchInlineSnapshot(`
    {
      "blobType": "application/pdf",
      "loc": {
        "pageNumber": 1,
      },
      "pdf": {
        "info": {
          "Author": "",
          "CreationDate": "D:20171207010315Z",
          "Creator": "LaTeX with hyperref package",
          "IsAcroFormPresent": false,
          "IsXFAPresent": false,
          "Keywords": "",
          "ModDate": "D:20171207010315Z",
          "PDFFormatVersion": "1.5",
          "Producer": "pdfTeX-1.40.17",
          "Subject": "",
          "Title": "",
          "Trapped": {
            "name": "False",
          },
        },
        "metadata": null,
        "totalPages": 15,
        "version": "1.10.100",
      },
      "source": "blob",
    }
  `);
});
```
In this code, I've replaced the Node.js 'fs', 'path', and 'url' modules with Deno equivalents. 'Deno.readFileSync' (a Deno built-in) is used instead of 'fs.readFile'; 'resolve' and 'dirname' from Deno's standard library are used instead of 'path.resolve' and 'path.dirname'; and 'fromFileUrl' converts the module URL ('import.meta.url') into a file path.
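The resolve/dirname/fromFileUrl chain above can also be expressed with the standard URL API, which Deno and Node.js both expose as a global. The following is a minimal sketch under that assumption; 'resolveFromModule' is a hypothetical helper, not part of langchainjs or Deno's standard library.

```typescript
// Hypothetical helper (not a langchainjs or Deno std API): resolve a path
// relative to the current module using only the global URL class.
function resolveFromModule(relativePath: string, moduleUrl: string): string {
  // new URL() resolves "./..." and "../..." against moduleUrl's directory.
  const url = new URL(relativePath, moduleUrl);
  if (url.protocol !== "file:") {
    throw new Error(`Expected a file: URL, got ${url.protocol}`);
  }
  // pathname is percent-encoded (a space becomes %20), so decode it.
  // On Windows this yields "/C:/...", so std's fromFileUrl is the safer
  // choice there.
  return decodeURIComponent(url.pathname);
}

// Example use inside a Deno test file (left as a comment so the sketch
// runs in any runtime):
// const bytes = Deno.readFileSync(
//   resolveFromModule("./example_data/1706.03762.pdf", import.meta.url),
// );
```

Deno's file APIs such as 'Deno.readFile' also accept a URL object directly, so passing 'new URL("./example_data/1706.03762.pdf", import.meta.url)' avoids the string conversion entirely.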
You can find more information about this in the langchain/src/document_loaders/tests/pdf-blob.test.ts file in the langchainjs repository.
As for your question about whether the langchainjs framework supports DenoJS: it does. The evidence is the deno.json file in the langchainjs repository, which Deno uses as an import map. It maps several langchain modules, such as "@langchain/community/", "@langchain/openai", "@langchain/core/", and "@langchain/pinecone", which indicates that these modules can be imported and used in a Deno environment.
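To make the import-map idea concrete, here is a simplified model of how such a mapping rewrites bare specifiers: an exact key wins, otherwise the longest key ending in "/" that prefixes the specifier is used. This is an illustration only, not Deno's resolver (it ignores scopes and URL-like specifiers), and the example entries are hypothetical rather than copied from any real deno.json.

```typescript
// Simplified import-map model for illustration; not Deno's actual resolver.
type ImportMap = Record<string, string>;

function resolveSpecifier(specifier: string, imports: ImportMap): string {
  // An exact key match wins outright.
  if (Object.prototype.hasOwnProperty.call(imports, specifier)) {
    return imports[specifier]!;
  }
  // Otherwise pick the longest key that ends in "/" and prefixes the specifier.
  let best: string | undefined;
  for (const key of Object.keys(imports)) {
    if (key.endsWith("/") && specifier.startsWith(key)) {
      if (best === undefined || key.length > best.length) {
        best = key;
      }
    }
  }
  if (best !== undefined) {
    // Swap the matched prefix for its mapped target.
    return imports[best]! + specifier.slice(best.length);
  }
  // No entry applies, so the specifier is left unchanged.
  return specifier;
}

// Hypothetical entries in the style of a deno.json "imports" field.
const exampleImports: ImportMap = {
  "langchain/": "npm:/langchain/",
  "@langchain/community/": "npm:/@langchain/community/",
  "pdf-parse": "npm:pdf-parse",
};
```

In this model an exact key such as "pdf-parse" does not cover subpaths like "pdf-parse/lib/pdf-parse.js"; those need a separate "pdf-parse/" prefix entry, mirroring how import maps distinguish exact keys from trailing-slash keys.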
I had success in a Deno notebook like this:
```typescript
// Peer dependency
import * as parse from "pdf-parse";
import { PDFLoader } from "langchain/document_loaders/fs/pdf";

const loader = new PDFLoader("./static/docs/MachineLearning-Lecture01.pdf");
const rawCS229Docs = await loader.load();
```
https://github.com/jacoblee93/dlai-langchainjs/blob/main/02_loading_and_preparing_data.ipynb
My import map in deno.json looked like this: https://github.com/jacoblee93/dlai-langchainjs/blob/main/deno.json
Could you try that exact syntax and let me know?
I tried that "exact" code. The only difference, I think, is that the PDFLoader is used within a test (not sure whether that is the issue). But I really need it working in a test first.
I am on Deno 1.40.4.
I updated my commits so that my code is reflected in the links I provided: https://github.com/fathym-deno/everything-as-code/blob/feature/thinky/deno.jsonc and https://github.com/fathym-deno/everything-as-code/blob/feature/thinky/tests/workshop/bench.ts
I also tried a direct call outside of a test, and I get an error about not finding the file. That is promising, because it gets past the issue of pdf-parse not resolving: https://github.com/fathym-deno/everything-as-code/blob/feature/thinky/tests/workshop/bench-direct.ts
I just pushed a fix to bench-direct.ts, and it does work outside of the Deno test:
deno run -A ./tests/workshop/bench-direct.ts
To bring the communities together, I also filed a bug with Deno for this: https://github.com/denoland/deno/issues/22399
Thank you for diving in here! Will keep an eye on that issue.
Thank you for the help. Tomorrow I'll check whether the code you provided works from a Deno Fresh API. If it does, at least I can move forward; if it doesn't, we'll have more information to go on.
I wanted to provide a little more information as we work through this. I tried removing the Deno import maps, to match what a library would need, and was able to trim the code down to the following, with no import maps at all:
```typescript
import * as parse from "npm:pdf-parse";
import { PDFLoader } from "npm:langchain/document_loaders/fs/pdf";

const loader = new PDFLoader("./training/azure/data-explorer/azure-data-explorer.pdf", {
  splitPages: false,
});
const docs = await loader.load();
console.log(docs);
```
That works fine when run with deno run, but not in a test. I'm just trying to help eliminate some variables. The same code fails within a test:
```typescript
import * as parse from "npm:pdf-parse";
import { PDFLoader } from "npm:langchain/document_loaders/fs/pdf";

Deno.test('Workshop Bench', async (t) => {
  await t.step('Retrieval Test', async () => {
    const loader = new PDFLoader("./training/azure/data-explorer/azure-data-explorer.pdf", {
      splitPages: false,
    });
    const docs = await loader.load();
    console.log(docs);
  });
});
```
I also tried moving the test code out of the step and into the root test, and it still failed.
In short, the extra import maps aren't needed to get it working; the failure is clearly tied to running the code inside Deno.test.
Hi, @mcgear,
I'm helping the langchainjs team manage their backlog and am marking this issue as stale. From what I understand, you encountered an error when trying to use PDFLoader in a DenoJS test and, after detailed responses and suggestions from dosubot and jacoblee93, continued experimenting with different code configurations to troubleshoot it. The issue was addressed with my explanation that Deno does not support Node.js built-in modules and my suggestion to use Deno's standard library functions instead. Additionally, jacoblee93 provided a working code snippet using PDFLoader in a Deno notebook.
Could you please confirm if this issue is still relevant to the latest version of the langchainjs repository? If it is, please let the langchainjs team know by commenting on the issue. Otherwise, feel free to close the issue yourself, or it will be automatically closed in 7 days. Thank you!
Sorry for letting this go stale. I finally got back to some RAG work that I'm trying to test, and I can confirm that I am still getting this issue.
Sorry, yeah, I still need to have a look at this!
I am trying to use PDFLoader from a test in DenoJS to start getting a feel for it, but I am stuck almost immediately.
I'm getting the following error when running the test:
Test is here: https://github.com/fathym-deno/everything-as-code/blob/feature/thinky/tests/workshop/bench.ts
Dependencies defined here: https://github.com/fathym-deno/everything-as-code/blob/feature/thinky/tests/test.deps.ts
I'm trying to import pdf-parse directly instead of installing it with npm (it fails even if I run npm install pdf-parse):
```typescript
import "npm:pdf-parse";
```
How can I get this working in Deno?