Closed glorat closed 1 week ago
To ensure that the pageContent
property is always set to a string, even when it is undefined, you can modify the constructor of the Document
class to default pageContent
to an empty string if it is not provided. Here is the corrected code:
export class Document<
// eslint-disable-next-line @typescript-eslint/no-explicit-any
Metadata extends Record<string, any> = Record<string, any>
> implements DocumentInput, DocumentInterface
{
pageContent: string;
metadata: Metadata;
constructor(fields: DocumentInput<Metadata>) {
this.pageContent = fields.pageContent !== undefined
? fields.pageContent.toString()
: "";
this.metadata = fields.metadata ?? ({} as Metadata);
}
}
This change ensures that pageContent
will always be a string, defaulting to an empty string if it is not provided [1].
To continue talking to Dosu, mention @dosu.
@dosu I agree with you and the proposed fix. Let's also have a unit test for it. How about this:
import { test, expect } from "@jest/globals";
import {Document} from "../documents/document.js";
test("Document", async () => {
const doc = new Document({pageContent: ''})
expect (doc.pageContent).toEqual('')
});
Your proposed unit test looks good. It effectively checks that the pageContent
is set to an empty string when an empty string is provided. You might also want to add a test case to ensure that pageContent
defaults to an empty string when it is not provided. Here is an extended version of your test:
import { test, expect } from "@jest/globals";
import { Document } from "../documents/document.js";
test("Document should handle empty pageContent", () => {
const doc = new Document({ pageContent: '' });
expect(doc.pageContent).toEqual('');
});
test("Document should default pageContent to an empty string when not provided", () => {
const doc = new Document({});
expect(doc.pageContent).toEqual('');
});
This ensures that both scenarios are covered.
To continue talking to Dosu, mention @dosu.
Dosu's second test not applicable since DocumentInput does not allow {}
Taken the liberty of making a PR
Checked other resources
Example Code
Error Message and Stack Trace (if applicable)
AssertionError: expected undefined to strictly equal '' Expected : Actual :undefined
Description
pageContent should be set to '', as mandated by its type contract
string
(and notstring|undefined
)The bug is visible in the code https://github.com/langchain-ai/langchainjs/blob/main/langchain-core/src/documents/document.ts
Line 32 should check field.pageContent is undefined, not just truthy, otherwise it uses this.pageContent which is undefined and breaks the contract that pageContent must be a string. Thus in any event, pageContent needs to be initialised to something to fulfill the type contract
System Info
This will happen on any environment