Failing to load dependencies & chromadb error #646

l4b4r4b4b4 commented 1 year ago

I am running LangChain with Next.js 13 in a Docker container. Ingesting PDFs works when I run it from the project root outside Docker, writing into ChromaDB running in another Docker container, but the whole process fails when I try to do the same in a Next.js API route!

At first it complained that it could not load the needed dependencies (d3-dsv, mammoth, epub2, puppeteer, srt-parser-2, cohere-ai, @dqbd/tiktoken and hnswlib-node). Now it does ingest the docs, connects to ChromaDB and loads them successfully, but it still breaks off saying:

Error in /api/document/upload: TypeError: Cannot read properties of undefined (reading 'data')
    at /usr/src/app/node_modules/.pnpm/chromadb@1.3.1/node_modules/chromadb/dist/main/index.js:136:29
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
    at async Collection.add (/usr/src/app/node_modules/.pnpm/chromadb@1.3.1/node_modules/chromadb/dist/main/index.js:124:26)
    at async Chroma.addVectors (webpack-internal:///(api)/./node_modules/.pnpm/langchain@0.0.48_@dqbd+tiktoken@1.0.3_chromadb@1.3.1_cohere-ai@6.2.0_d3-dsv@3.0.1_epub2@3.0.1_tnsy5tx2rwt6ouwh4fpahcqrzy/node_modules/langchain/dist/vectorstores/chroma.js:85:9)
    at async Chroma.addDocuments (webpack-internal:///(api)/./node_modules/.pnpm/langchain@0.0.48_@dqbd+tiktoken@1.0.3_chromadb@1.3.1_cohere-ai@6.2.0_d3-dsv@3.0.1_epub2@3.0.1_tnsy5tx2rwt6ouwh4fpahcqrzy/node_modules/langchain/dist/vectorstores/chroma.js:46:9)
    at async Chroma.fromDocuments (webpack-internal:///(api)/./node_modules/.pnpm/langchain@0.0.48_@dqbd+tiktoken@1.0.3_chromadb@1.3.1_cohere-ai@6.2.0_d3-dsv@3.0.1_epub2@3.0.1_tnsy5tx2rwt6ouwh4fpahcqrzy/node_modules/langchain/dist/vectorstores/chroma.js:127:9)
    at async upload (webpack-internal:///(api)/./pages/api/document/upload.ts:42:29)

General Setup:

  1. pnpm
  2. node v19
  3. NextJS 13:canary-32

My API route:

// pages/api/document/upload.ts (or .js)

import { NextApiRequest, NextApiResponse } from "next";
import formidable, { File } from "formidable";
import { CustomPDFLoader } from "@/lib/langchain/customPDFLoader";
import { RecursiveCharacterTextSplitter } from "langchain/text_splitter";
import { OpenAIEmbeddings } from "langchain/embeddings";
import { Chroma } from "langchain/vectorstores";

const upload = async (req: NextApiRequest, res: NextApiResponse) => {
  if (req.method !== "POST") {
    res.status(405).json({ error: "Method not allowed" });
    return;
  }

  try {
    const data = await parseFormData(req);
    const files = data.files;
    // Process the uploaded files (e.g., save them to a storage service or the filesystem)
    // ...
    const pdfFilePath = files[0].filepath;
    const fileName = files[0].newFilename;
    const loader = new CustomPDFLoader(pdfFilePath);

    const rawDoc = await loader.load();
    // console.log(files[0]);
    const textSplitter = new RecursiveCharacterTextSplitter({
      chunkSize: 1000,
      chunkOverlap: 200,
    });

    // Split the PDF into chunks, embed them and store them in ChromaDB
    const doc = await textSplitter.splitDocuments(rawDoc);
    const vectorStore = await Chroma.fromDocuments(
      doc,
      new OpenAIEmbeddings({
        openAIApiKey: process.env.OPENAI_API_KEY!,
      }),
      {
        collectionName: fileName,
        url: "http://chromadb:8000",
      }
    );
    res.status(200).json({ message: "Upload successful" });
  } catch (error) {
    console.error("Error in /api/document/upload:", error);
    res.status(500).json({ error: "Failed to upload files" });
  }
};

const parseFormData = (req: NextApiRequest) => {
  return new Promise<{ files: File[] }>((resolve, reject) => {
    const form = new formidable.IncomingForm();

    form.parse(req, (err, _fields, files) => {
      if (err) {
        reject(err);
        return;
      }

      // Ensure files.files is an array
      const uploadedFiles = Array.isArray(files.files)
        ? files.files
        : [files.files];
      resolve({ files: uploadedFiles });
    });
  });
};

export default upload;

// Disable the built-in body parser so formidable can read the multipart stream
export const config = {
  api: {
    bodyParser: false,
  },
};

My Dockerfile:

# Creates a layer from node:19-buster image.
FROM node:19-buster

RUN apt-get update && \
    apt-get install -y build-essential libcairo2-dev libpango1.0-dev libjpeg-dev libgif-dev librsvg2-dev python

# Install pnpm
RUN npm install -g pnpm

# Creates directories
RUN mkdir -p /usr/src/app

# Sets an environment variable

# Sets the working directory for any RUN, CMD, ENTRYPOINT, COPY, and ADD commands
WORKDIR /usr/src/app

# Copy new files or directories into the filesystem of the container
COPY pnpm-lock.yaml /usr/src/app
COPY package.json /usr/src/app

# Execute commands in a new layer on top of the current image and commit the results
RUN pnpm install

# Copy new files or directories into the filesystem of the container
COPY . /usr/src/app

# Informs container runtime that the container listens on the specified network ports at runtime

# Allows you to configure a container that will run as an executable
ENTRYPOINT ["pnpm", "run", "dev"]

nfcampos commented 1 year ago

That sounds like an issue inside the Chroma SDK, maybe @jeffchuber might be able to help?

l4b4r4b4b4 commented 1 year ago

Hmm, I actually don't think so, since it generally has issues resolving the needed dependencies, and I only fixed that by manually installing the dependencies into the root project. So I think the remaining problem with ChromaDB is still connected to that, because Chroma takes the embeddings just fine now: Bildschirmaufzeichnung vom 06.04.2023, 13:50:47.webm
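For readers hitting the same dependency-resolution problem, a quick way to see which packages still need to be installed into the root project is to compare a library's `peerDependencies` with what the root `package.json` declares. This is only a sketch under stated assumptions: the helper `findMissingPeers` is hypothetical, and the manifests are illustrative values built from versions visible in the stack trace, not the real contents of any `package.json`.

```typescript
type DependencyMap = Record<string, string>;

// Only the package.json fields relevant to this check are modelled.
interface PackageManifest {
  dependencies?: DependencyMap;
  devDependencies?: DependencyMap;
  peerDependencies?: DependencyMap;
}

/** Returns the peer dependencies of `lib` that `root` does not declare itself. */
function findMissingPeers(root: PackageManifest, lib: PackageManifest): string[] {
  const declared = new Set([
    ...Object.keys(root.dependencies ?? {}),
    ...Object.keys(root.devDependencies ?? {}),
  ]);
  return Object.keys(lib.peerDependencies ?? {}).filter((name) => !declared.has(name));
}

// Illustrative manifests (assumed values, not copied from real packages).
const rootManifest: PackageManifest = {
  dependencies: { langchain: "0.0.48", chromadb: "1.3.1" },
};
const libraryManifest: PackageManifest = {
  peerDependencies: { chromadb: "1.3.1", "d3-dsv": "3.0.1", "cohere-ai": "6.2.0" },
};

const missingPeers = findMissingPeers(rootManifest, libraryManifest);
console.log("Peers to install in the root project:", missingPeers);
```

Each name reported this way would then be added to the root project with the package manager in use.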

l4b4r4b4b4 commented 1 year ago

I will try to narrow down the problem by running outside the Docker container and with a stripped-down project. Maybe something will come out of that!

l4b4r4b4b4 commented 1 year ago

All good. There were two problems, both NOT connected to LangChain!

  1. I did not know that 'peer dependencies' have to be installed by hand into the root project.
  2. The second problem is connected to a property of some PDFs. I don't understand why it affects only some of them, but generally the pipeline works :)
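The thread does not say which PDF property caused the failure. One possibility, purely an assumption and not confirmed here, is that some PDFs yield chunks with no extractable text (scanned pages, for example). A defensive step for that case is to separate empty chunks from usable ones before embedding. The sketch below uses a hypothetical `TextChunk` type that mirrors only the `pageContent` and `metadata` fields of a LangChain document, and `partitionChunks` is an illustrative name.

```typescript
// Minimal stand-in for a LangChain document; only the fields used here are modelled.
interface TextChunk {
  pageContent: string;
  metadata?: Record<string, unknown>;
}

/** Splits chunks into those with usable text and those that are empty or whitespace-only. */
function partitionChunks(chunks: TextChunk[]): { usable: TextChunk[]; empty: TextChunk[] } {
  const usable: TextChunk[] = [];
  const empty: TextChunk[] = [];
  for (const chunk of chunks) {
    if (chunk.pageContent.trim().length > 0) {
      usable.push(chunk);
    } else {
      empty.push(chunk);
    }
  }
  return { usable, empty };
}

// Illustrative input: one chunk with text, two without.
const sampleChunks: TextChunk[] = [
  { pageContent: "Chapter 1: Introduction" },
  { pageContent: "   \n" },
  { pageContent: "" },
];

const partitioned = partitionChunks(sampleChunks);
console.log(`usable: ${partitioned.usable.length}, empty: ${partitioned.empty.length}`);
```

Logging the `empty` list per file would at least show whether a failing PDF differs from the working ones in this respect.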