llm-tools / embedJs

A NodeJS RAG framework to easily work with LLMs and embeddings
https://llm-tools.mintlify.app/get-started/introduction
Apache License 2.0

Getting Started code not runnable #162

Open ericljx2020-gmail opened 2 weeks ago

ericljx2020-gmail commented 2 weeks ago

🐛 Describe the bug

First, SIMPLE_MODELS is not imported in the given starter code. Second, I'm having trouble calling the addLoader function in the paid model section. The error message is shown below.

Argument of type 'WebLoader' is not assignable to parameter of type 'BaseLoader<Record<string, string | number | boolean>, Record<string, unknown>>'.
The types returned by 'getUnfilteredChunks().next(...)' are incompatible between these types.
Type 'Promise<IteratorResult<{ pageContent: string; metadata: { type: "WebLoader"; source: string | undefined; }; }, void>>' is not assignable to type 'Promise<IteratorResult<UnfilteredLoaderChunk<Record<string, string | number | boolean>>, void>>'.
Type 'IteratorResult<{ pageContent: string; metadata: { type: "WebLoader"; source: string | undefined; }; }, void>' is not assignable to type 'IteratorResult<UnfilteredLoaderChunk<Record<string, string | number | boolean>>, void>'.
Type 'IteratorYieldResult<{ pageContent: string; metadata: { type: "WebLoader"; source: string | undefined; }; }>' is not assignable to type 'IteratorResult<UnfilteredLoaderChunk<Record<string, string | number | boolean>>, void>'.
Type 'IteratorYieldResult<{ pageContent: string; metadata: { type: "WebLoader"; source: string | undefined; }; }>' is not assignable to type 'IteratorYieldResult<UnfilteredLoaderChunk<Record<string, string | number | boolean>>>'.
Type '{ pageContent: string; metadata: { type: "WebLoader"; source: string | undefined; }; }' is not assignable to type 'UnfilteredLoaderChunk<Record<string, string | number | boolean>>'.
Types of property 'metadata' are incompatible.
Type '{ type: "WebLoader"; source: string | undefined; }' is not assignable to type 'LoaderMetadata<Record<string, string | number | boolean>>'.
Type '{ type: "WebLoader"; source: string | undefined; }' is not assignable to type 'Record<string, string | number | boolean>'.
Property 'source' is incompatible with index signature.
Type 'string | undefined' is not assignable to type 'string | number | boolean'.
Type 'undefined' is not assignable to type 'string | number | boolean'.
ts(2345)
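For anyone tracing this error: its last lines say that a metadata object whose `source` may be `undefined` cannot be assigned to a record whose values must be `string | number | boolean`. The sketch below reproduces that mismatch in isolation, without embedJs. The `Metadata` alias and the `stripUndefined` helper are hypothetical names made up for illustration; they are not part of the library, and this is not a claim about how the library resolves the issue.

```typescript
// Mirrors the constraint named in the error: values must be string | number | boolean.
type MetadataValue = string | number | boolean;
type Metadata = Record<string, MetadataValue>;

// Hypothetical helper (not an embedJs API): copies the input and
// drops every key whose value is undefined.
function stripUndefined(input: Record<string, MetadataValue | undefined>): Metadata {
    const out: Metadata = {};
    for (const [key, value] of Object.entries(input)) {
        if (value !== undefined) out[key] = value;
    }
    return out;
}

// Same shape as the metadata type reported in the error message.
const raw: { type: 'WebLoader'; source: string | undefined } = {
    type: 'WebLoader',
    source: undefined,
};

// With strictNullChecks enabled, the next line would not compile, because
// 'undefined' is not assignable to 'string | number | boolean':
// const bad: Metadata = raw;

// Removing undefined values first satisfies the index signature.
const ok: Metadata = stripUndefined(raw);
console.log(ok);
```

The commented-out assignment is the isolated form of the complaint; whether the real cause here is a version mismatch between the installed packages is something only the version details can confirm.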

adhityan commented 1 week ago

Thank you for pointing out the missing import in the documentation; this will be addressed in the next release. I just tried the same quick-start example with that import added, and it seems to work fine.

Could you share the versions of the library and dependencies you are using?

Code tried -

import 'dotenv/config';
import { RAGApplicationBuilder, SIMPLE_MODELS } from '@llm-tools/embedjs';
import { OpenAiEmbeddings } from '@llm-tools/embedjs-openai';
import { WebLoader } from '@llm-tools/embedjs-loader-web';
import { HNSWDb } from '@llm-tools/embedjs-hnswlib';

const ragApplication = await new RAGApplicationBuilder()
    .setModel(SIMPLE_MODELS.OPENAI_GPT4_O)
    .setEmbeddingModel(new OpenAiEmbeddings())
    .setVectorDatabase(new HNSWDb())
    .build();

await ragApplication.addLoader(new WebLoader({ urlOrContent: 'https://www.forbes.com/profile/elon-musk' }));
await ragApplication.addLoader(new WebLoader({ urlOrContent: 'https://en.wikipedia.org/wiki/Elon_Musk' }));

await ragApplication.query('What is the net worth of Elon Musk today?');

Output (with debug logs enabled) -

2024-11-13T14:09:03.640Z embedjs:core Using system query template - "You are a helpful human like chat bot. Use relevant provided context and chat history to answer the query at the end. Answer in full. If you don't know the answer, just say that you don't know, don't try to make up an answer. Do not use words like context or training data when responding. You can say you do not have all the information but do not indicate that you are not a reliable source."

2024-11-13T14:09:03.641Z embedjs:core Dynamically imported OpenAi

2024-11-13T14:09:03.641Z embedjs:core Initialized LLM class

2024-11-13T14:09:03.643Z embedjs:core Initialized vector database

2024-11-13T14:09:03.643Z embedjs:core Initialized cache
2024-11-13T14:09:03.643Z embedjs:core Initialized pre-loaders

2024-11-13T14:09:03.644Z embedjs:loader:BaseLoader New loader class initalized with key WebLoader_8cf46026cabf9b05394a2658bd1fe890

2024-11-13T14:09:03.644Z embedjs:core Exploring loader WebLoader_8cf46026cabf9b05394a2658bd1fe890

2024-11-13T14:09:03.644Z embedjs:core Chunks generator received WebLoader_8cf46026cabf9b05394a2658bd1fe890

2024-11-13T14:09:04.122Z embedjs:util:getSafe URL 'https://www.forbes.com/profile/elon-musk' returned status code 200

2024-11-13T14:09:04.162Z embedjs:core Processing batch (size 4) for loader WebLoader_8cf46026cabf9b05394a2658bd1fe890

2024-11-13T14:09:07.454Z embedjs:core Batch embeddings (size 4) obtained for loader WebLoader_8cf46026cabf9b05394a2658bd1fe890
2024-11-13T14:09:07.455Z embedjs:core Inserting chunks for loader WebLoader_8cf46026cabf9b05394a2658bd1fe890 to vectorDatabase

2024-11-13T14:09:07.455Z embedjs:core Add loader completed with 4 new entries for WebLoader_8cf46026cabf9b05394a2658bd1fe890
2024-11-13T14:09:07.455Z embedjs:core Add loader WebLoader_8cf46026cabf9b05394a2658bd1fe890 wrap up done

2024-11-13T14:09:07.455Z embedjs:loader:BaseLoader New loader class initalized with key WebLoader_1eab8dd1ffa92906f7fc839862871ca5
2024-11-13T14:09:07.455Z embedjs:core Exploring loader WebLoader_1eab8dd1ffa92906f7fc839862871ca5
2024-11-13T14:09:07.455Z embedjs:core Chunks generator received WebLoader_1eab8dd1ffa92906f7fc839862871ca5

2024-11-13T14:09:07.555Z embedjs:util:getSafe URL 'https://en.wikipedia.org/wiki/Elon_Musk' returned status code 200

2024-11-13T14:09:07.844Z embedjs:core Processing batch (size 170) for loader WebLoader_1eab8dd1ffa92906f7fc839862871ca5

2024-11-13T14:09:12.128Z embedjs:core Batch embeddings (size 170) obtained for loader WebLoader_1eab8dd1ffa92906f7fc839862871ca5
2024-11-13T14:09:12.128Z embedjs:core Inserting chunks for loader WebLoader_1eab8dd1ffa92906f7fc839862871ca5 to vectorDatabase

2024-11-13T14:09:12.146Z embedjs:core Add loader completed with 170 new entries for WebLoader_1eab8dd1ffa92906f7fc839862871ca5
2024-11-13T14:09:12.146Z embedjs:core Add loader WebLoader_1eab8dd1ffa92906f7fc839862871ca5 wrap up done

2024-11-13T14:09:12.352Z embedjs:core Query resulted in 40 chunks before filteration...
2024-11-13T14:09:12.352Z embedjs:core Query resulted in 30 chunks after filteration; chunks from 1 unique sources.
2024-11-13T14:09:12.353Z embedjs:model:BaseModel Conversation with id 'default' is new
2024-11-13T14:09:12.353Z embedjs:model:BaseModel 0 history entries found for conversationId 'default'

2024-11-13T14:09:12.353Z embedjs:model:OpenAi Executing OpenAI model with prompt - What is the net worth of Elon Musk today??

2024-11-13T14:09:14.902Z embedjs:model:OpenAi OpenAI response - AIMessage {
  "id": "chatcmpl-AT8LR1CBsiSBhpf87OGaKI5ThoXhA",
  "content": "As of the latest estimates, Elon Musk's net worth is approximately $250 billion. However, this figure can fluctuate due to changes in the stock market and the valuation of his companies.",
  "additional_kwargs": {},
  "response_metadata": {
    "tokenUsage": {
      "promptTokens": 16763,
      "completionTokens": 37,
      "totalTokens": 16800
    },
    "finish_reason": "stop",
    "usage": {
      "prompt_tokens": 16763,
      "completion_tokens": 37,
      "total_tokens": 16800,
      "prompt_tokens_details": {
        "cached_tokens": 0,
        "audio_tokens": 0
      },
      "completion_tokens_details": {
        "reasoning_tokens": 0,
        "audio_tokens": 0,
        "accepted_prediction_tokens": 0,
        "rejected_prediction_tokens": 0
      }
    },
    "system_fingerprint": "fp_159d8341cc"
  },
  "tool_calls": [],
  "invalid_tool_calls": [],
  "usage_metadata": {
    "output_tokens": 37,
    "input_tokens": 16763,
    "total_tokens": 16800,
    "input_token_details": {
      "audio": 0,
      "cache_read": 0
    },
    "output_token_details": {
      "audio": 0,
      "reasoning": 0
    }
  }
}

Note: You need to await the addLoader calls so that they complete before you query. They are async functions.
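To illustrate why the await matters, here is a minimal sketch that uses a hypothetical stand-in class rather than the real embedJs application. `FakeRagApplication`, `loadedCount`, and the 10 ms delay are assumptions made up for this example; only the idea that `addLoader` is async comes from the note above.

```typescript
// Hypothetical stand-in for an application with an async addLoader.
// This is NOT the embedJs class; it only imitates the async behaviour.
class FakeRagApplication {
    private readonly sources: string[] = [];

    async addLoader(source: string): Promise<void> {
        // Simulate the fetch and embedding work with a short delay.
        await new Promise<void>((resolve) => setTimeout(resolve, 10));
        this.sources.push(source);
    }

    loadedCount(): number {
        return this.sources.length;
    }
}

async function demo(): Promise<{ before: number; after: number }> {
    const app = new FakeRagApplication();

    // Start loading without awaiting: the work has begun but not finished.
    const pending = app.addLoader('https://en.wikipedia.org/wiki/Elon_Musk');
    const before = app.loadedCount();

    // Awaiting the promise guarantees the source has been added.
    await pending;
    const after = app.loadedCount();

    return { before, after };
}

demo().then(({ before, after }) => {
    console.log(`loaded before await: ${before}, after await: ${after}`);
});
```

A query issued before the promise settles would run against whatever has been loaded so far, which in this stand-in is nothing.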