upstash / wikipedia-semantic-search

Semantic Search on Wikipedia with Upstash Vector
https://wikipedia-semantic-search.vercel.app/
MIT License

Indexing Millions of Wikipedia Articles With Upstash Vector

This repository contains the code and documentation for our project on indexing millions of Wikipedia articles using Upstash Vector, as described in our blog post.

Project Overview

We've built a semantic search engine and a RAG chatbot on Wikipedia data to demonstrate the capabilities of Upstash Vector and the Upstash RAG Chat SDK. The project involves:

  1. Preparing and embedding Wikipedia articles
  2. Indexing the vectors using Upstash Vector
  3. Building a Wikipedia semantic search engine
  4. Implementing a RAG chatbot

Key Features

Technologies Used

Development

To run the project locally, follow these steps:

  1. Go to Upstash Console to manage your databases:
    • Create a new Vector database with embedding model support. You can choose the BGE-M3 model for multilingual support.
    • Create a new Redis database for storing chat sessions.
    • Copy the credentials for both Redis and Vector. Also copy the QStash credentials, which are used for the Upstash-hosted LLMs.

Put the credentials in a .env file in the root of the project. Your .env file should look like this:

UPSTASH_VECTOR_REST_URL=
UPSTASH_VECTOR_REST_TOKEN=

UPSTASH_REDIS_REST_TOKEN=
UPSTASH_REDIS_REST_URL=

QSTASH_TOKEN=
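
A missing or blank credential usually only shows up as a runtime error on the first request, so it can help to check the variables up front. The sketch below is not part of this repository; `findMissingEnvVars` and `assertEnvConfigured` are hypothetical helper names, and the only thing taken from the project is the list of variable names shown above.

```typescript
// Variable names taken from the .env file layout above.
const REQUIRED_ENV_VARS = [
  "UPSTASH_VECTOR_REST_URL",
  "UPSTASH_VECTOR_REST_TOKEN",
  "UPSTASH_REDIS_REST_URL",
  "UPSTASH_REDIS_REST_TOKEN",
  "QSTASH_TOKEN",
] as const;

type EnvSource = Record<string, string | undefined>;

// Hypothetical helper: returns the required variables that are unset or blank.
function findMissingEnvVars(env: EnvSource): string[] {
  return REQUIRED_ENV_VARS.filter((name) => {
    const value = env[name];
    return value === undefined || value.trim() === "";
  });
}

// Hypothetical helper: throws a readable error if any variable is missing.
function assertEnvConfigured(env: EnvSource): void {
  const missing = findMissingEnvVars(env);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }
}
```

In a Node.js process this could be called once at startup as `assertEnvConfigured(process.env)`, assuming the .env file has already been loaded.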

  2. Populate your Vector index.

This project uses namespaces to store articles in different languages, so each vector must be upserted into the namespace that matches its article's language. For English, upsert your vectors into the en namespace.
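
As a rough illustration of the one-namespace-per-language layout, the sketch below groups prepared articles by language code and splits each group into batches. It is not the repository's ingestion code: the `Article` shape, the `groupByNamespace` helper, and the batch size are assumptions made for this example, and the `data` field assumes an index created with an embedding model so that raw text can be upserted.

```typescript
// Hypothetical shape of a prepared article; the real pipeline may differ.
interface Article {
  id: string;
  lang: string; // language code, used as the namespace name (e.g. "en")
  text: string;
}

// Record to upsert; `data` holds raw text for an index that embeds server-side.
interface UpsertRecord {
  id: string;
  data: string;
  metadata: { lang: string };
}

// Groups articles by language and splits each group into batches of at most
// `batchSize` records, so every batch targets exactly one namespace.
function groupByNamespace(
  articles: Article[],
  batchSize: number,
): Map<string, UpsertRecord[][]> {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new Error("batchSize must be a positive integer");
  }
  const batches = new Map<string, UpsertRecord[][]>();
  for (const article of articles) {
    const record: UpsertRecord = {
      id: article.id,
      data: article.text,
      metadata: { lang: article.lang },
    };
    const nsBatches = batches.get(article.lang) ?? [];
    const last = nsBatches[nsBatches.length - 1];
    if (last !== undefined && last.length < batchSize) {
      last.push(record); // room left in the current batch
    } else {
      nsBatches.push([record]); // start a new batch
    }
    batches.set(article.lang, nsBatches);
  }
  return batches;
}

// Assumed usage with the @upstash/vector client (check the SDK docs first):
//   const index = new Index({ url, token });
//   for (const [ns, nsBatches] of groupByNamespace(articles, 100)) {
//     for (const batch of nsBatches) await index.namespace(ns).upsert(batch);
//   }
```

Keeping the grouping separate from the network call makes it easy to check that no batch mixes languages before anything is written to the index.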

  3. Install the dependencies:
pnpm install
  4. Run the development server:
pnpm dev

Contributing

We welcome contributions to improve this project. Please feel free to submit issues or pull requests.

Acknowledgements

Contact

For any questions or feedback about the project or Upstash Vector, please reach out to us at (add contact information).

Check out our live demo to see the project in action!