PhotoRAG is a full-stack Next.js image search application that leverages Azure AI services and infrastructure to implement a Retrieval-Augmented Generation (RAG) system for photographs. This project showcases the power of combining vector embeddings, similarity search, and large language models to create an intelligent and efficient image retrieval system.
Data Sources
We seeded the database with a collection of our own photographs. You can view every image in the database by visiting https://rag.photomuse.ai/gallery and clicking the Load Gallery button.
Data Ingestion
When we uploaded the images, we used the Azure AI Computer Vision API.
computervision/imageanalysis:analyze generated a caption and tags for each supplied image.
We then provided the tags and the Computer Vision caption to GPT-4o and prompted it to write a more thorough image description to improve search result accuracy.
The tags were stored as an array in our Azure Database for PostgreSQL instance.
The image description was stored as a string; we then called
computervision/retrieval:vectorizeImage for the image and
computervision/retrieval:vectorizeText for the description, storing both vectors in the same images table in Postgres.
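For reference, here is a minimal sketch of what these Vision calls can look like from a Next.js server route using plain fetch. The environment variable names, helper names, and api-version values are illustrative assumptions, not taken from the PhotoRAG repository; use whatever versions your Azure resource supports.

```ts
// Sketch only: VISION_ENDPOINT / VISION_KEY and the api-version values are assumptions.
const endpoint = process.env.VISION_ENDPOINT!; // e.g. https://<resource>.cognitiveservices.azure.com
const key = process.env.VISION_KEY!;

async function visionPost<T>(path: string, body: unknown): Promise<T> {
  const res = await fetch(`${endpoint}${path}`, {
    method: 'POST',
    headers: { 'Ocp-Apim-Subscription-Key': key, 'Content-Type': 'application/json' },
    body: JSON.stringify(body),
  });
  if (!res.ok) throw new Error(`Vision API ${path} failed: ${res.status}`);
  return (await res.json()) as T;
}

// Caption + tags from the uploaded image (Image Analysis 4.0).
export async function analyzeImage(imageUrl: string) {
  const result = await visionPost<{
    captionResult: { text: string };
    tagsResult: { values: { name: string }[] };
  }>('/computervision/imageanalysis:analyze?api-version=2023-10-01&features=caption,tags', {
    url: imageUrl, // public blob URL of the uploaded image
  });
  return {
    caption: result.captionResult.text,
    tags: result.tagsResult.values.map((t) => t.name),
  };
}

// Multimodal embedding for the image itself.
export async function vectorizeImage(imageUrl: string) {
  const { vector } = await visionPost<{ vector: number[] }>(
    '/computervision/retrieval:vectorizeImage?api-version=2023-02-01-preview',
    { url: imageUrl },
  );
  return vector;
}

// Multimodal embedding for the GPT-generated description.
export async function vectorizeText(text: string) {
  const { vector } = await visionPost<{ vector: number[] }>(
    '/computervision/retrieval:vectorizeText?api-version=2023-02-01-preview',
    { text },
  );
  return vector;
}
```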
How It Works
Image Upload: When an image is uploaded, it's stored in Azure Blob Storage.
Image Analysis: The image is analyzed using Azure Computer Vision to generate tags and captions.
Description Generation: GPT-4o is used to generate a detailed description based on the tags and caption.
Vector Embedding: The description is converted into a vector embedding using the Azure Computer Vision multimodal embeddings (retrieval) API.
Search: When a user performs a search:
The query is refined using GPT-4 to extract relevant tags and improve the search terms.
The refined query is converted to a vector embedding.
A similarity search is performed using cosine similarity between the query embedding and the stored image embeddings.
Results are ranked based on similarity and tag matches.
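The similarity-search step above can be expressed with Drizzle and pgvector roughly as follows. This is a sketch under assumptions: the db client import path, the table and column names (they match the hypothetical schema sketched under Technology Stack below), and the threshold and limit values are not the project's actual code.

```ts
import { cosineDistance, desc, gt, sql } from 'drizzle-orm';
import { db } from '@/db';            // assumed location of the Drizzle client
import { images } from '@/db/schema'; // assumed schema module

// Rank images by cosine similarity between the refined query's embedding and
// each stored description embedding; the 0.3 threshold and limit are illustrative.
export async function searchImages(queryEmbedding: number[]) {
  const similarity = sql<number>`1 - (${cosineDistance(images.descriptionEmbedding, queryEmbedding)})`;

  return db
    .select({
      id: images.id,
      blobUrl: images.blobUrl,
      description: images.description,
      tags: images.tags,
      similarity,
    })
    .from(images)
    .where(gt(similarity, 0.3))
    .orderBy((t) => desc(t.similarity))
    .limit(20);
}
```

Tag matches from the refined query can then be used to boost or re-rank these results before they are returned to the client.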
Features
Image upload and automatic description generation using Azure Computer Vision and GPT-4
Automatic tagging of images
Vector embedding of image descriptions for efficient similarity search
Natural language querying of the image database
Refined search queries using AI
Confidence scoring and explanations for search results
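One way the "Refined search queries using AI" feature above might look, sketched with the AzureOpenAI client from the openai npm package. The deployment name, api-version, environment variables, and prompt are illustrative assumptions, not the project's actual implementation.

```ts
import { AzureOpenAI } from 'openai';

// Hypothetical query-refinement step: rewrite the user's search text and pull
// out candidate tags before embedding the refined query.
const client = new AzureOpenAI({
  endpoint: process.env.AZURE_OPENAI_ENDPOINT!,
  apiKey: process.env.AZURE_OPENAI_API_KEY!,
  apiVersion: '2024-06-01', // assumed; use the version your deployment supports
  deployment: 'gpt-4o',     // assumed deployment name
});

export async function refineQuery(userQuery: string) {
  const completion = await client.chat.completions.create({
    model: 'gpt-4o',
    response_format: { type: 'json_object' },
    messages: [
      {
        role: 'system',
        content:
          'Rewrite the image search query into a detailed description and extract likely tags. ' +
          'Respond as JSON: {"refinedQuery": string, "tags": string[]}.',
      },
      { role: 'user', content: userQuery },
    ],
  });

  return JSON.parse(completion.choices[0].message.content ?? '{}') as {
    refinedQuery: string;
    tags: string[];
  };
}
```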
Technology Stack
Next.js (App Router)
TypeScript
PostgreSQL with pgvector extension
Drizzle ORM
Azure OpenAI API (for GPT-4)
Azure Computer Vision API (for analysis and embeddings)
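To make the PostgreSQL + pgvector + Drizzle pieces concrete, here is a hypothetical shape for the images table described under Data Ingestion. The column names and the 1024-dimension vector size are assumptions for illustration rather than the repository's actual schema.

```ts
import { index, pgTable, serial, text, timestamp, vector } from 'drizzle-orm/pg-core';

// Hypothetical images table: tags as a text[] column, the GPT-generated
// description as text, and two pgvector columns for the image and
// description embeddings produced during ingestion.
export const images = pgTable(
  'images',
  {
    id: serial('id').primaryKey(),
    blobUrl: text('blob_url').notNull(),        // Azure Blob Storage URL
    caption: text('caption'),                   // Computer Vision caption
    description: text('description').notNull(), // GPT-generated description
    tags: text('tags').array().notNull(),       // Computer Vision tags
    imageEmbedding: vector('image_embedding', { dimensions: 1024 }),
    descriptionEmbedding: vector('description_embedding', { dimensions: 1024 }),
    createdAt: timestamp('created_at').defaultNow().notNull(),
  },
  (table) => ({
    descriptionEmbeddingIdx: index('description_embedding_idx').using(
      'hnsw',
      table.descriptionEmbedding.op('vector_cosine_ops'),
    ),
  }),
);
```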
Project Name
PhotoRAG
Project Repository URL
https://github.com/dubscode/photorag
Deployed Endpoint URL
https://rag.photomuse.ai/
Project Video
https://youtu.be/JvHKW363nwo?si=oLjJcrkQbMuQOVXe
Team Members
dubscode