moinul-hossain-dhrubo / quran_hadith_verse_finder

Find Quran and Hadith verses related to your topic of interest
https://quran-hadith-verse-finder.onrender.com/
Apache License 2.0
flask-application nlp retrieval semantic-search webscraping

Quran Hadith Verse Finder

Context:

In Islam, it is recommended for every Muslim to live life based on the Quran and Hadith. The Quran, also romanized as Qur'an or Koran, is the central religious text of Islam, believed by Muslims to be a direct revelation from God (Allah). The Hadith is a source of religious and moral guidance known as Sunnah, ranking second in authority only to the Quran.

People who can memorize every verse of the Quran are known as Hafiz. However, it is not feasible for every Muslim to memorize all the verses related to a particular topic. While search engines like Google can help address this issue, they often provide too many links and websites, making it difficult to find authentic information. It would be more efficient to have a tool that provides related verses in one place with just one click.

Objective:

The goal of this project is to build an application that retrieves Quran and Hadith verses based on user queries with just one click. This project includes data collection from authentic sources, data cleaning, embedding, deployment, and API integration.

Data Collection:

Data was collected from two authentic sources:

  1. quran.com
  2. sunnah.com

Only four Hadith books were scraped: Sahih al-Bukhari, Sahih Muslim, Sunan an-Nasa'i, and Sunan Abi Dawud. The scraping scripts can be found in the scraper folder.

Data Preparation:

The data was thoroughly checked before creating the vector database. Each verse was concatenated with the name of its corresponding surah or hadith book so that retrieved results show where the verse comes from. The cleaned data can be found in the data folder. The data preparation process is documented in notebooks/Quran_Hadith_Semantic_Search.ipynb.
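The concatenation step can be pictured with a minimal sketch like the one below. The `Record` fields, the `to_passage` helper, and the output format are hypothetical and chosen only for illustration; the notebook may use different column names and a different layout. The verse text is a placeholder, not a real translation.

```python
from dataclasses import dataclass


@dataclass
class Record:
    """One scraped verse or hadith (hypothetical schema, for illustration only)."""
    source: str      # surah name or hadith book
    reference: str   # verse or hadith number
    text: str        # text of the verse or hadith


def to_passage(record: Record) -> str:
    """Prefix the text with its source so a retrieved result names the surah or book."""
    # Collapse runs of whitespace and newlines left over from scraping.
    cleaned = " ".join(record.text.split())
    return f"{record.source} ({record.reference}): {cleaned}"


# Placeholder rows; real rows would come from the cleaned files in the data folder.
records = [
    Record("Al-Fatihah", "1:1", "Placeholder   verse text."),
    Record("Sahih al-Bukhari", "1", "Placeholder\nhadith text."),
]
passages = [to_passage(r) for r in records]
```

Keeping the source name inside the passage means the embedded text and the displayed result are the same string, so no separate lookup is needed at query time.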

Vector Database creation:

The model used to embed the dataset is sentence-transformers/all-MiniLM-L6-v2. Model documentation can be found here. This model is well suited to semantic search applications because of its sentence embedding capability. Separate embeddings/vector databases were created for the Quran and for the four Hadith books combined. The embeddings/vector databases can be found in the models directory.
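As a rough sketch of how embedding and retrieval fit together, the example below encodes passages with the model named above and ranks them by cosine similarity to a query. The helper names (`embed`, `cosine`, `top_k`) are hypothetical. Using cosine similarity with a brute-force scan is an assumption here; the project may store and search its vectors differently.

```python
import math
from typing import List, Sequence, Tuple

MODEL_NAME = "sentence-transformers/all-MiniLM-L6-v2"


def embed(texts: Sequence[str]) -> List[List[float]]:
    """Encode texts into unit-length sentence embeddings."""
    # Imported lazily: the library is heavy and downloads the model on first use.
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(MODEL_NAME)
    return model.encode(list(texts), normalize_embeddings=True).tolist()


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: Sequence[float],
          corpus_vecs: Sequence[Sequence[float]],
          k: int = 5) -> List[Tuple[int, float]]:
    """Return (index, score) pairs for the k corpus vectors most similar to the query."""
    scored = [(i, cosine(query_vec, v)) for i, v in enumerate(corpus_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

With the model installed, a query would be handled as `top_k(embed([query])[0], embed(passages))`. In practice the passage embeddings would be computed once and loaded from disk rather than re-encoded per query.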

HuggingFace Deployment:

Two separate Spaces were created, one for finding Quran verses and one for finding Hadith verses. The models, along with the entire pipeline, have been deployed on Hugging Face Spaces using a Gradio interface. You can visit the Spaces via Quran_verse_finder and Hadith_verse_finder.

Web Deployment using API integration:

A Flask app was built and deployed that takes a user query and shows the verses related to it, listed separately for the Quran and the Hadith. Check the flask branch or click here. The live website can be found here.
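A minimal sketch of such an app is shown below, assuming two search callables (one per collection) that each return a list of verse strings. The `/search` route, the `q` parameter, the JSON response shape, and the names `find_verses` and `create_app` are all hypothetical; the app in the flask branch may render HTML templates and use different routes.

```python
from typing import Callable, Dict, List

SearchFn = Callable[[str], List[str]]


def find_verses(query: str, quran_search: SearchFn, hadith_search: SearchFn) -> Dict[str, object]:
    """Run one query against both collections and keep the results separate."""
    query = query.strip()
    if not query:
        # An empty query returns empty result lists instead of calling the search.
        return {"query": "", "quran": [], "hadith": []}
    return {"query": query, "quran": quran_search(query), "hadith": hadith_search(query)}


def create_app(quran_search: SearchFn, hadith_search: SearchFn):
    """Build a small Flask app exposing find_verses at a hypothetical /search route."""
    # Imported lazily so find_verses stays usable where Flask is not installed.
    from flask import Flask, jsonify, request

    app = Flask(__name__)

    @app.route("/search")
    def search():
        query = request.args.get("q", "")
        return jsonify(find_verses(query, quran_search, hadith_search))

    return app
```

Passing the search functions in as arguments keeps the web layer independent of how the embeddings are stored, so the same retrieval code could serve both this app and the Gradio Spaces.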

quran_hadith_app_home

quran_hadith_app_result