In Islam, Muslims are encouraged to live their lives according to the Quran and the Hadith. The Quran, also romanized as Qur'an or Koran, is the central religious text of Islam, believed by Muslims to be a direct revelation from God (Allah). The Hadith, the recorded sayings and actions of the Prophet Muhammad that convey the Sunnah, is a source of religious and moral guidance second in authority only to the Quran.
People who have memorized the entire Quran are known as Hafiz. However, it is not feasible for every Muslim to memorize all the verses related to a particular topic. Search engines like Google can help, but they often return a long list of links of varying reliability, which makes it hard to find authentic information. A tool that gathers the related verses in one place with a single click would be more efficient.
The goal of this project is to build an application that retrieves Quran and Hadith verses based on user queries with just one click. This project includes data collection from authentic sources, data cleaning, embedding, deployment, and API integration.
The data was collected from authentic sources that are …
Only four Hadith books were scraped: Sahih al-Bukhari, Sahih Muslim, Sunan an-Nasa'i and Sunan Abi Dawud. The scraping scripts can be found in the scraper folder.
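The page layout the real scripts in the scraper folder target is not described here, so the following is only a minimal sketch of the parsing step, assuming a hypothetical page where each hadith sits in an element with class `hadith-text`. It uses the standard library's `html.parser` and collects one whitespace-normalized string per matching element.

```python
from html.parser import HTMLParser


class HadithTextParser(HTMLParser):
    """Collect the text of every element whose class list contains TARGET_CLASS."""

    TARGET_CLASS = "hadith-text"  # hypothetical: not taken from any real site
    # Void elements never get an end tag, so they must not affect nesting depth.
    VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
                 "input", "link", "meta", "source", "track", "wbr"}

    def __init__(self):
        super().__init__()
        self._depth = 0      # nesting depth inside a target element (0 = outside)
        self._parts = []     # text fragments of the element being read
        self.hadiths = []    # one whitespace-normalized string per target element

    def handle_starttag(self, tag, attrs):
        if tag in self.VOID_TAGS:
            return
        if self._depth:
            self._depth += 1
        elif self.TARGET_CLASS in (dict(attrs).get("class") or "").split():
            self._depth = 1

    def handle_endtag(self, tag):
        if tag in self.VOID_TAGS or not self._depth:
            return
        self._depth -= 1
        if self._depth == 0:
            text = " ".join("".join(self._parts).split())
            if text:
                self.hadiths.append(text)
            self._parts = []

    def handle_data(self, data):
        if self._depth:
            self._parts.append(data)
```

In practice the HTML would be fetched first (for example with `urllib.request.urlopen(url).read().decode("utf-8")`) and then passed to `parser.feed(...)`; that network step is left out so the sketch runs offline.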
The data was thoroughly checked before the vector database was created. Each verse was concatenated with the name of its surah (for the Quran) or its Hadith book, so that retrieved results include that name. The cleaned data can be found in the data folder, and the data preparation process is documented in notebooks/Quran_Hadith_Semantic_Search.ipynb.
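The concatenation step can be sketched as below. This is an illustration, not the notebook's code: the record field names (`text`, plus `surah` or `book` for the source name) are hypothetical, and the "checks" shown (whitespace cleanup, dropping empty and duplicate entries) are assumed examples of the kind of validation described above.

```python
import re


def clean_text(text: str) -> str:
    """Collapse runs of whitespace (including newlines) and trim the ends."""
    return re.sub(r"\s+", " ", text).strip()


def build_documents(records, source_field):
    """Prefix each verse with its source name so retrieval results carry it.

    records: iterable of dicts with a "text" key and a `source_field` key
    (hypothetical names, e.g. "surah" for the Quran or "book" for Hadith).
    Empty texts and exact duplicates are dropped.
    """
    seen = set()
    docs = []
    for rec in records:
        text = clean_text(rec.get("text", ""))
        source = clean_text(rec.get(source_field, ""))
        if not text:
            continue  # skip rows with no verse text
        doc = f"{source}: {text}" if source else text
        if doc in seen:
            continue  # skip exact duplicates
        seen.add(doc)
        docs.append(doc)
    return docs
```

Prefixing the source name into the embedded string, rather than storing it only as metadata, means the name is returned with every hit without a separate lookup.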
The model used to embed the dataset is sentence-transformers/all-MiniLM-L6-v2, documented on its Hugging Face model card. This sentence-embedding model maps text to 384-dimensional dense vectors, which makes it well suited to semantic search applications. Separate embeddings/vector databases were created for the Quran and for the four Hadith books combined; they can be found in the models directory.
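Retrieval over these embeddings can be illustrated with cosine-similarity top-k search. In the project the vectors would come from something like `SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2").encode(texts)` (384 dimensions each); the storage format used in the models directory is not described, so the in-memory `VectorIndex` below is a hypothetical stand-in that uses toy 3-dimensional vectors to stay self-contained. Keeping one index for the Quran and one for the Hadith books mirrors the separate databases described above.

```python
import heapq
import math


class VectorIndex:
    """Minimal in-memory vector store with cosine-similarity top-k search."""

    def __init__(self):
        self._docs = []
        self._vecs = []  # unit-length copies of the added vectors

    @staticmethod
    def _normalize(vec):
        norm = math.sqrt(sum(x * x for x in vec))
        if norm == 0:
            raise ValueError("cannot use a zero vector")
        return [x / norm for x in vec]

    def add(self, doc, vec):
        self._docs.append(doc)
        self._vecs.append(self._normalize(vec))

    def search(self, query_vec, k=5):
        """Return up to k (doc, score) pairs, highest cosine similarity first."""
        q = self._normalize(query_vec)
        # With unit vectors, the dot product equals cosine similarity.
        scored = (
            (sum(a * b for a, b in zip(q, v)), doc)
            for doc, v in zip(self._docs, self._vecs)
        )
        return [(doc, score) for score, doc in heapq.nlargest(k, scored)]
```

Normalizing once at insertion time keeps each query to a single dot product per document; `encode` also accepts `normalize_embeddings=True` if the vectors should be unit-length from the start.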
Two separate Hugging Face Spaces were created, one for finding Quran verses and one for Hadith. The models, along with the entire pipeline, have been deployed there using a Gradio interface. You can visit the Spaces via Quran_verse_finder and Hadith_verse_finder.
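A Gradio Space typically wraps a plain Python function that maps the input text to the output text. The sketch below shows such a callback under stated assumptions: `make_handler` and the result format are hypothetical, and the search backend is passed in as a function so the example runs without the model. In a Space it could be wired up with `gr.Interface(fn=handler, inputs="text", outputs="text").launch()`.

```python
from typing import Callable, List, Tuple

# A search backend takes (query, k) and returns (document, score) pairs.
SearchFn = Callable[[str, int], List[Tuple[str, float]]]


def make_handler(search: SearchFn, k: int = 5) -> Callable[[str], str]:
    """Wrap a search function into a str -> str callback for a text UI."""

    def handler(query: str) -> str:
        query = query.strip()
        if not query:
            return "Please enter a query."
        results = search(query, k)
        if not results:
            return "No related verses found."
        # One numbered line per hit, score rounded to two decimals.
        lines = [
            f"{i}. {doc} (score: {score:.2f})"
            for i, (doc, score) in enumerate(results, start=1)
        ]
        return "\n".join(lines)

    return handler
```

Injecting the search function keeps the UI layer independent of how embeddings are stored, so the Quran and Hadith Spaces could share the same handler code with different backends.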
A Flask app was also deployed that takes a user query and shows the related Quran and Hadith verses separately. See the flask branch for its code; a live website is also available.
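The routes in the flask branch are not described here, so the following is only a sketch of the response logic such an app might use: it queries two backends and keeps Quran and Hadith results in separate lists. `build_response` and its output shape are hypothetical. In Flask it could back a route such as `return jsonify(build_response(request.args.get("q", ""), quran_search, hadith_search))`.

```python
from typing import Callable, Dict, List, Tuple

# A search backend takes (query, k) and returns (document, score) pairs.
SearchFn = Callable[[str, int], List[Tuple[str, float]]]


def build_response(query: str, quran_search: SearchFn,
                   hadith_search: SearchFn, k: int = 5) -> Dict:
    """Query both indices and return a JSON-serializable dict."""
    query = query.strip()
    if not query:
        return {"query": "", "quran": [], "hadith": [], "error": "empty query"}

    def to_items(results):
        # Plain dicts and floats so the payload serializes directly to JSON.
        return [{"text": doc, "score": round(score, 4)} for doc, score in results]

    return {
        "query": query,
        "quran": to_items(quran_search(query, k)),
        "hadith": to_items(hadith_search(query, k)),
    }
```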