Please contribute code through pull requests on this repository rather than working from a fork; we will add you as a collaborator. 🚀
TL;DR: The Quranic Arabic Corpus, an invaluable linguistic resource, is due for a revamp. We're calling on volunteers in linguistics, AI, and technology to join us in this exciting journey.
This introduction is designed for a general, non-technical audience. For a more in-depth introduction, see the corpus Wikipedia page or Dr. Dukes’ PhD thesis, Statistical Parsing by Machine Learning from a Classical Arabic Treebank.
We currently have two domains:
The Quranic Arabic Corpus is the world’s most-visited website for learning the language of the Quran. Like Wikipedia, the project is free, has no ads, and is supported by user contributions. Also inspired by Wikipedia, this academic project follows a neutral point of view backed by reliable sources. For example, it draws on seven different translations of the Quran to present multiple possible interpretations, and relies on highly regarded Classical Arabic linguistic reference works such as Lisān al-Arab and al-I’rāb al-Mufassal.
The detailed linguistic data in the corpus was generated by artificial intelligence (AI) and then reviewed by human experts to ensure gold-standard accuracy. Users report that the website is extremely useful for anyone wanting to study the Quran in detail, as it offers unique insight into the grammatical structure and vocabulary of one of the world's most studied and revered texts.
The Quranic Arabic Corpus is currently ranked number one on Google for a wide variety of searches including:
However, the website, originally launched in 2009, needs modernizing in two respects: its web design (there is currently only a desktop version) and its linguistic data.
Breakthroughs of the project include:
The project's current aim is to improve the corpus and make it more useful and accessible to anyone studying the Quranic text. We plan to achieve this by working with volunteers from a range of disciplines to enhance this valuable resource together.
Part-of-speech tags, which identify each word as a noun, verb, and so on, were generated by AI for the entire Quran and manually verified online by a community of human experts. The AI also generated grammar diagrams, but the corpus is not yet complete: about half of these diagrams still need to be completed.
The specific aims of the current project include:
Completing the remaining 50% of grammar diagrams. Although more people are interested in the semantics of the Quran than in its grammar, the logical next step for the corpus project is to complete the grammatical analysis, because syntax is a crucial part of the Quran's linguistic structure and a thorough understanding of syntax is essential to support semantic analysis.
Modernizing the website to make it easily accessible from mobile devices. The website was started in 2009, before smartphones were widespread, and is designed mainly for desktop use. A modern, mobile-friendly website will significantly improve accessibility and user experience, making the resource available to many more people.
Expanding the knowledge graph to enrich the understanding of Quranic concepts and the connections between them, giving users a more comprehensive resource for learning and understanding the Quran.
Developing a language learning community. A key objective is to cultivate an engaging and dynamic online community focused on learning the language of the Quran. This community will serve as a platform for learners at various levels to interact, share knowledge, and support each other's learning journeys. This will include discussion forums, study groups, and interactive Q&A sessions. By fostering this sense of community, we hope to make the learning process more collaborative and enriching, contributing to a deeper understanding of the Quran.
Data, APIs, and code libraries. In line with our commitment to openness and shared learning, we aim to produce and distribute high-quality, standardized datasets, APIs, and code libraries. This ecosystem will be freely available to individuals and organizations building new learning applications and educational platforms, or pursuing advanced AI projects in this field.
We are currently looking for three groups of volunteers to help us enhance the Quranic AI, complete the missing 50% of grammar diagrams, and build a much-improved version 2.0 of the website:
Volunteers who understand the Arabic grammatical science of i’rāb (إعراب) and are willing to help complete the missing 50% of grammar diagrams. Their expertise in Arabic grammar will ensure the accuracy of the detailed linguistic data.
Volunteers with experience in AI, including data scientists and machine learning engineers. A wide variety of projects could improve the Quranic AI, and we welcome collaboration with the AI community on new project ideas.
Developers, designers, and testers, who are essential to the technical side of the project. A website that is well designed and works effectively for users is critical to the project's success. We are specifically looking for:
Product designers who can translate our vision into a set of impactful features. Drawing on insights from eLearning platforms, you can help us design a structured, user-friendly, and effective learning journey.
UI/UX designers who can design the layout and screens with a focus on producing a mobile-friendly version of the site.
React and TypeScript developers, for the client-side web app.
Java and Micronaut developers, for high-performance server-side APIs.
Graphic artists who can develop SVG-based vector graphics.
Testers: We're seeking individuals with experience in software testing, particularly those familiar with web applications. This includes testing the site for functionality, usability, and compatibility across devices and browsers. Knowledge of automated testing would be advantageous.
At the moment, we are concentrating our efforts on two main areas: enhancing the user interface and completing crucial linguistic information.
Next-Generation Word-by-Word Prototype: This prototype revamps the heavily visited word-by-word section of the Quranic Corpus. Our goal is to create a Quran reader that is familiar and intuitive, mirroring the layout of a physical Quran. The prototype aims to offer quick access to word-by-word translation, roots, transliteration, and audio while staying simple and responsive across devices. This clean, user-friendly interface will make it easier to explore the rich data in the Quranic Corpus, for seasoned users and newcomers alike. Drawing inspiration from eLearning platforms, we aim to create an interactive platform for learning the Quran.
Grammar Diagram Completion: The Grammar Diagram Editor is a tool designed to help complete the remaining half of the corpus's grammar diagrams. This work, a collaboration between our technical and linguistic teams, is of paramount importance because it supports the completion of our syntactic treebank, a crucial resource for understanding the Quran's grammatical structure. The new tool will be developed with linguists in mind, making it easy to use and effective for completing the treebank.