quran / quran_android

a quran reading application for android
http://android.quran.com
GNU General Public License v3.0

Color-coded tajweed rule support #439

Closed theimpostor closed 1 year ago

theimpostor commented 10 years ago

Adding an option to change the color of the font for certain letters based on tajweed rules would be a useful option, especially for beginners. This could help reduce errors in recitation, for example.

I understand this might be a lot of work. Where would one begin to try and implement this?

moustafa-thirdwayv commented 10 years ago

Salam Alikum,

I had the same inspiration for working on this feature (adding color coded tajweed rules) which I believe to be very helpful and useful.

I have looked at the project's issue list, and it seems that tajweed rules were discussed earlier. From what I understand of this post: https://github.com/quran/quran_android/issues/171

adding this feature is pending the availability of high-quality images, right?

There are already some apps on the market offering a color-coded tajweed mushaf, like: https://play.google.com/store/apps/details?id=com.simppro.quran.tajweed

My initial thoughts about how to add this feature are either:

  1. Augment the sqlite database file of https://github.com/quran/quran.com-images with necessary data that represents the different color codes for individual letters. (For example adding a table that stores that in sura x, aya y, the letter z should be green) .
  2. Update the image generator script to make use of the new table to output the quran images with tajweed rules.

Alternatively, we can:

  1. Update the image generator script to parse the text content of each aya and output the color-coded letters by applying the tajweed rules on the fly while outputting the images.

The trade-off here is that the first approach is simpler to implement but requires more data entry.
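To make the first approach concrete, here is a minimal sketch of what such a table could look like. The table name `tajweed_colors` and its columns are hypothetical and are not part of the actual quran.com-images schema; it only illustrates "in sura x, aya y, the letter z should be green".

```python
import sqlite3


def create_tajweed_table(conn):
    # Hypothetical table: one row per colored letter, keyed by its position.
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS tajweed_colors (
            sura INTEGER NOT NULL,
            ayah INTEGER NOT NULL,
            letter_index INTEGER NOT NULL,
            color TEXT NOT NULL,
            PRIMARY KEY (sura, ayah, letter_index)
        )
        """
    )


def colors_for_ayah(conn, sura, ayah):
    # Returns {letter_index: color} for one ayah; empty dict if nothing is stored.
    rows = conn.execute(
        "SELECT letter_index, color FROM tajweed_colors "
        "WHERE sura = ? AND ayah = ? ORDER BY letter_index",
        (sura, ayah),
    ).fetchall()
    return dict(rows)


conn = sqlite3.connect(":memory:")
create_tajweed_table(conn)
# Made-up sample row, for illustration only.
conn.execute("INSERT INTO tajweed_colors VALUES (?, ?, ?, ?)", (1, 1, 3, "green"))
```

The image generator would then look up `colors_for_ayah(conn, sura, ayah)` while drawing each aya, which is where the data-entry cost of this approach comes from.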

Please let me know what you think.

ahmedre commented 10 years ago

jazakAllah khairan. the problem is that the quran.com-images project (which we are using) and many other font-based projects are word by word - meaning we know where each word is, but we don't know where each letter is (ie each glyph is a word, instead of each glyph being a letter, and each page has its own font instead of there being just one font for the entire mus7af).

i think there are some other alternatives - the Quran Corpus project, for example, uses the me_quran font, which is an actual font, to do its word coloring - but unfortunately, if you render a page with me_quran, it doesn't look the same as the uthmani mus7af.

our easiest alternative here is high quality images. we're working right now on supporting multiple image types, so insha'Allah hopefully once we can get some high quality scans of the tajweed mus7af, we can do this (with the caveat that the tajweed mus7af is typically under copyright).

Muhammed1111 commented 5 years ago

Peace be upon you, and the mercy and blessings of Allah,

I am willing to finance this project. It can benefit a lot of Huffaaz. We need the different Qiraat color coded according to the latest version available. I think that the best option is a high-quality book printer. Once we have acquired one we can digitize many more Islamic books.

May Allah reward you with good.

moussa-b commented 5 years ago

Assalam ahlikoum,

Just for information, Dar Al Maarifa Quran's page images are available here : http://www.easyquran.com/ar/prepage-ar.htm

For Hafs with tajweed coloration : http://www.easyquran.com/quran-jpg/index.php

For Warch with tajweed coloration : http://www.easyquran.com/quran-jpg-w/index.php

murtraja commented 5 years ago

@ahmedre I would like to work on this. Can you suggest where I should start?

Muhammed1111 commented 5 years ago

The thing is that the pages of the Quran on that website are not HD; you can see the quality is bad. Somebody did it on iOS with high-quality images, so I don't understand why the same couldn't be done for Android devices. I am planning to switch to Apple if it doesn't happen on Android, because it is a great help, especially for women, because they aren't always pure. Anyway, if it is to be done, I am willing to be one of the sponsors.

May Allah reward you with good.

murtraja commented 5 years ago

[image: image_info]

@Muhammed1111, isn't ~1080p enough for HD?

Muhammed1111 commented 5 years ago

It doesn't look HD to me tbh

Muhammed1111 commented 5 years ago

Idk how the pages on the Quran app were acquired; however, as Allah has willed, and there is no might nor power except through Allah, they are very clear and sharp and bright.

ahmedre commented 5 years ago

can we get someone to get a good quality copy of the tajweed mushaf (with heavy pages that aren't transparent) and do a high quality scan of them instead?

ahmedre commented 5 years ago

fwiw, after some time, we started scaling down the images served on Quran apps because it didn't seem to make a massive difference (versus it made a massive difference in both space taken to download the images and in the time to switch between pages). so right now, the highest quality images are just over 1080p. these might be okay, but would need a bunch of work to make them work properly (i.e. removing the borders, make them transparent, etc).

Muhammed1111 commented 5 years ago

We probably have 3 options.

  1. As akh Ahmed suggested, somebody needs to obtain a good-quality copy of the tajweed mushaf with heavy pages and then perform a high-quality scan.
  2. We ask our brothers from the iOS app for a copy of the images of the tajweed Quran that daar al maarif provided them.
  3. We ask daar al maarif directly for a high-quality image scan performed on a good-quality mushaf with heavy pages.

NOTE: This app features the newest print from Daral Maarifah and unlike other apps, it is legally licensed. The print is of top most quality obtained directly from Daral Marifah. By purchasing this app, you can rest assured that you have bought an authentic Daral Maarifah product.

Introduction:

SHL info systems, in collaboration with Daral Maarifah, presents “ The most advanced Quran application ever made for a smart phone”.

This is a digital version of the popular color coded Mus'haf At-Tajweed published by Daral Maarifah. The new version has been written ground up aiming to provide excellent user experience and appealing design with actual book like interface. A number of new features has been added on top of what the old version of the app provided.

https://itunes.apple.com/us/app//id368876346?mt=8

ahmedre commented 5 years ago

sounds good - let's go with options 2 and 3 - we can email the iOS brothers and ask for a contact with Daral Maarifah and ask what we can do to get and use the images. since the app is paid, they may be asking them for a licensing fee of sorts, so we can see if we can work something out with them.

i found SHL info systems' email on their page, so i'll email and ask them and will update when i hear back in sha' Allah.

jazakAllah khairan

Muhammed1111 commented 5 years ago

May Allah reward you with good.

moussa-b commented 5 years ago

If needed, I have dar al maarifa images with the borders removed, for both hafs and warsh, and I can give them to you (by FTP or any other means). Here is the final look of an image once the borders are removed. [image: page005]

murtraja commented 4 years ago

Also, if needed, I might be able to generate QA style (glyph_id, page_number, line_number, sura_number, ayah_number, position, min_x, max_x, min_y, max_y) database for these images. (The glyphs will be segments of the ayat that span a line and not the word)

mouhanad1053 commented 3 years ago

In the name of Allah, Most Gracious, Most Merciful

ASA,

After seeing this issue and with the help of Allah, I have created a platform which can support multiple versions of Quran. I currently have mushaf al-Madina and mushaf al-mujawwad and I recently added the pages from Qaloon-An-Nafi3 from islamweb.net. The platform allows for adding any books on-demand so we can have all the 10 Qiraat in 1 app, IA.

The consumer of this platform is currently an Android app. I am about 85% ready to go live. I have been working on my own but I am in need of some expert help. If there is still interest in contributing to and/or financing this project, please contact me for details. I have gained a lot by learning from QuranAndroid and others like quran.com.

Based on https://github.com/quran/tajweed I have also set up a quick web page with all the ayas generated in png with colored tajweed codes (I still need to add some tajweed rules and regenerate). It can be viewed at https://indie-publishing.space/main/publishers/Assala/Quran/show-sura.php. If interested, you can find my contact on that page as well.

May Allah reward you with good.

HiIAmMoot commented 2 years ago

I have enhanced and added transparency to the tajweed pages. They should be high quality enough for use in the app. I can also provide cropped versions if need be. The file size is big though (9GB), so it should be sized down to a common size like 1920 or 1260 (whatever the app uses). I am also working on expanding the code to support a tajweed page type but will need help with the database etc. Could you help me with that?

The enhanced images can be found here: https://github.com/HiIAmMoot/quran-android-tajweed-page-provider/tree/images-enhanced/images

murtraja commented 2 years ago

Recently, I got the opportunity to segment the Tajweed pages word by word. Since I have the code ready (might have to modify a bit here and there) I should be able to help you with minimal efforts.

https://user-images.githubusercontent.com/7806954/161809018-f4748571-94ab-4d8d-804e-6a78ddd85f52.mp4

mouhanad1053 commented 2 years ago

I have the Quran almujawwad at https://www.indie-publishing.space/self_publish_app/#/ Please check that out and let's discuss where we need to improve.

HiIAmMoot commented 2 years ago

Amazing! I should be able to get this into the app in no time. What resolution should I scale the images to? I've also fixed the problem of the first 2 pages by cropping them to match the resolution of the rest, so there might be no need to take measures for that anymore.

murtraja commented 2 years ago

I don't have a clear answer to that as depending on the device, we download different resolution images. The resolution for Tajweed pages that I have worked with is 1053 x 776 (ratio: 1.357) and the pages which you have shared are 4208 x 3104 (ratio: 1.356)

The resolution that I have worked with mostly in Quran Android is 1656 x 1024 (ratio: 1.617). These are cropped images, so the ratio is greater than Tajweed.

I leave this up to you to decide. Having said that, since we are finding the word segmentation and preparing a database, it really doesn't matter right now: even if we choose, say, H x W and then in the future change it to H' x W', as long as the ratio remains the same, the entire database can be scaled up or down and the coordinates will be mapped correctly.
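As a rough illustration of that rescaling, here is a sketch that maps one set of word bounds from one image size to another. The function name is hypothetical; the bounds follow the `min_x, max_x, min_y, max_y` columns mentioned earlier in the thread, and sizes are assumed to be given as (width, height).

```python
def scale_bounds(bounds, old_size, new_size):
    """Rescale (min_x, max_x, min_y, max_y) from old_size to new_size.

    Sizes are (width, height). When the aspect ratio is unchanged,
    the x and y factors are the same.
    """
    sx = new_size[0] / old_size[0]
    sy = new_size[1] / old_size[1]
    min_x, max_x, min_y, max_y = bounds
    return (
        round(min_x * sx),
        round(max_x * sx),
        round(min_y * sy),
        round(max_y * sy),
    )


# Made-up bounds on a 776 x 1053 page, mapped to a page exactly twice as large.
doubled = scale_bounds((100, 200, 50, 80), (776, 1053), (1552, 2106))
```

Running this over every row of the database is all that a later change of resolution would need, provided the crop stays the same.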

For the sake of processing, I would like these images to be somewhere near 1053 x 776 as I don't have a lot of computing power to tackle high res images. Once we have the db ready, you can decide on a good resolution and scale the db accordingly

Try to play with the images with different resolutions and styles here: https://quran-android-images-helper.herokuapp.com/?images=1024 https://quran-android-images-helper.herokuapp.com/

try changing images param to 1920, 1260, you can also add safah=3 param to open a specific page

source: https://github.com/murtraja/quran-android-images-helper

HiIAmMoot commented 2 years ago

I will aim to keep the ratio consistent, so i will resize the pages to be 2106x1552 (2x scale) or 3159x2328 (3x scale). That means that for processing you can keep the same images you are using now and add a multiplier as you mentioned.

I will do scaling at the final step as I prefer working hi-res when making edits. Am working on isolating the black texts so i can invert it for dark theme.

Also, if you need processing power, I can always help out. I have 2 fairly well-specced laptops that can run things, or you can use https://colab.research.google.com/ for cloud processing (which is what I used for cleaning and upscaling the images). It really helps!

ahmedre commented 2 years ago

@HiIAmMoot masha'Allah amazing work! what is the source of these images, and are we confident that the processing did not inadvertently remove / modify the text (i.e. sometimes when running processing, it causes some of the tashkeel to hide, etc). this will make life a lot easier in sha' Allah for adding things. previously, we were using ayah-detection for detecting ayah marker locations and generating the database, but what Murtaja shared seems more promising.

@murtraja sub7anAllah amazing - I wanted to write a tool to do exactly what you have there, because I wrote some code to use Google's OCR and some custom code to try to find the word locations in the newer madani mushafs, but realized that it would need manual editing to finish the job. is there a way you can share this tool for use and/or open source it? also, am curious how you got the initial word bounds in the first place.

jazakumAllah khairan both for this amazing work!

mouhanad1053 commented 2 years ago

I have generated the aya location databases for both the Almadina and Almujawwad mushafs. However, I didn't crop the pages and kept the margins. I should be able to use the cropped pages and re-generate the db, if that helps here. Ramadan Kareem everyone.

HiIAmMoot commented 2 years ago

BarakAllahu feek. I currently have no means of checking if any information was lost. However, the algorithms used are ARCNN, which just removes jpg artifacts, and ESRGAN, which upscales images. It could have happened that ESRGAN smoothed things out; however, I chose a model that isn't as aggressive, so details should be preserved and it only sharpened the text. @murtraja's tool can seemingly match the images with actual Arabic text, so I could maybe do a validation run and see if there are any inconsistencies that way.

Here's a closeup comparison of some text. [image: comparison]

murtraja commented 2 years ago

I will definitely open source the tool after some refactoring, but let me summarize how it works. There are two parts to it. The first one is a stupid and simple algorithm to segment the words and generate a confidence score. The second part is the "interactive word corrector" which I have shared.

The algorithm is so simple, you will not believe me. I start from the right, start of the line. Let's call this x position as my cursor. From the cursor, I jump to cursor - word_width where word_width is just a guess of how long the next word is going to be. This guess is calculated by using the Quran Android Images repo. I "draw" the word by using the fonts present in that repo and simply measure the width.

Now I fine-tune the word boundary by looking at the empty pixels surrounding the jump (call this search vicinity). If I find a "wall" of empty pixels, I am good and I move on to the next word. The problem occurs in cases where the next word starts with kaaf as then my current word and the word starting with kaaf overlap. In this case, where I am not able to find the "wall" of empty pixels, I resort to pixel-by-pixel template matching. Since I had previously "drawn" the word using the fonts, I just compare the "drawn" word with the current word and expand or contract the jump boundary depending on the score of the template matching algorithm.

[image: step1]

[image: step2]

Using this approach I was able to make a good 75 to 80% progress. I already knew what words are present on the individual lines thanks to the database schema diagram present in the repo. Then for the remaining 20% words, I created the interactive word corrector.
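The empty-pixel "wall" search described above could look roughly like the sketch below. This is not the actual tool: it assumes a text line is a list of pixel rows with 0 for empty and 1 for ink, treats a single fully empty column as the "wall", and leaves out the template-matching fallback. All names are hypothetical.

```python
def is_empty_column(line_pixels, x):
    # A column counts as part of a "wall" when no row has ink at x.
    return all(row[x] == 0 for row in line_pixels)


def find_word_boundary(line_pixels, cursor, word_width, vicinity):
    """Jump left from cursor by the guessed word width, then look for the
    nearest empty column within `vicinity` pixels of the jump.

    Returns the x position of the boundary, or None when no wall is found
    (the case where the thread falls back to template matching).
    """
    width = len(line_pixels[0])
    guess = cursor - word_width
    for offset in range(vicinity + 1):
        for x in (guess - offset, guess + offset):
            if 0 <= x < width and is_empty_column(line_pixels, x):
                return x
    return None


# Tiny made-up line: ink in columns 3-6 and 8-11, a gap at column 7.
row = [0, 0, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1]
line = [row, row]
boundary = find_word_boundary(line, cursor=11, word_width=5, vicinity=2)
```

Here the width guess overshoots by one pixel, and the search corrects the boundary to the gap at column 7.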

@HiIAmMoot would it be possible for you to downscale the images and provide a link? I will then start processing the images. Also, in the meantime, let me share the Tajweed db with you after I scale it to the resolution which you have given. If the pages/fonts are identical, it should work out of the box without having to run the algorithm.

ahmedre commented 2 years ago

@HiIAmMoot jazakumAllah khairan, this looks good! we can in sha' Allah see about getting manual review for the pages to be sure before releasing.

@murtraja masha'Allah, this is great! maybe we can combine the approaches (Google OCR first, falling back to your technique of estimating) and that should give us better estimates in sha' Allah. please keep me posted about this!

HiIAmMoot commented 2 years ago

@ahmedre yes, it is best to do a manual review. If you see patterns of inconsistencies then I might have to edit the postprocessing. For example, I just noticed that the dammas on the letter ه become hard to read when they're greyed out, due to their small size. I have made the gray letters a bit more obvious, so in sha Allah that is fixed. If there are still issues with this, there are ways to fix it if I can detect on what page and where this happens; @murtraja could probably help with that if the time comes.

There's also the issue of filesize. The 1552x2106 pages are 3.25GB in total. I am looking into optimization, what is acceptable in terms of filesize?

@murtraja you can find 2106 resolution pages here. The image ratio is corrected so whatever work you've done so far you can keep if you upscale it 2x.

Furthermore, I have a means of extracting the black text and inverting it to make a dark-themed mushaf tajweed. I'd need feedback on this though. This would require a second image layered on top of the base tajweed image, however. Still figuring out how to do the borders though. [image]

Finally, I have found a source that provides cleaner and higher-resolution images, but it lacks the side area. What do you both think? Should we adopt this one instead? I believe my algorithms will produce much better results with this.

image2

ahmedre commented 2 years ago

I'd vote without the side area (because the mobile screen is already small, and the words described on the side can be read by downloading the "Arabic Word Meanings" tafseer present in the app). For the size, I'd go smaller - a width of 1280 or 1260, and then apply very heavy compression - I think the largest image set we have in the app today is probably less than 150 MB, so our target would be there.

With respect to dark theme, I'd vote just for transparent images - with that we can apply a filter (which is what we do today) to the text to render dark theme.

Also, for saving size, I'd cut the border and the description of the colors - we can find a way to add a single image somewhere describing what each color is, but paying this number of pixels per page will make a large difference. Because these pages are likely uniform (or at least the odd ones and even ones are probably so), we can write a command line script to pretty easily just keep the piece in the middle (using mogrify (part of ImageMagick) with a -crop parameter). We did this before for another page type.
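A small sketch of how the crop argument could be computed, assuming uniform margins on every page. ImageMagick geometry strings have the form `WxH+X+Y`; the helper name and the margin values used below are hypothetical and would need to be measured on the real pages.

```python
def crop_geometry(page_size, left, top, right, bottom):
    """Build a WxH+X+Y geometry string that keeps the middle of the page.

    page_size is (width, height); the four margins are the pixels to cut
    from each side.
    """
    width = page_size[0] - left - right
    height = page_size[1] - top - bottom
    if width <= 0 or height <= 0:
        raise ValueError("margins are larger than the page")
    return f"{width}x{height}+{left}+{top}"


# Made-up margins for a 776 x 1053 page.
geometry = crop_geometry((776, 1053), left=80, top=74, right=80, bottom=74)
```

The resulting string would be what gets passed after `-crop` when running mogrify over the page images; odd and even pages could each get their own margins if they differ.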

jazakumAllah khairan!

HiIAmMoot commented 2 years ago

I have means to get rid of the borders and provide just the text. I will provide new images soon then, in sha Allah.

HiIAmMoot commented 2 years ago

Ok I got it cropped. Still working on enhancements though. Got it here. [image]

benomaire commented 2 years ago

@HiIAmMoot

  1. Your first upscale used this source. I want to ask what is the source used for the recent cropping, and did you upscale then downscale?
  2. 857*1262 is small IMHO.
  3. Do you think the process of inverting just the black text can be implemented in the app? because using non-transparent two sets of images will force two backgrounds: white or black. Not to mention doubling the size.

ahmedre commented 2 years ago

  1. yes, want a width of 12xx and a height accordingly.
  2. we have some scripts in ayah-detection to make the images transparent so we can programmatically invert the text like we do for the other image types.

HiIAmMoot commented 2 years ago

  1. The latest crops are NOT upscaled yet. I am still working on that because it requires lots of processing time. I got the pages from easyquran-eg.com. Here is a direct link to the page.

  2. My target resolution is to make the smallest dimension 1260. So the final resolution will be somewhere around 1260x1855.

  3. We can use a transparent image and a separate mask for the black text. @ahmedre the issue with using detection is that we need to prevent the colored text from inverting. The dark theme example I posted was achieved with a mask like this: [image]. These will be roughly a third of the original file size, so adding support through this mask will mean a 33% increase in size instead of doubling (by my estimate).

I believe however that we can optimize further by having a color mask on top of the original text. So making the base layer text all black so we can use ayah detection to invert, then putting the colored layer on top of it.

ahmedre commented 2 years ago

if the images are transparent, you don't need to do anything to support it - we can mask it in the code (which is what we do today) - you can see the code here.

HiIAmMoot commented 2 years ago

I see, then we will do that.

benomaire commented 2 years ago

I asked for the new dark approach because I have seen someone complaining that colors in dark mode are not easy on the eyes, but that could be because of the solid black background.

1 September 2021 4 An app that we Moroccans were in need of.. but I wish you would remove the colors from the letters, because in night mode they are harmful to the eyes

The current approach is definitely easier but it flips the colors (e.g. the green becomes red and vice versa). I'm sure you are aware of this but mentioning it to make sure everybody is on the same page. I mean there will be a compromise either way.

HiIAmMoot commented 2 years ago

Yeah if this can be remedied with the new approach then all is well in sha Allah.

Another question, we are now communicating purely through git issues, is there an alternate means of communications you adopt? Like a discord or something?

ahmedre commented 2 years ago

we are on the quran.com Discord - you can join here - https://quran-community.herokuapp.com

HiIAmMoot commented 2 years ago

@ahmedre the new set of enhanced images is done. Got a resolution of 1280x1890 (downscaled from 3424x5056). You can find em here. I've also tried a 256-color version of the images to reduce file size; those are around your benchmark of 150 MB, but some quality is sacrificed. You'll have to check yourself which you prefer. I believe I can find some middle ground, but if 150 MB is your goal then those are the closest you'll get I think.

@murtraja is this sufficient for you to do your thing? The ratio is different from the previous set but it shouldn't change further.

In sha Allah we can now proceed to manual clean-up and checking the pages for errors.

image

ahmedre commented 2 years ago

masha'Allah, awesome! I did a quick look and the originals can still be reduced some (~11%ish via ImageOptim), and ~6% for the 256 images - the two look very similar to me, so I think we'd need to go with the 256 ones since the size will be prohibitive otherwise.

if Murtaja's work doesn't translate to this, we can use ayah-detection to detect the ayah markers to unblock this.

HiIAmMoot commented 2 years ago

Khayr in sha Allah. I will see if I can run it through ImageOptim later today in sha Allah.

murtraja commented 2 years ago

@HiIAmMoot I think the following operations will fix the geometry

  1. Perform origin shift (translation) to account for the trimmed borders in the 776 x 1053 images and convert them to 616 x 905 (In other words, subtract left border size from x coordinates and subtract top border size from y coordinates)
  2. Scale x coordinates by 1280/616 = 2.0779 and y coordinates by 1890/905 = 2.0884
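The two steps above can be sketched as a single function. The sizes (616 x 905 after trimming, 1280 x 1890 as the target) come from this comment; the border sizes are not stated anywhere in the thread, so they are left as parameters, and the values used in the example call are placeholders.

```python
def remap_point(x, y, left_border, top_border,
                cropped_size=(616, 905), target_size=(1280, 1890)):
    """Map a coordinate from the bordered 776 x 1053 page to the new images.

    Step 1: shift the origin by the trimmed border sizes.
    Step 2: scale x by 1280/616 and y by 1890/905.
    """
    sx = target_size[0] / cropped_size[0]
    sy = target_size[1] / cropped_size[1]
    return ((x - left_border) * sx, (y - top_border) * sy)


# Placeholder border sizes, for illustration only.
top_left = remap_point(80, 74, left_border=80, top_border=74)
bottom_right = remap_point(80 + 616, 74 + 905, left_border=80, top_border=74)
```

Because the x and y factors differ slightly (about 2.0779 versus 2.0884), each axis has to be scaled with its own factor rather than one shared multiplier.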

Also I realized that the db that I created has only word segmentation entries - it doesn't have information regarding "hizb", "ayat end marker" or "sajda" glyphs. What this means is that the user will not be able to interact with these glyphs in order to perform ayat-specific operations. So I will have to add support for this.

Moreover, in this comment you talk about using a CNN to remove jpeg artifacts. I realized that there are clean .PNG files available here: https://app.quranflash.com/book/Tajweed/epub/EPUB/imgs/003.png (You might have to automate downloading all 604 images, though). These files are 776 x 1053. If you have any concern that the CNN doesn't undo the effects of DCT / JPEG compression, maybe try these as your starting images?

HiIAmMoot commented 2 years ago

@murtraja the newest source I've used doesn't have many jpeg artifacts and is a higher resolution. So I think I'll stick with that one because it really helps with the smaller dhammas. However, it was good to run it through the CNN regardless because that made the upscale cleaner. I can show a comparison between running ESRGAN on the raw and the ARCNN'd image if you're curious.

To clarify what I've done so you know how I've come to the final resolution:

mouhanad1053 commented 2 years ago

I have used ayah-detection to generate a .json index of the ayah marker coordinates, found here. The .json includes the number of ayahs which were detected. I did a visual check with the images as shown below. As an additional check, a script could be written to cross-check the number of ayahs for each page, verifying that the number of detected ayah markers matches the actual number of ayahs. I hope this helps. [image: 593]
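Such a cross-check could be as small as the sketch below. The structure of the generated .json is not shown in this thread, so the sketch assumes both inputs have already been reduced to plain `{page_number: ayah_count}` dictionaries; the function name and the sample numbers are made up.

```python
def find_count_mismatches(detected_counts, expected_counts):
    """Compare detected ayah markers per page against the expected counts.

    Returns {page: (detected, expected)} for every page that disagrees.
    A page missing from detected_counts is treated as 0 detected markers.
    """
    mismatches = {}
    for page, expected in expected_counts.items():
        detected = detected_counts.get(page, 0)
        if detected != expected:
            mismatches[page] = (detected, expected)
    return mismatches


# Made-up sample values, for illustration only.
expected = {1: 7, 2: 5, 3: 11}
detected = {1: 7, 2: 4}
problems = find_count_mismatches(detected, expected)
```

Any page listed in the result would then be a candidate for the visual check.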

HiIAmMoot commented 2 years ago

Ok ok, subhan Allah. Did something happen in the past week or something..?

Easyquran.com now provides high quality pages!

image

@ahmedre should I process these instead?

ahmedre commented 2 years ago

wow, masha'Allah! yes, if we can avoid our image processing steps, then in sha' Allah we won't have to do manual review.

@mouhanad1053 jazakumAllah khairan! if we're re-doing the images though might need another one in sha' Allah

HiIAmMoot commented 2 years ago

@murtraja @ahmedre new images are up! @mouhanad1053 Find em here

In sha Allah these dimensions (1280x1883) are final!

ahmedre commented 2 years ago

what's the overall size? afraid this will still be too big, especially since ~1280ish is a pretty common width, would aim to have that as the width if possible? what do you think?