islamic-network / api.alquran.cloud

The AlQuran.Cloud API - https://alquran.cloud/api
GNU General Public License v3.0

Fix Incorrect Surah Name Characters #21

Closed (meezaan closed this issue 3 years ago)

meezaan commented 5 years ago

In the quran-uthmani edition, سبإ must be سبأ. Likewise, النبإ must be النبأ.
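The two corrections above differ only in the final letter: alef with hamza below (U+0625) versus alef with hamza above (U+0623). A minimal Python sketch of applying such corrections through a lookup table could look like the following. `SURAH_NAME_FIXES` and `fix_surah_name` are hypothetical names used for illustration, not part of the AlQuran.Cloud codebase, and the sketch assumes the names are stored without diacritics, exactly as quoted in this issue.

```python
import unicodedata

# Hypothetical correction table built only from the two names reported in this issue.
# Keys end in ALEF WITH HAMZA BELOW (U+0625); values end in ALEF WITH HAMZA ABOVE (U+0623).
SURAH_NAME_FIXES = {
    "\u0633\u0628\u0625": "\u0633\u0628\u0623",                          # Saba
    "\u0627\u0644\u0646\u0628\u0625": "\u0627\u0644\u0646\u0628\u0623",  # An-Naba
}


def fix_surah_name(name: str) -> str:
    """Return the corrected spelling if the name is in the table, else the name unchanged."""
    return SURAH_NAME_FIXES.get(name, name)


def last_letter_name(word: str) -> str:
    """Unicode name of the final code point, useful for checking which hamza form is stored."""
    return unicodedata.name(word[-1])


if __name__ == "__main__":
    for wrong in SURAH_NAME_FIXES:
        fixed = fix_surah_name(wrong)
        print(last_letter_name(wrong), "->", last_letter_name(fixed))
```

Comparing Unicode names rather than glyphs avoids relying on how a particular font draws the two hamza forms.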

vipafattal commented 4 years ago

A question on this topic: why does only Surat Al-Fatiha have Arabic diacritics on its name?

meezaan commented 4 years ago

I started adding these but never got around to completing the task. So in due course they will all have them, God willing!

Riz-waan commented 4 years ago

@meezaan I can help if you would like, Inshallah

meezaan commented 4 years ago

@Riz-waan Al Salaamu Alaykum. Thank you for the offer to help out.

I'm attaching a word document where I'd made a start in little bits (but it all needs to be completed and then reviewed, particularly with the last vowel on each name).

Thank you.

names.docx

Riz-waan commented 4 years ago

Oh okay, sorry, I meant I could help with the change in code. I am not knowledgeable in Quranic Arabic, @meezaan.

fawazahmed0 commented 4 years ago

@Riz-waan Your intention is sufficient to earn you a good reward :slightly_smiling_face:

@meezaan Salam meezaan, thanks for your work, may Allah reward you with good, and bless us and protect us from evil :+1: Will this work out for you? I fetched those Arabic surah names from the kfcq Quran, but I am not sure whether they are correct.

meezaan commented 4 years ago

@Riz-waan Indeed, thank you for the effort.

@fawazahmed0 Thank you. I see that it is missing some of the Tashkeel, so insha Allah I will continue to work on that Word document as and when I get time.

fawazahmed0 commented 4 years ago

@meezaan Could you give an example? Which surah name is missing the diacritics, and what would be its correct form? A single example would be sufficient. I too am interested in fixing it in my file.

Thanks

meezaan commented 4 years ago

@fawazahmed0 According to the rules of Classical Arabic, the wasla is always missing, along with the following sukoon, for instance:

النَّاسِ should be ٱلْنَّاسِ

The endings all seem to follow the idafa pattern, so as long as they all appear with the word سُوْرَةُ before them, they would have the correct grammar.

I just need to be sure about surahs that are named after prophets, for instance Noohin vs Yusufa. I am not sure of the rules applied here because they are names, so these are the ones I really need to check.

Hope this helps, insha Allah!
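The wasla-and-sukoon pattern from the example above (a leading alef-lam rewritten as alef wasla, lam, sukoon) can be sketched in a few lines of Python. This only mirrors the single example given in the comment; it has not been reviewed against any mushaf or grammar reference, and `add_wasla_and_sukun` is a hypothetical helper name.

```python
ALEF = "\u0627"        # ARABIC LETTER ALEF
ALEF_WASLA = "\u0671"  # ARABIC LETTER ALEF WASLA
LAM = "\u0644"         # ARABIC LETTER LAM
SUKUN = "\u0652"       # ARABIC SUKUN


def add_wasla_and_sukun(name: str) -> str:
    """Rewrite a leading plain alef + lam as alef wasla + lam + sukun.

    Assumption: the definite article is always the first two code points and
    the lam carries no other mark. Names that do not start with a plain
    alef-lam are returned unchanged.
    """
    if name.startswith(ALEF + LAM) and not name.startswith(ALEF + LAM + SUKUN):
        return ALEF_WASLA + LAM + SUKUN + name[2:]
    return name


if __name__ == "__main__":
    # My own transcription of the example word (An-Naas) as code points.
    an_naas = "\u0627\u0644\u0646\u0651\u064E\u0627\u0633\u0650"
    print([f"U+{ord(c):04X}" for c in add_wasla_and_sukun(an_naas)])
```

Because the function leaves already-converted names untouched, it can be run repeatedly over a list of names without doubling the marks.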

fawazahmed0 commented 4 years ago

@meezaan Yeah, I actually stripped the سُوْرَةُ thing, assuming it just means chapter; I didn't know it would affect the grammar. I will add that back in the file, Inshallah.

Coming to the surah names you aren't sure about, the Standard Quran Hafs from Quran Complex seems to follow that pattern. For example: nooh1.pdf yousuf1.pdf naas1.pdf

Also, if you want the source from which I got the surah names, here are the files: UthmanicHafs1 Ver14.docx HafsNastaleeq Ver10.docx

I actually preferred the Nastaleeq variant over the Uthmanic one to fetch surah names, as it has more diacritics, which helps non-Arabic speakers. For example: Nastaleeq -> سُوْرَةُ, Uthmanic -> سُورَةُ. Nastaleeq has a sukoon over و and Uthmanic doesn't have one.

And yeah, thanks for explaining the سُوْرَةُ thing, because Arabic is not my language. :smile:
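The Nastaleeq versus Uthmanic difference described above (a sukoon on the waw in one spelling and none in the other) can be checked mechanically by grouping each base letter with its combining marks. The sketch below is one hedged way to do that in Python; the two strings are my own code-point transcriptions of the two spellings, and `marks_by_letter` and `differing_letters` are hypothetical helper names.

```python
import unicodedata


def marks_by_letter(word: str):
    """Return [base letter, [names of combining marks on it]] for each letter."""
    result = []
    for ch in word:
        if unicodedata.combining(ch) and result:
            # Non-zero combining class: attach the mark to the preceding base letter.
            result[-1][1].append(unicodedata.name(ch))
        else:
            result.append([ch, []])
    return result


def differing_letters(a: str, b: str):
    """List letters whose marks differ, assuming both words share the same base letters."""
    return [
        (unicodedata.name(x[0]), x[1], y[1])
        for x, y in zip(marks_by_letter(a), marks_by_letter(b))
        if x[1] != y[1]
    ]


# Transcriptions assumed for illustration: the word for "surah" with and without sukun on waw.
NASTALEEQ = "\u0633\u064F\u0648\u0652\u0631\u064E\u0629\u064F"
UTHMANIC = "\u0633\u064F\u0648\u0631\u064E\u0629\u064F"

if __name__ == "__main__":
    print(differing_letters(NASTALEEQ, UTHMANIC))
```

Running the same comparison over two full lists of surah names would show exactly which letters carry extra marks in one source, without anyone having to spot them by eye.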

meezaan commented 4 years ago

Thank you for this @fawazahmed0.

I think (I could be wrong) the removal of the sukoon is actually not Uthmanic - it's laziness, because the sukoon is pure Arabic Grammar, and unfortunately many Arabs do not care for exact pronunciation or meaning (and the production of thousands of classical books without tashkeel alludes to that).

I will, insha Allah, update the names soon.

All the Uthmani text also has one consistent error that I've seen in every printed Qur'an coming from Saudi Arabia. That same error also exists in the edition used on the AlQuran API. For instance, there is a little 0 on top of أَنَا۠ in https://alquran.cloud/ayah/2174 instead of an actual sukoon.

Does this exist in your API?

fawazahmed0 commented 4 years ago

@meezaan Yes, it exists in my API as well. I was actually amazed to see a 0 in the Quran; even the Standard Quran Hafs seems to have it. Anyway, I dug a little deeper into this, by the mercy of Allah, and it seems that the 0 (Unicode value 06E0) is part of the Arabic Unicode block and should look something like this (i.e. a rectangular dot). So in the end it's a font issue: the Arabic fonts all seem to show it as an English zero, while most general fonts such as Times New Roman, Arial etc. seem to show it properly (i.e. as a rectangular dot or Arabic zero). You can try Arial to test that.

It seems that the rectangular dot is specific to the Quran and only used at 66 places before نَ, which is why the font authors never actually caught the issue. I am not sure what the rectangular dot is used for, but looking here, it seems to come under the category of non-letter-making harakah, i.e. it is similar to the small ج in هُوَۚ or the small س in وَیَبۡصُۜطُ

I think that after confirming (i.e. making 100% sure) that what I am saying is true, we should raise an issue at the popular Arabic font repos (Google Noto Arabic, Khaled Hosny's fonts, kfcq fonts etc.) so that they can fix it.

Thanks
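The code point mentioned above can be inspected without depending on any font. In the Unicode character database, U+06E0 is named ARABIC SMALL HIGH UPRIGHT RECTANGULAR ZERO, which is distinct from both U+06DF (ARABIC SMALL HIGH ROUNDED ZERO) and U+0652 (ARABIC SUKUN). The Python sketch below looks up those names and lists where the mark occurs in a string, together with the base letter it sits on. `find_mark` is a hypothetical helper, and the sample is my own transcription of the word discussed in this thread.

```python
import unicodedata

RECT_ZERO = "\u06E0"  # the code point quoted in the comment above


def find_mark(text: str, mark: str = RECT_ZERO):
    """Return (index, base letter carrying the mark) for every occurrence of `mark`."""
    hits = []
    for i, ch in enumerate(text):
        if ch != mark:
            continue
        j = i - 1
        # Walk back over any other combining marks to reach the base letter.
        while j >= 0 and unicodedata.combining(text[j]):
            j -= 1
        hits.append((i, text[j] if j >= 0 else ""))
    return hits


if __name__ == "__main__":
    for cp in ("\u06E0", "\u06DF", "\u0652"):
        print(f"U+{ord(cp):04X}", unicodedata.name(cp))
    sample = "\u0623\u064E\u0646\u064E\u0627\u06E0"  # assumed transcription
    print(find_mark(sample))
```

Running `find_mark` over a full edition text would give a reproducible count to compare against the figure of 66 quoted above.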

meezaan commented 4 years ago

@fawazahmed0 I don't believe that is an Arabic zero; according to the rules of Classical Arabic, it is a sukoon. It appears as the Arabic zero in fonts like Scheherazade, so the zero is not the problem here. I think it is simply mistyped, and carried from version to version (and I know this is a bold assertion, but in the digital age all things manifest themselves, particularly when tied to man's attention to detail).

What it actually needs to be is a sukoon, so really, what we see and should see is evident from the screenshots. As you have rightly mentioned, it appears when the نا appear together, so it will not be, insha Allah, hard to correct.

What it appears like with the zero:

incorrect_zero

The correct version with a sukoon:

correct_sukoon

fawazahmed0 commented 4 years ago

Thanks, @meezaan, for correcting me. If you are 100% sure about this, then it's easy to correct: we just have to replace the English zero with the sukoon, which will take 5 seconds. I will close this issue then. But we will have to raise issues at the quran-academy and Khaled Hosny Quran repos so that they can also correct it. Quran Complex doesn't reply, so I will not waste my time writing an email to them to correct the mistake.

Also, do you want the corrected version?

Thanks

fawazahmed0 commented 4 years ago

@meezaan see this

khaledhosny commented 4 years ago

> I think (I could be wrong) the removal of the sukoon is actually not Uthmanic - it's laziness

Long vowel letters don’t get a mark in standard Arabic, سوْرة would be pronounced as consonant و not as a long vowel (like in سوْءة).

meezaan commented 4 years ago

@khaledhosny Yes, but they do in Classical Arabic, don't they? I have often seen this described as archaic Arabic, but I think that definition is beside the point.

Insha Allah once we unpack - I recently moved - I will dig out some of the old Grammar books with examples just to be sure.

meezaan commented 4 years ago

I just want to add this here in case anyone sees this and gets concerned.

This discussion around the zero or a sukoon is purely syntactic about haraka (vowels). At least in the case of the word أَنَا۠, neither the meaning nor the pronunciation would change.

@fawazahmed0 Thank you for raising this and @khaledhosny thank you for sharing the images of the mushaf. These are, however, printed (and as such are prone to errors just as much), so I would look for an actual hand written version before I would consider making the change. I do have a hand traced version of an Ottoman Qur'an, and that, like the sub-continent version, simply contains the fathah (because pronunciation, of course is key for the non-native Arabic speaker and the meaning does not change with this), so that has not been helpful in this matter.

I will also check with someone I know who may be qualified to answer this question beyond a book of Classical Arabic Grammar.

Either way, insha Allah, I will share what I am able to find out.

And God knows best!

meezaan commented 4 years ago

See https://www.abouttajweed.com/080702.htm. This zero might be specific to medina mushaf (or other similar mushaf). I've only found one hand written mushaf online, and that too does not have a zero or a sukoon. So far, it seems like it is related specifically to tajweed, especially if you want to stop at the word in question.

Hand_Written_Mushaf_1260AH

khaledhosny commented 4 years ago

The Azhar mushaf is the famous 1924 Cairo edition and is basically the standard and the most authoritative source for printed masahef. The committee that oversaw it standardized and even invented the symbols used in masahef today. The other two masahef are the most widely circulated masahef, and they basically verbatim copy the Azhar mushaf.

Any digital Quranic data need to reflect this, as that is what is in the masahef people read today, not any handwritten mushaf. These symbols are notation systems invented to help reading the Quran, there is no right or wrong but what is familiar to mushaf readers today.

I think the source of the confusion here is that masahef in the Indian subcontinent use a different notation system. This needs to be reflected in digital form as well, but this requires a different set of data, since the differences are much more than the shape of this circle (the same goes for masahef used in the Maghreb, on top of their being based on the Warsh reading as well).

fawazahmed0 commented 4 years ago

Just wanted to add that the different mushafs have already been digitized by Quran Complex.

Here are the different mushafs (such as IndoPak/Nastaleeq, Warsh etc.) and their relevant fonts: Mushafs with Fonts.zip

I never added all of those to my API (I only added Nastaleeq and Uthmanic), because most of them have mistakes, not in the actual Quran text, but in the verse numbering. For example, in Uthmanic Warsh, Surah Baqarah has 285 verses, but in reality Baqarah has to have 286 verses; the mistake is in verse 1, where they have merged verse 1 with verse 2. There are many more mistakes in it. I did write to them about this, but they never replied.

Thanks

khaledhosny commented 4 years ago

I’d be cautious before flagging something as a mistake without having proper expertise in the subject. There are different methods for counting verses, and the Baqarah is indeed 285 verses in Warsh masahef.

khaledhosny commented 4 years ago

IMG_20201006_055816_1 IMG_20201006_055758_2

meezaan commented 4 years ago

@khaledhosny Thank you very much for the clarification.

So, just to reiterate, there is no error as far as the zero is concerned. It's a tajweed notation.

Al Azhar is Al Azhar, so there is no debate about it. However, I would just like to illustrate what you are saying with images, because this is not so much about the reading, but more about the way it is denoted. I am sure the choice was made for valid reasons, but as someone who has studied Grammar not tajweed, it can (and this can very likely just be me and my ignorance) cause confusion, because the little 0 is not part of any Classical Arabic Grammar text that I have studied (and of course I have not studied them all, which is why the discussion).

The Turks do a much better job of this, by simply adding a completely different colour for all tajweed markings. This, in essence, separates them from the text. As you mention, there needs to be some digital data or annotation to reflect the markings in the Cairo edition. But, solely on principle, it is not accurate to call the Cairo edition "the Qur'an in Unicode"; it should be called "the Qur'an, Cairo edition, in Unicode". There are other differences too, as you mention, for instance a missing hamza above the alif in the أَنَا and just a fatha instead (this is also visible in the Turkish image). Of course, the pronunciation does not change, but even books of Arabic Grammar that pre-date the Cairo edition state this as being the 'correct' Arabic grammatically.

Anyhow, the back of every Qur'an from the Azhar mushaf has this key (but obviously not our digital copies):

tajweed_key

The Turkish Qur'an denotes tajweed in completely different colour:

Turkish_mushaf

And their key:

tajweed_key_turkish

And God knows best.

meezaan commented 4 years ago

> as that is what is in the masahef people read today, not any handwritten mushaf.

Irrespective of what people read, a principle still stands and needs to be mentioned irrespective of what people understand.

People's understanding and knowledge might become less over time, but that shouldn't ever be a reason to reset a principle.

khaledhosny commented 4 years ago

> because the little 0 is not part of any Classical Arabic Grammar text that I have studied

Of course it is not, they invented the symbol specifically to be used here. Most Quranic marks are used exclusively with the Quranic text and not for classical or standard Arabic text, so I don’t see why this is an issue.

> by simply adding a completely different colour for all tajweed markings

The Al-Azhar committee intentionally avoided any use of color because it was difficult to print back then. This ship has long sailed: millions of Muslims are familiar with these symbols, and changing them is not doing any good.

> Irrespective of what people read, a principle still stands and needs to be mentioned irrespective of what people understand.

I really don’t know what you are arguing about and what principles need to be upheld here.

meezaan commented 4 years ago

@khaledhosny It's not an argument. My apologies if that is what it comes off like, and I'm not saying anyone has done anything wrong (who am I to do that).

Classical Arabic does matter for those who study it to understand what the Qur'an means. You read the text and see something that is unfamiliar, and so it can be cause for ambiguity. And this stands true even if you have memorised much of it.

Regarding the principle, I'm simply referring to not being clear about the marks that have been invented for tajweed and that it be stated as such in digital editions. As you say, the marks are exclusively for the Cairo edition text and those that copied it, so we should just state that instead of stating they are for Qur'anic text, because that implies they are for all Qur'anic text. Anyone looking at another edition might think it is missing something. Once again, the point is just to avoid ambiguity.

A ship sailing or not sailing is not the point and I'm not pointing a finger at you, and if it seemed like I was, then I apologise again.

I need to just add this somewhere to the API and app (and whether anyone else chooses to do it where they have digital Qur'an text or not does not matter), that's all. For instance, I might be able to colour code these markings and add a footnote. This has already been done for many of the other markings (it was already part of the Global Quran database), but the little 0 is missing from that list, so that might be a good place to start.
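The colour-coding idea above could be sketched as follows: group each base letter with its combining marks, and wrap any group containing the small high rectangular zero (U+06E0) in an HTML span that a stylesheet can colour and a footnote can explain. This is a minimal Python illustration, not the API's actual implementation; `colour_code`, `clusters`, and the `tajweed-mark` class name are hypothetical, and the effect on Arabic shaping in a real browser is untested.

```python
import html
import unicodedata

RECT_ZERO = "\u06E0"  # ARABIC SMALL HIGH UPRIGHT RECTANGULAR ZERO


def clusters(text: str):
    """Yield each base character together with the combining marks that follow it."""
    current = ""
    for ch in text:
        if current and unicodedata.combining(ch) == 0:
            yield current
            current = ch
        else:
            current += ch
    if current:
        yield current


def colour_code(text: str, mark: str = RECT_ZERO, css_class: str = "tajweed-mark") -> str:
    """Wrap every letter-plus-marks group that contains `mark` in a span."""
    parts = []
    for cluster in clusters(text):
        escaped = html.escape(cluster)
        if mark in cluster:
            # The whole group is wrapped so the mark is not separated from its base letter.
            parts.append(f'<span class="{css_class}">{escaped}</span>')
        else:
            parts.append(escaped)
    return "".join(parts)


if __name__ == "__main__":
    sample = "\u0623\u064E\u0646\u064E\u0627\u06E0"  # assumed transcription of the word discussed
    print(colour_code(sample))
```

Wrapping the whole letter group rather than the mark alone is a precaution: a combining mark placed in its own element may not attach to its base letter when rendered.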

fawazahmed0 commented 4 years ago

> I’d be cautious before flagging something as a mistake without having proper expertise in the subject. There are different methods for counting verses, and the Baqarah is indeed 285 verses in Warsh masahef.

@khaledhosny, thanks for correcting me again. This was something new to learn; I never knew about it. God willing, I will add those variants to my API as well, but I still have to convert the verse numbering into the Uthmanic one, otherwise everything (API, apps etc.) will break.

@meezaan I will, inshallah, share the different variants here with Uthmanic verse numbering, so you could add those to your API as well. :thumbsup:

I pray to God that this discussion be not a cause of division, but rather a way to unite and benefit the Muslim ummah. And inshallah, when we see the reward for this work on the Day of Judgement, it is going to be huge, inshallah.

Thanks
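Converting between two verse-counting schemes, as mentioned above, amounts to building a mapping from one numbering to the other. The Python sketch below is a toy illustration only: it models a variant in which some verses of the reference numbering are merged with the verse that follows, using the single merge at verse 1 described earlier in this thread. Real counting traditions differ in more places than this, so `build_mapping` and its `merged_starts` parameter are hypothetical and do not describe the actual Warsh data.

```python
def build_mapping(reference_count: int, merged_starts: set) -> dict:
    """Map each reference verse number to its verse number in the variant.

    merged_starts holds reference verse numbers n for which verses n and n+1
    form a single verse in the variant numbering.
    """
    mapping = {}
    variant_no = 0
    previous_merged = False
    for ref_no in range(1, reference_count + 1):
        if not previous_merged:
            # Start a new variant verse unless this one continues a merged verse.
            variant_no += 1
        mapping[ref_no] = variant_no
        previous_merged = ref_no in merged_starts
    return mapping


if __name__ == "__main__":
    # Assumed example: 286 reference verses, with verses 1 and 2 merged in the variant.
    mapping = build_mapping(286, {1})
    print(mapping[1], mapping[2], mapping[3], mapping[286])
```

Keeping the mapping as data, instead of renumbering the text in place, lets an API serve both numberings from one table without altering either source.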

fawazahmed0 commented 4 years ago

Here are the Quran variants with Uthmanic verse numbering. This was made possible only by the mercy of God, and thank you, everyone, for all your help: Quran Variants updated 15-10-20.zip

Note: The Nastaleeq/IndoPak variant doesn't follow the Unicode standard properly. Here is the IndoPak variant that follows the Unicode standard: ara-quranindopakuni.txt

Use Google Noto Arabic fonts to view it properly

Thanks

meezaan commented 4 years ago

> I pray to God that this discussion be not a cause of division, but rather a way to unite and benefit the Muslim ummah. And inshallah, when we see the reward for this work on the Day of Judgement, it is going to be huge, inshallah.

@fawazahmed0 Aameen!

> Here are the Quran variants with Uthmanic verse numbering. This was made possible only by the mercy of God

JazakAllahu khairun! Thank you for all the discussion, engagement and contribution. May Allah ﷻ be pleased with it!

fawazahmed0 commented 3 years ago

In the name of God, who has guided me to do this work, and I seek refuge in Him from every evil that could harm me

Salam alaikum,

I have sent emails to a few developers about my Quran-related projects, and I think it will be beneficial if I share those here too:

You are free to use my work/code etc, without giving me any attribution, my reward is from my God.

May God accept our worship

meezaan commented 3 years ago

Updated.