Is it available as individual txt/json/xml files, or does it have to be scraped from the website? Irab Mufassil was parsed from HTML files.
No, it's not available, but I can reach out and see whether they're interested in providing it.
@mustafa0x We will never say no to more data! 😃
The treebank currently covers 50% of the Quran, and we have used al-i’rāb al-mufassal as the main reference work for this. I think it would be an excellent idea to have an additional reference work, and we could compare both. For attribution, transparency, and to ensure we can trace back to the original author, it would be great to know the primary source for this additional linguistic analysis.
If you are able to reach them, it would be good to confirm the original publication so we can cite this, as reliable citations are key to the project. Even better would be an extract of their database. As you rightly point out, having syntactic roles at word level is invaluable.
Another reason to know the original source, beyond just reliable citations, is to ensure compliance with copyright and fair use.
Having the AI read both reference works could really speed up completion of the treebank, as you rightly point out.
Al-i’rāb al-mufassal is good, but surahapp is word by word and quite thorough, so I assume it will be a lot easier to parse. They also have complete sarf.
Sarf
https://web.surahapp.com/ar/quran?surah=2&view=reading&page=3&word=9&aya=7&use-quran-app=true&tab=6&filter=articulation&content-info-model=false&stats-type=cols&aya-counting-key=adad_ayat-sowar_fn&keep-word-change=true&d-aya=true
Iraab
https://web.surahapp.com/ar/quran?surah=2&view=reading&page=3&word=9&aya=7&use-quran-app=true&tab=6&filter=tasreef&content-info-model=false&stats-type=cols&aya-counting-key=adad_ayat-sowar_fn&keep-word-change=true&d-aya=true
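If the data does end up being scraped, note that the two links above differ only in the `filter` query parameter (`articulation` vs `tasreef`). Below is a minimal sketch for generating such per-word links, assuming the site accepts the same parameters for any surah, aya and word. `word_url` is a hypothetical helper, the fixed parameters are copied verbatim from the links, and whether `page` is actually required (or which `filter` value maps to sarf and which to i'rab) is untested and should be checked on the site.

```python
from typing import Optional
from urllib.parse import urlencode

BASE_URL = "https://web.surahapp.com/ar/quran"


def word_url(surah: int, aya: int, word: int, filter_name: str,
             page: Optional[int] = None) -> str:
    """Build a surahapp link for one word (hypothetical helper).

    Parameter names and their order are copied from the two example links;
    only surah, page, word, aya and filter vary. Assumption: the site
    accepts these parameters for any surah/aya/word combination.
    """
    params = {"surah": surah, "view": "reading"}
    if page is not None:
        # Copied from the examples (page=3 for surah 2, aya 7); may be optional.
        params["page"] = page
    params.update({
        "word": word,
        "aya": aya,
        "use-quran-app": "true",
        "tab": "6",
        "filter": filter_name,  # e.g. "articulation" or "tasreef"
        "content-info-model": "false",
        "stats-type": "cols",
        "aya-counting-key": "adad_ayat-sowar_fn",
        "keep-word-change": "true",
        "d-aya": "true",
    })
    # dicts preserve insertion order, so the query string matches the examples
    return f"{BASE_URL}?{urlencode(params)}"


if __name__ == "__main__":
    # Reproduces the first link above (surah 2, aya 7, word 9).
    print(word_url(2, 7, 9, "articulation", page=3))
```

Keeping the parameter order identical to the copied links makes it easy to diff generated URLs against ones taken from the browser.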