Youssefares opened this issue 5 years ago.
We'll split the words as follows:
- @Youssefares: words 1-500
- @TarekAlQaddy: words 501-1000
- @yara11: words 1001-1500
- @miralelnahas: words 1501-2000
Each of us has their own section of the file, so we can work in parallel without losing the word order. When we're done, we'll remove the section boundaries and comments.
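The final cleanup step can be sketched as below. This is a minimal sketch under stated assumptions: it assumes each contributor's section is delimited by boundary lines starting with `#` (a hypothetical marker, since the file's actual convention isn't specified here) and that the sections already sit in rank order. `owner_of` and `strip_boundaries` are illustrative helpers, not existing project code.

```python
# Hypothetical marker: we assume boundary and comment lines start with "#".
COMMENT_PREFIX = "#"

# Rank ranges from the assignment above: (contributor, first rank, last rank).
ASSIGNMENTS = [
    ("@Youssefares", 1, 500),
    ("@TarekAlQaddy", 501, 1000),
    ("@yara11", 1001, 1500),
    ("@miralelnahas", 1501, 2000),
]


def owner_of(rank: int) -> str:
    """Return the contributor responsible for a given frequency rank."""
    for owner, first, last in ASSIGNMENTS:
        if first <= rank <= last:
            return owner
    raise ValueError(f"rank {rank} is outside 1-2000")


def strip_boundaries(lines: list[str]) -> list[str]:
    """Drop boundary/comment lines and blank lines, keeping the remaining rows in order."""
    return [
        line
        for line in lines
        if line.strip() and not line.lstrip().startswith(COMMENT_PREFIX)
    ]


# Usage with placeholder rows:
draft = [
    "# ---- @Youssefares: 1-500 ----",
    "word_a,pos_a",
    "# ---- @TarekAlQaddy: 501-1000 ----",
    "word_b,pos_b",
]
print(strip_boundaries(draft))  # ['word_a,pos_a', 'word_b,pos_b']
```

Because each section only ever gets rows appended inside its own boundaries, removing the boundary lines is enough to recover a single list in frequency order.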
We're going to seed our dictionary with the most common Arabic words, using the freely available book A Frequency Dictionary of Arabic.
The book lists 5,000 words. We'll skip dialect-specific ones such as كويس ("good"), أيوا ("yes") and اللي ("which/that").
The first milestone is to go through the book's top 2,000 words and add the Standard Arabic (فصحى) ones to this file. For now we only need each word's string and its part of speech, so ignore all other information in the book (plurals, other forms, etc.).
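The milestone's selection rule can be sketched as a filter over the book's entries. This assumes each entry has already been transcribed as a hypothetical `Entry` holding the frequency rank, the word and the book's part-of-speech abbreviation. The dialect list is maintained by hand and is seeded here with the three examples above. Matching is exact string comparison, so spelling variants (for example with or without a hamza) would need their own entries.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """Hypothetical transcription of one book entry (other forms are ignored)."""
    rank: int  # frequency rank in the book (1 = most common)
    word: str  # headword
    pos: str   # part-of-speech abbreviation, copied as-is


# Dialect-specific words to skip; extend by hand while going through the book.
DIALECT_WORDS = {"كويس", "أيوا", "اللي"}

# First milestone: only the top 2,000 ranks.
MILESTONE_LIMIT = 2000


def milestone_rows(entries):
    """Yield (word, pos) for non-dialect entries within the top MILESTONE_LIMIT ranks, in rank order."""
    for entry in sorted(entries, key=lambda e: e.rank):
        if entry.rank > MILESTONE_LIMIT:
            break  # entries are sorted, so nothing later qualifies
        if entry.word in DIALECT_WORDS:
            continue  # skip dialect-specific words
        yield entry.word, entry.pos
```

Only the word and its abbreviation leave the filter, which matches the "word string and part of speech only" scope of this milestone.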
Notes:
For now, we'll copy the book's part-of-speech abbreviations into our seed file as-is and work out how to map them to Arabic parts of speech later. For an example of the row format, see: https://github.com/Youssefares/egsl-website-api/blob/master/db/words.csv#L9-L14
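A sketch of writing rows with the raw abbreviation, assuming a two-column layout (`word`, `part_of_speech`). That layout is a guess and not necessarily the one used in the linked `db/words.csv`, so check the example rows there first. The deferred mapping is represented by a hypothetical `POS_MAPPING` dict that stays empty until we settle it, so unmapped abbreviations pass through unchanged.

```python
import csv
import io

# Hypothetical column names; the real seed file may differ.
FIELDNAMES = ["word", "part_of_speech"]

# To be filled in later: book abbreviation -> Arabic part of speech.
POS_MAPPING: dict[str, str] = {}


def write_seed(rows, out):
    """Write (word, abbreviation) pairs as CSV, copying the abbreviation verbatim."""
    writer = csv.DictWriter(out, fieldnames=FIELDNAMES)
    writer.writeheader()
    for word, abbreviation in rows:
        writer.writerow({"word": word, "part_of_speech": abbreviation})


def map_pos(abbreviation: str) -> str:
    """Later step: translate an abbreviation, falling back to the raw value if unmapped."""
    return POS_MAPPING.get(abbreviation, abbreviation)


# Usage with a placeholder abbreviation ("prep"):
buffer = io.StringIO()
write_seed([("في", "prep")], buffer)
print(buffer.getvalue())
```

When writing the real file, open it with `open(path, "w", newline="", encoding="utf-8")`, as the `csv` module documentation recommends, so the Arabic text and line endings are written correctly.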