linuxscout / arramooz-pysqlite

Arabic Dictionary for Morphological analysis - python + sqlite
GNU General Public License v3.0
11 stars, 3 forks

Add Support for showing results #3

Open Kentoseth opened 3 years ago

Kentoseth commented 3 years ago

Salam,

In this lib, the output currently looks like:

[{'think_trans': 1, 'passive': 0, 'confirmed': 0, 'vocalized': u'اِسْتَقَلَّ', 'stamped': u'ستقل', 'future_moode': 0, 'triliteral': 0, 'future': 0, 'unthink_trans': 0, 'past': 0, 'unvocalized': u'استقل', 'future_type': u'َ', 'double_trans': 0, 'normalized': u'استقل', 'reflexive_trans': 0, 'imperative': 0, 'transitive': 1, 'root': u'قلل', 'id': 7495},

Instead of showing only the 0/1 (False/True) flags, could you display the actual forms? For example, for the verb 'ضَرَب', could you display its past, future, imperative, and passive forms?

Another example: for the noun 'كتاب', show the singular, dual, plural, and broken-plural forms.

Your other library https://github.com/linuxscout/alyahmor generates the verb/noun forms already, but it doesn't show the past/future/imperative for verbs and it doesn't show some of the noun options as well.

If this feature request is more suited to https://github.com/linuxscout/alyahmor , please add it there instead of this library.
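Until such a feature exists, the 0/1 columns of a record like the one above can at least be reduced on the caller's side to the names of the properties that are set. This is only a minimal sketch: `active_flags` is a hypothetical helper, not part of arramooz-pysqlite, and it assumes the record is a plain dict in which every integer field other than `id` is a flag.

```python
# Sample record copied from the output shown above.
entry = {
    'think_trans': 1, 'passive': 0, 'confirmed': 0,
    'vocalized': 'اِسْتَقَلَّ', 'stamped': 'ستقل', 'future_moode': 0,
    'triliteral': 0, 'future': 0, 'unthink_trans': 0, 'past': 0,
    'unvocalized': 'استقل', 'future_type': 'َ', 'double_trans': 0,
    'normalized': 'استقل', 'reflexive_trans': 0, 'imperative': 0,
    'transitive': 1, 'root': 'قلل', 'id': 7495,
}


def active_flags(record, ignore=('id',)):
    """Return the sorted names of integer fields set to 1.

    Hypothetical helper: it assumes integer fields (except those in
    `ignore`) are boolean flags, which is inferred from the sample only.
    """
    return sorted(
        key for key, value in record.items()
        if key not in ignore and isinstance(value, int) and value == 1
    )


print(active_flags(entry))
```

This only names the properties; it does not produce the conjugated or inflected forms that the request asks for.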

linuxscout commented 3 years ago

ok, I think it's a great idea, I will work on it.

linuxscout commented 3 years ago

Salam, I added a new feature to the Alyahmor library that generates tags for each word form.

I added an option, details=True, to get more details:

generator.generate_forms( word, word_type="noun", indexed=True, details=True)

for example:

>>> import alyahmor.genelex
>>> generator = alyahmor.genelex.genelex()
>>> word = u"كِتِاب"
>>> noun_forms = generator.generate_forms( word, word_type="noun", indexed=True, details=True)
>>> noun_forms
[{'vocolized': 'استعمل', 'semi-vocalized': 'استعمل', 'segmented': '-استعمل--', 'tags': '::'},
 {'vocolized': 'استعملي', 'semi-vocalized': 'استعملي', 'segmented': '-استعمل--ي', 'tags': ':مضاف:'},
 {'vocolized': 'استعملِي', 'semi-vocalized': 'استعملِي', 'segmented': '-استعمل--ي', 'tags': ':مضاف:'},
 {'vocolized': 'استعملكِ', 'semi-vocalized': 'استعملكِ', 'segmented': '-استعمل--ك', 'tags': ':مضاف:'},
 {'vocolized': 'استعملكَ', 'semi-vocalized': 'استعملكَ', 'segmented': '-استعمل--ك', 'tags': ':مضاف:'},
 {'vocolized': 'استعملكِ', 'semi-vocalized': 'استعملكِ', 'segmented': '-استعمل--ك', 'tags': ':مضاف:'},
 {'vocolized': 'استعملكُمُ', 'semi-vocalized': 'استعملكُمُ', 'segmented': '-استعمل--كم', 'tags': ':مضاف:'},
 ....]
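Once the detailed records are available, they can be narrowed down by tag on the caller's side. The sketch below works on a few records copied from the output above rather than calling the library. `split_tags`, `with_tag` and `without_tags` are hypothetical helpers, and treating the `tags` value as a colon-separated list is an assumption inferred from strings such as ':مضاف:' and '::'.

```python
# A few records copied from the detailed output above (sample data only).
forms = [
    {'vocolized': 'استعمل', 'semi-vocalized': 'استعمل',
     'segmented': '-استعمل--', 'tags': '::'},
    {'vocolized': 'استعملي', 'semi-vocalized': 'استعملي',
     'segmented': '-استعمل--ي', 'tags': ':مضاف:'},
    {'vocolized': 'استعملِي', 'semi-vocalized': 'استعملِي',
     'segmented': '-استعمل--ي', 'tags': ':مضاف:'},
]

MUDAF = 'مضاف'  # the tag that appears in the sample output


def split_tags(tag_string):
    """Split a ':'-separated tag string into its non-empty tags (assumed format)."""
    return [tag for tag in tag_string.split(':') if tag]


def with_tag(records, tag):
    """Keep only the records whose tag list contains `tag`."""
    return [rec for rec in records if tag in split_tags(rec['tags'])]


def without_tags(records):
    """Keep only the records that carry no tag at all."""
    return [rec for rec in records if not split_tags(rec['tags'])]


for rec in with_tag(forms, MUDAF):
    print(rec['vocolized'], rec['segmented'])
```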
Kentoseth commented 3 years ago

wasalam,

This is a really great improvement.

Is there a way to slim down the results or is it meant to output so many different forms at once?

linuxscout commented 3 years ago

Salam, thank you. Do you mean reducing the number of generated word forms?

You can request a specific form by passing the affixes you want, like this:

>>> import alyahmor.genelex
>>> generator = alyahmor.genelex.genelex()
>>> word = u"كِتِاب"
>>> generator.generate_by_affixes( word, word_type="noun", affixes = [u"بال", u"", u"ين", u""])
['بِالْكِتَِابين']
>>> generator.generate_by_affixes( word, word_type="noun", affixes = [u"وك", u"", u"ِ", u""])
['وَكَكِتَِابِ']
>>> generator.generate_by_affixes( word, word_type="noun", affixes = [u"و", u"", u"", u""])
['وَكِتَِاب']
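Several specific forms can be requested in one pass by looping over affix lists. The sketch below relies only on the `generate_by_affixes(word, word_type=..., affixes=[...])` call as it is used in this thread. `forms_for_affixes` is a hypothetical helper, and reading the four slots as procletic, prefix, suffix, enclitic is an assumption inferred from the examples. A stub stands in for `alyahmor.genelex.genelex()` so the snippet runs without the library; the stub merely concatenates and does not vocalize.

```python
def forms_for_affixes(generator, word, affix_sets, word_type="noun"):
    """Map each affix tuple to the forms the generator returns for it."""
    results = {}
    for affixes in affix_sets:
        results[tuple(affixes)] = generator.generate_by_affixes(
            word, word_type=word_type, affixes=list(affixes)
        )
    return results


class StubGenerator:
    """Stand-in for the real generator, for illustration only.

    It joins the pieces in the assumed slot order and adds no harakat,
    so its output differs from what Alyahmor would return.
    """

    def generate_by_affixes(self, word, word_type="noun", affixes=None):
        procletic, prefix, suffix, enclitic = affixes
        return [procletic + prefix + word + suffix + enclitic]


word = "كتاب"
affix_sets = [("بال", "", "ين", ""), ("و", "", "", "")]
result = forms_for_affixes(StubGenerator(), word, affix_sets)
for affixes, forms in result.items():
    print(affixes, forms)
```

With the library installed, passing `alyahmor.genelex.genelex()` instead of `StubGenerator()` should be enough, provided the call signature matches the examples above.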

Or you can request only the reduced forms, without details:

>>> import alyahmor.genelex
>>> generator = alyahmor.genelex.genelex()
>>> word = u"كِتِاب"
>>> noun_forms = generator.generate_forms( word, word_type="noun", indexed=True)
>>> noun_forms
{u'أككتابة': [u'أكَكِتَِابَةِ', u'أكَكِتَِابَةٍ'],
 u'أوككتابة': [u'أَوَكَكِتَِابَةِ', u'أَوَكَكِتَِابَةٍ'],
 u'وكتابياتهم': [u'وَكِتَِابياتهِمْ', u'وَكِتَِابِيَاتُهُمْ', u'وَكِتَِابِيَاتِهِمْ', u'وَكِتَِابِيَاتُهِمْ', u'وَكِتَِابياتهُمْ'],
 u'وكتابياتهن': [u'وَكِتَِابياتهِنَّ', u'وَكِتَِابياتهُنَّ', u'وَكِتَِابِيَاتِهِنَّ', u'وَكِتَِابِيَاتُهِنَّ', u'وَكِتَِابِيَاتُهُنَّ'],
 u'وللكتابات': [u'وَلِلْكِتَِابَاتِ', u'وَلِلْكِتَِابات'],
 u'أبكتابتكن': [u'أَبِكِتَِابَتِكُنَّ'],
 u'أبكتابتكم': [u'أَبِكِتَِابَتِكُمْ'],
 u'أكتابياتهن': [u'أَكِتَِابياتهِنَّ', u'أَكِتَِابِيَاتِهِنَّ', u'أَكِتَِابياتهُنَّ', u'أَكِتَِابِيَاتُهُنَّ', u'أَكِتَِابِيَاتُهِنَّ'],
 u'فكتاباتهم': [u'فَكِتَِاباتهِمْ', u'فَكِتَِابَاتُهُمْ', u'فَكِتَِابَاتُهِمْ', u'فَكِتَِاباتهُمْ', u'فَكِتَِابَاتِهِمْ'],
 u'بكتابياتكن': [u'بِكِتَِابِيَاتِكُنَّ', u'بِكِتَِابياتكُنَّ'],
....
}
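The indexed result appears to map each unvocalized form to a list of vocalized variants, so it can also be slimmed down after generation. A minimal sketch on two entries copied from the output above: `slim` is a hypothetical helper, not an Alyahmor function, and the dict-of-lists shape is assumed from the sample.

```python
def slim(indexed_forms, keep=lambda key: True, max_variants=1):
    """Return a smaller copy of an indexed-forms dict.

    keep:         predicate applied to each unvocalized key
    max_variants: maximum number of vocalized variants kept per key
    """
    return {
        key: variants[:max_variants]
        for key, variants in indexed_forms.items()
        if keep(key)
    }


# Two entries copied from the indexed output above (sample data only).
sample = {
    'وللكتابات': ['وَلِلْكِتَِابَاتِ', 'وَلِلْكِتَِابات'],
    'أبكتابتكم': ['أَبِكِتَِابَتِكُمْ'],
}

WAW = '\u0648'  # the letter waw, used here as a leading conjunction

# Keep only forms that start with waw, one vocalization each.
only_waw = slim(sample, keep=lambda key: key.startswith(WAW))
print(only_waw)
```

Filtering after the fact does not reduce generation time; it only trims what the caller has to look at.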
Kentoseth commented 3 years ago

wasalam,

I will close this ticket now as the feature has been implemented.

May Allah reward you with good.

linuxscout commented 3 years ago

Salam, you gave me an idea. I think I will implement some variants of the generate_forms function:

Kentoseth commented 3 years ago

wasalam,

If you are considering improving Alyahmor further, then my suggestions are:

  1. Nouns

    • Showing singular, dual and plural (with translations) - I think you referred to them as inflected forms above ^
    • Identifying the root
    • Identifying the Part of Speech

^ Something that confuses me in these inflected forms: the plural of كتاب is كُتُبٌ (book and books), but the feminine plural كِتابات means "writings; essays". Is كِتابات the feminine plural equivalent of كُتُبٌ, or is it the feminine plural of another word that isn't كتاب?

Source for noun lookups: http://www.aratools.com/

  2. Verbs

I think for verbs you already display everything (except translations), so verbs just need filters so that they can display results like this website (up to the participles):

https://cooljugator.com/ar/%D8%B9%D9%85%D9%84

So each verb entry would have: tense/mood/participle, the Arabic with harakat, and an English translation.

This is a lot of work, so please only consider these improvements if you have the capacity. These ideas will be useful for Arabic learners as a dictionary reference, instead of using Hans Wehr.

linuxscout commented 3 years ago

> wasalam,

Salam,

> If you are considering improving Alyahmor further, then my suggestions are:
>
> 1. Nouns
>
> * Showing singular, dual and plural (with translations) - I think you referred to them as inflected forms above

Translation would, I think, be a separate project.

> * Identifying the root

Alyahmor uses the Arramooz dictionary project, so we can add roots.

> * Identifying the Part of Speech

I propose exposing the tags as attributes of the word form. I use the Mysam project to generate the POS.

> ^ Something that confuses me in these inflected forms: the plural of كتاب is كُتُبٌ (book and books), but the feminine plural كِتابات means "writings; essays". Is كِتابات the feminine plural equivalent of كُتُبٌ, or is it the feminine plural of another word that isn't كتاب?

The word كتابات is just an example; it is not a plural form of كتاب.

> Source for noun lookups: http://www.aratools.com/

OK.

> 2. Verbs
>
> I think for verbs you already display everything (except translations), so verbs just need filters so that they can display results like this website (up to the participles):
>
> https://cooljugator.com/ar/%D8%B9%D9%85%D9%84

We have another project that handles verb conjugation: the Qutrub project, which has its own GitHub repo.

> So each verb entry would have: tense/mood/participle, the Arabic with harakat, and an English translation.

No translation.

> This is a lot of work, so please only consider these improvements if you have the capacity. These ideas will be useful for Arabic learners as a dictionary reference, instead of using Hans Wehr.

I hope to do this.