markuskiller / textblob-de

German language support for TextBlob.
https://textblob-de.readthedocs.org
MIT License
104 stars 12 forks source link
natural-language-processing nlp python

textblob-de README

textblob\_de - latest PyPI version Build Status Documentation Status Number of PyPI downloads LICENSE info

German language support for TextBlob by Steven Loria.

This python package is being developed as a TextBlob Language Extension. See Extension Guidelines for details.

Features

Installing/Upgrading

$ pip install -U textblob-de
$ python -m textblob.download_corpora

Or the latest development release (apparently this does not always work on Windows see issues #1744/5 for details):

$ pip install -U git+https://github.com/markuskiller/textblob-de.git@dev
$ python -m textblob.download_corpora

Note

TextBlob will be installed/upgraded automatically when running pip install. The second line (python -m textblob.download_corpora) downloads/updates nltk corpora and language models used in TextBlob.

Usage

>>> from textblob_de import TextBlobDE as TextBlob
>>> text = '''Heute ist der 3. Mai 2014 und Dr. Meier feiert seinen 43. Geburtstag. 
Ich muss unbedingt daran denken, Mehl, usw. für einen Kuchen einzukaufen. Aber leider 
habe ich nur noch EUR 3.50 in meiner Brieftasche.'''
>>> blob = TextBlob(text)
>>> blob.sentences
[Sentence("Heute ist der 3. Mai 2014 und Dr. Meier feiert seinen 43. Geburtstag."),
 Sentence("Ich muss unbedingt daran denken, Mehl, usw. für einen Kuchen einzukaufen."),
 Sentence("Aber leider habe ich nur noch EUR 3.50 in meiner Brieftasche.")]
>>> blob.tokens
WordList(['Heute', 'ist', 'der', '3.', 'Mai', ...]
>>> blob.tags
[('Heute', 'RB'), ('ist', 'VB'), ('der', 'DT'), ('3.', 'LS'), ('Mai', 'NN'), 
('2014', 'CD'), ...]
# Default: Only noun_phrases that consist of two or more meaningful parts are displayed.
# Not perfect, but a start (relies heavily on parser accuracy)
>>> blob.noun_phrases
WordList(['Mai 2014', 'Dr. Meier', 'seinen 43. Geburtstag', 'Kuchen einzukaufen', 
'meiner Brieftasche'])
>>> blob = TextBlob("Das Auto ist sehr schön.")
>>> blob.parse()
'Das/DT/B-NP/O Auto/NN/I-NP/O ist/VB/B-VP/O sehr/RB/B-ADJP/O schön/JJ/I-ADJP/O'
>>> from textblob_de import PatternParser
>>> blob = TextBlobDE("Das ist ein schönes Auto.", parser=PatternParser(pprint=True, lemmata=True))
>>> blob.parse()
      WORD   TAG    CHUNK   ROLE   ID     PNP    LEMMA   

       Das   DT     -       -      -      -      das     
       ist   VB     VP      -      -      -      sein    
       ein   DT     NP      -      -      -      ein     
   schönes   JJ     NP ^    -      -      -      schön   
      Auto   NN     NP ^    -      -      -      auto    
         .   .      -       -      -      -      .       
>>> from textblob_de import PatternTagger
>>> blob = TextBlob(text, pos_tagger=PatternTagger(include_punc=True))
[('Das', 'DT'), ('Auto', 'NN'), ('ist', 'VB'), ('sehr', 'RB'), ('schön', 'JJ'), ('.', '.')]
>>> blob = TextBlob("Das Auto ist sehr schön.")
>>> blob.sentiment
Sentiment(polarity=1.0, subjectivity=0.0)
>>> blob = TextBlob("Das ist ein hässliches Auto.")     
>>> blob.sentiment
Sentiment(polarity=-1.0, subjectivity=0.0)

Warning

WORK IN PROGRESS: The German polarity lexicon contains only uninflected forms and there are no subjectivity scores yet. As of version 0.2.3, lemmatized word forms are submitted to the PatternAnalyzer, increasing the accuracy of polarity values. New in version 0.2.7: return type of .sentiment is now adapted to the main TextBlob library (:rtype: namedtuple).

>>> blob.words.lemmatize()
WordList(['das', 'sein', 'ein', 'hässlich', 'Auto'])
>>> from textblob_de.lemmatizers import PatternParserLemmatizer
>>> _lemmatizer = PatternParserLemmatizer()
>>> _lemmatizer.lemmatize("Das ist ein hässliches Auto.")
[('das', 'DT'), ('sein', 'VB'), ('ein', 'DT'), ('hässlich', 'JJ'), ('Auto', 'NN')]

Access to pattern API in Python3

>>> from textblob_de.packages import pattern_de as pd
>>> print(pd.attributive("neugierig", gender=pd.FEMALE, role=pd.INDIRECT, article="die"))
neugierigen

Note

Alternatively, the path to textblob_de/ext can be added to the PYTHONPATH, which allows the use of pattern.de in almost the same way as described in its Documentation. The only difference is that you will have to prepend an underscore: from _pattern.de import .... This is a precautionary measure in case the pattern library gets native Python3 support in the future.

Documentation and API Reference

Requirements

TODO

License

MIT licensed. See the bundled LICENSE file for more details.

Thanks

Coded with Wing IDE (free open source developer license)

Python IDE for Python - wingware.com