German language support for TextBlob by Steven Loria.
This Python package is being developed as a TextBlob Language Extension. See Extension Guidelines for details.
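Features:
- All directly accessible textblob_de classes (e.g. Sentence() or Word()) are initialized with default models for German
- Properties or methods that do not yet work for German raise a NotImplementedError
- German sentence boundary detection and tokenization (NLTKPunktTokenizer)
- Consistent use of the specified tokenizer for all tools (NLTKPunktTokenizer or PatternTokenizer)
- Part-of-speech tagging (PatternTagger) with keyword include_punc=True (defaults to False)
- Tagset conversion in PatternTagger with keyword tagset='penn'|'universal'|'stts' (defaults to penn)
- Parsing (PatternParser) with all pattern keywords, plus pprint=True (defaults to False)
- Noun phrase extraction (PatternParserNPExtractor)
- Lemmatization (PatternParserLemmatizer)
- Polarity detection (PatternAnalyzer) - Still EXPERIMENTAL, does not yet have information on subjectivity
- Full pattern.text.de API support on Python3
To install or upgrade from PyPI:
$ pip install -U textblob-de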
$ python -m textblob.download_corpora
Or install the latest development release (this apparently does not always work on Windows; see issues #1744/5 for details):
$ pip install -U git+https://github.com/markuskiller/textblob-de.git@dev
$ python -m textblob.download_corpora
Note
TextBlob will be installed/upgraded automatically when running pip install. The second line (python -m textblob.download_corpora) downloads/updates the NLTK corpora and language models used in TextBlob.
>>> from textblob_de import TextBlobDE as TextBlob
>>> text = '''Heute ist der 3. Mai 2014 und Dr. Meier feiert seinen 43. Geburtstag.
Ich muss unbedingt daran denken, Mehl, usw. für einen Kuchen einzukaufen. Aber leider
habe ich nur noch EUR 3.50 in meiner Brieftasche.'''
>>> blob = TextBlob(text)
>>> blob.sentences
[Sentence("Heute ist der 3. Mai 2014 und Dr. Meier feiert seinen 43. Geburtstag."),
Sentence("Ich muss unbedingt daran denken, Mehl, usw. für einen Kuchen einzukaufen."),
Sentence("Aber leider habe ich nur noch EUR 3.50 in meiner Brieftasche.")]
>>> blob.tokens
WordList(['Heute', 'ist', 'der', '3.', 'Mai', ...])
>>> blob.tags
[('Heute', 'RB'), ('ist', 'VB'), ('der', 'DT'), ('3.', 'LS'), ('Mai', 'NN'),
('2014', 'CD'), ...]
# Default: Only noun_phrases that consist of two or more meaningful parts are displayed.
# Not perfect, but a start (relies heavily on parser accuracy)
>>> blob.noun_phrases
WordList(['Mai 2014', 'Dr. Meier', 'seinen 43. Geburtstag', 'Kuchen einzukaufen',
'meiner Brieftasche'])
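Because blob.tags is a plain list of (word, tag) tuples, it can be post-processed with ordinary Python. The following is a minimal sketch, not part of textblob-de: the helper name group_by_tag is hypothetical, and the sample pairs are copied from the output shown above.

```python
from collections import defaultdict

# (word, tag) pairs copied from the blob.tags output shown above
tags = [('Heute', 'RB'), ('ist', 'VB'), ('der', 'DT'), ('3.', 'LS'),
        ('Mai', 'NN'), ('2014', 'CD')]


def group_by_tag(pairs):
    """Map each POS tag to the words carrying it, in order of appearance."""
    groups = defaultdict(list)
    for word, tag in pairs:
        groups[tag].append(word)
    return dict(groups)


by_tag = group_by_tag(tags)
nouns = by_tag.get('NN', [])  # words tagged as common nouns
```

With a real blob, passing blob.tags instead of the literal list should work the same way, assuming the tuple layout shown above.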
>>> blob = TextBlob("Das Auto ist sehr schön.")
>>> blob.parse()
'Das/DT/B-NP/O Auto/NN/I-NP/O ist/VB/B-VP/O sehr/RB/B-ADJP/O schön/JJ/I-ADJP/O'
>>> from textblob_de import PatternParser
>>> blob = TextBlob("Das ist ein schönes Auto.", parser=PatternParser(pprint=True, lemmata=True))
>>> blob.parse()
      WORD   TAG    CHUNK   ROLE   ID     PNP    LEMMA
       Das   DT     -       -      -      -      das
       ist   VB     VP      -      -      -      sein
       ein   DT     NP      -      -      -      ein
   schönes   JJ     NP ^    -      -      -      schön
      Auto   NN     NP ^    -      -      -      auto
         .   .      -       -      -      -      .
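The slash-separated string returned by the default blob.parse() call (first parsing example above) can be split into structured records with the standard library alone. This is a rough sketch under stated assumptions: each token has exactly four fields, the fourth field is taken to be the PNP marker, and Token and split_parse are hypothetical names that are not part of textblob-de. Output produced with extra parser keywords such as lemmata=True has more fields and would need a different split.

```python
from collections import namedtuple

# Hypothetical record type; field names mirror the pprint column headers
Token = namedtuple("Token", ["word", "tag", "chunk", "pnp"])


def split_parse(parsed):
    """Split a 'word/TAG/CHUNK/PNP' string into a list of Token tuples."""
    tokens = []
    for item in parsed.split():
        # rsplit from the right so a '/' inside the word stays in the first field
        word, tag, chunk, pnp = item.rsplit("/", 3)
        tokens.append(Token(word, tag, chunk, pnp))
    return tokens


# String copied from the blob.parse() output shown above
parsed = 'Das/DT/B-NP/O Auto/NN/I-NP/O ist/VB/B-VP/O sehr/RB/B-ADJP/O schön/JJ/I-ADJP/O'
tokens = split_parse(parsed)

# Words that belong to a noun-phrase chunk (B-NP or I-NP)
noun_chunk = [t.word for t in tokens if t.chunk.endswith("-NP")]
```

Splitting on whitespace also covers multi-sentence input, as long as each token keeps the four-field layout.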
>>> from textblob_de import PatternTagger
>>> blob = TextBlob("Das Auto ist sehr schön.", pos_tagger=PatternTagger(include_punc=True))
>>> blob.tags
[('Das', 'DT'), ('Auto', 'NN'), ('ist', 'VB'), ('sehr', 'RB'), ('schön', 'JJ'), ('.', '.')]
>>> blob = TextBlob("Das Auto ist sehr schön.")
>>> blob.sentiment
Sentiment(polarity=1.0, subjectivity=0.0)
>>> blob = TextBlob("Das ist ein hässliches Auto.")
>>> blob.sentiment
Sentiment(polarity=-1.0, subjectivity=0.0)
Warning
WORK IN PROGRESS: The German polarity lexicon contains only uninflected forms and there are no subjectivity scores yet. As of version 0.2.3, lemmatized word forms are submitted to the PatternAnalyzer, increasing the accuracy of polarity values. New in version 0.2.7: the return type of .sentiment is now adapted to the main TextBlob library (:rtype: namedtuple).
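Since .sentiment returns a namedtuple, its values can be read by field name or unpacked like a regular tuple. The sketch below uses a stand-in namedtuple with the field names from the output shown above, so it runs without textblob-de installed. The label helper and its threshold parameter are hypothetical and only illustrate one way to consume the polarity value.

```python
from collections import namedtuple

# Stand-in with the same field names as the output shown above;
# this is not the library's own class
Sentiment = namedtuple("Sentiment", ["polarity", "subjectivity"])

result = Sentiment(polarity=1.0, subjectivity=0.0)

# Access by field name ...
polarity = result.polarity
# ... or unpack like a regular tuple
pol, subj = result


def label(sentiment, threshold=0.0):
    """Turn a polarity score into a coarse label (illustrative only)."""
    if sentiment.polarity > threshold:
        return "positive"
    if sentiment.polarity < -threshold:
        return "negative"
    return "neutral"
```

Because subjectivity scores are not yet available for German, the sketch deliberately bases the label on polarity alone.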
>>> blob.words.lemmatize()
WordList(['das', 'sein', 'ein', 'hässlich', 'Auto'])
>>> from textblob_de.lemmatizers import PatternParserLemmatizer
>>> _lemmatizer = PatternParserLemmatizer()
>>> _lemmatizer.lemmatize("Das ist ein hässliches Auto.")
[('das', 'DT'), ('sein', 'VB'), ('ein', 'DT'), ('hässlich', 'JJ'), ('Auto', 'NN')]
Access to the pattern API in Python3:
>>> from textblob_de.packages import pattern_de as pd
>>> print(pd.attributive("neugierig", gender=pd.FEMALE, role=pd.INDIRECT, article="die"))
neugierigen
Note
Alternatively, the path to textblob_de/ext can be added to the PYTHONPATH, which allows the use of pattern.de in almost the same way as described in its Documentation. The only difference is that you will have to prepend an underscore (from _pattern.de import ...). This is a precautionary measure in case the pattern library gets native Python3 support in the future.
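As a possible alternative to setting the PYTHONPATH environment variable, a directory can be appended to sys.path at runtime, which affects only the current Python process. The sketch below is an assumption-laden illustration: add_to_search_path is a hypothetical helper, and the relative path is a placeholder for the actual location of textblob_de/ext on your system.

```python
import os
import sys


def add_to_search_path(directory):
    """Append directory to sys.path once and return the absolute path used."""
    path = os.path.abspath(directory)
    if path not in sys.path:
        sys.path.append(path)
    return path


# Placeholder location; replace with the real path to textblob_de/ext
ext_dir = add_to_search_path(os.path.join("textblob_de", "ext"))
```

Appending rather than prepending keeps already installed packages ahead of the added directory on the module search path.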
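TODO:
- Additional PoS tagging options, e.g. NLTK tagging (NLTKTagger)
- Improve noun phrase extraction (e.g. based on RFTagger output)
- Improve functionality of Sentence() and Word() objects
- Adapt more tests from the main TextBlob library (esp. for TextBlobDE() in test_blob.py)
MIT licensed. See the bundled LICENSE file for more details.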
Coded with Wing IDE (free open source developer license)