sentences fails on certain enumerations

Thanks for your report. This is an nltk issue: textblob-de uses nltk's PunktSentenceTokenizer as its default sentence boundary detector, and nltk alone produces the same split:

In [1]: from nltk import tokenize
In [2]: snt_tokenizer = tokenize.load('tokenizers/punkt/german.pickle')
In [3]: snt_tokenizer.tokenize("Heute ist der 1. des Monats und mein Geld ist noch nicht \
auf dem Konto. Mein 1. Hauptfach war doof mein 2. Versuch war etwas besser.")
Out[3]:
['Heute ist der 1. des Monats und mein Geld ist noch nicht auf dem Konto.',
 'Mein 1.',
 'Hauptfach war doof mein 2.',
 'Versuch war etwas besser.']
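Notice the pattern in this output: there is no break in "der 1. des", but there are breaks in "1. Hauptfach" and "2. Versuch". One plausible reading is that Punkt decides whether a number followed by a period ends a sentence partly by looking at the capitalization of the next word. A lowercase word is evidence against a sentence start, while German capitalizes every noun. The toy splitter below is a deliberately simplified stand-in for that orthographic rule, not nltk's actual code: the real heuristic also uses statistics learned during training. The `ORDINAL` pattern and `toy_split` function are hypothetical names for this illustration, and the sketch assumes whitespace-separated tokens.

```python
import re

# A bare number followed by a period, e.g. "1." or "23." (hypothetical pattern
# for this toy; Punkt itself normalises numeric tokens to "##number##").
ORDINAL = re.compile(r"^\d+\.$")


def toy_split(text):
    """Split on periods, but keep '<number>.' inside the sentence when the
    next word starts with a lowercase letter (simplified orthographic rule)."""
    tokens = text.split()
    sentences, current = [], []
    # Pair each token with its successor; the last token gets "".
    for tok, nxt in zip(tokens, tokens[1:] + [""]):
        current.append(tok)
        if not tok.endswith(".") or not nxt:
            continue  # no period here, or end of text
        if ORDINAL.match(tok) and nxt[0].islower():
            continue  # "der 1. des": lowercase evidence, so not a break
        # Any other period, including "1. Hauptfach", ends the sentence.
        sentences.append(" ".join(current))
        current = []
    if current:
        sentences.append(" ".join(current))
    return sentences


TEXT = ("Heute ist der 1. des Monats und mein Geld ist noch nicht auf dem Konto. "
        "Mein 1. Hauptfach war doof mein 2. Versuch war etwas besser.")
for sentence in toy_split(TEXT):
    print(sentence)
```

On the example text, this toy rule yields the same four sentences as german.pickle. A capitalized noun after an ordinal is exactly the case that capitalization alone cannot disambiguate in German.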

Even though it is not perfect, it yielded better results in my tests than other tokenizers implemented purely in Python (e.g. PatternTokenizer). That said, textblob lets you experiment with different tokenizers. The two implementations included in textblob-de have different pros and cons. The PatternTokenizer, for example, handles emoticons such as ;-) and :( quite well, whereas the NLTKPunktTokenizer usually performs better on ordinal numbers and abbreviations. Compare the PatternTokenizer on the same text:

In [1]: from textblob_de import TextBlobDE as tb
In [2]: from textblob_de import PatternTokenizer
In [3]: blob = tb("Heute ist der 1. des Monats und mein Geld ist noch nicht auf dem Konto. \
Mein 1. Hauptfach war doof mein 2. Versuch war etwas besser.", tokenizer=PatternTokenizer())
In [4]: blob.sentences
Out[4]:
[Sentence("Heute ist der 1 ."),
 Sentence("des Monats und mein Geld ist noch nicht auf dem Konto ."),
 Sentence("Mein 1 ."),
 Sentence("Hauptfach war doof mein 2 ."),
 Sentence("Versuch war etwas besser .")]

With the PatternTokenizer, every ordinal number breaks up the sentence. If you know of a better German sentence splitter with a publicly available Python implementation, I am more than happy to include it in textblob-de, possibly even as the default method. However, I don't think it would be a good idea to start tweaking the results of the PunktSentenceTokenizer inside textblob-de. This has to be solved within the nltk project (i.e. the training process for the german.pickle model and the lists stored in its parameters, such as known abbreviations and collocations, would have to be refined).
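To illustrate what "refining the lists" could mean, here is a minimal sketch that builds a PunktSentenceTokenizer from hand-made PunktParameters instead of the trained german.pickle model. It assumes a reasonably recent nltk. The collocation pair is a hypothetical example, not a proposed patch, and `make_tokenizer` and `SAMPLE` are illustrative names. Punkt maps numeric tokens such as "1." to the type "##number##". A `(previous type, next type)` pair in `collocations` tells it that a period between those two words is not a sentence break.

```python
from nltk.tokenize.punkt import PunktParameters, PunktSentenceTokenizer

SAMPLE = "Mein 1. Hauptfach war doof."


def make_tokenizer(collocations=()):
    """Build a Punkt tokenizer from hand-made parameters (no trained model)."""
    params = PunktParameters()
    # Pairs of lowercased token types; numbers are normalised to "##number##".
    params.collocations = set(collocations)
    # PunktSentenceTokenizer accepts a PunktParameters object in place of
    # training text and uses it as-is.
    return PunktSentenceTokenizer(params)


# Empty lists: the capitalised noun after "1." gives no evidence against a
# sentence start, so the period is kept as a sentence break.
print(make_tokenizer().tokenize(SAMPLE))

# Hypothetical collocation ("##number##", "hauptfach"): the break is suppressed.
print(make_tokenizer({("##number##", "hauptfach")}).tokenize(SAMPLE))
```

Hard-coding pairs like this would only patch one noun at a time. Given how many capitalized nouns German has, such evidence has to come from training rather than from hand-written exceptions in textblob-de.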

markuskiller / textblob-de #8