miotto / treetagger-python

A Python module for interfacing with the Treetagger by Helmut Schmid.

TypeError: str() takes at most 1 argument (2 given) #11

Closed: niklasben closed this issue 7 years ago

niklasben commented 8 years ago

When executing my script, I get the following error message:

Traceback (most recent call last):
  File "tt_testfile.py", line 7, in <module>
    pprint(tt.tag('What is the airspeed of an unladen swallow?'))
  File "build/bdist.linux-i686/egg/treetagger.py", line 123, in tag
TypeError: str() takes at most 1 argument (2 given)

The error points to this line: (stdout, stderr) = p.communicate(bytes(_input, 'UTF-8'))
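For context: on Python 2, bytes is just an alias for str, so bytes(_input, 'UTF-8') is evaluated as str(_input, 'UTF-8'). Python 2's str() accepts at most one argument, which produces exactly the TypeError above. Below is a minimal sketch of an encoding step that works on both Python 2 and Python 3. to_utf8_bytes is a hypothetical helper, not part of treetagger-python, and it assumes any byte string it receives is already UTF-8 encoded.

```python
def to_utf8_bytes(text):
    """Return text as UTF-8 encoded bytes on both Python 2 and Python 3.

    Hypothetical helper, not part of treetagger-python. On Python 2 a plain
    str is already a byte string and is passed through unchanged (assumed to
    be UTF-8); unicode objects are encoded. On Python 3, str is encoded and
    bytes are passed through.
    """
    if isinstance(text, bytes):
        # Already bytes (on Python 2 this includes every plain str).
        return text
    # Text type: unicode on Python 2, str on Python 3.
    return text.encode("utf-8")


# Possible use in place of bytes(_input, 'UTF-8'):
# (stdout, stderr) = p.communicate(to_utf8_bytes(_input))
print(to_utf8_bytes(u"What is the airspeed of an unladen swallow?"))
```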

miotto commented 8 years ago
TypeError: str() takes at most 1 argument (2 given)

Based on a Google search, this error most likely points to an encoding problem.

Can you share the relevant lines where you call TreeTagger and pprint, or your whole tt_testfile.py? Then I can check whether it works for me. Which Python version are you using?

My test file looks like this:

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import pprint
from treetagger import TreeTagger

tt = TreeTagger(language='english')
print(tt.tag('What is the airspeed of an unladen swallow?'))

pprint.pprint(tt.tag('What is the airspeed of an unladen swallow?'))

niklasben commented 8 years ago

Basically the same as yours:

# -*- coding: utf-8 -*-

from treetagger import TreeTagger
from pprint import pprint

tt = TreeTagger(language='english')
pprint(tt.tag('What is the airspeed of an unladen swallow?'))

I am using Python 2.7.6.

miotto commented 8 years ago

OK, I think I understand this error now. In Python 2, conversions between Unicode, ASCII, and UTF-8 were a major source of problems.

I made a small change to the current code. As long as your input contains no umlauts (as in German), it may also work with Python 2; otherwise, Python 2 raises a UnicodeDecodeError.

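# Old line, left commented out:
# (stdout, stderr) = p.communicate(bytes(_input, 'UTF-8'))
# New line: convert the input to str, then encode it as UTF-8 bytes.
(stdout, stderr) = p.communicate(str(_input).encode('utf-8'))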
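The UnicodeDecodeError mentioned above comes from an implicit step in Python 2: when _input is a byte string (for example a plain str literal in a UTF-8 source file), calling .encode('utf-8') on it first decodes it with the default ASCII codec. UTF-8 umlauts are multi-byte sequences above 0x7F, so that hidden decode fails. The Python 3 sketch below performs the same ASCII decode explicitly to show which inputs would trip it; decodes_as_ascii is a hypothetical name used only for illustration.

```python
# Python 3 sketch of the step that fails under Python 2: encoding a Python 2
# byte string first decodes it with the ASCII codec, which rejects any byte
# above 0x7F.

def decodes_as_ascii(data):
    """Return True if the implicit ASCII decode would succeed for data (bytes)."""
    try:
        data.decode("ascii")
    except UnicodeDecodeError:
        return False
    return True


english = "What is the airspeed of an unladen swallow?".encode("utf-8")
german = "Größe".encode("utf-8")  # ö and ß are two bytes each in UTF-8

print(decodes_as_ascii(english))  # True: plain ASCII input passes
print(decodes_as_ascii(german))   # False: the umlaut bytes trigger the error
```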

You can also look at the old Python 2 version of the module, treetagger_python2.py.

I would recommend trying it with Python 3.
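On Python 3 the text/bytes boundary around communicate() is explicit: encode the input to UTF-8 bytes before it goes to the child's stdin, and decode the child's stdout afterwards. The Python 3 sketch below shows that pattern. Because TreeTagger's install path and arguments vary from machine to machine, it assumes a stand-in child process (a one-line Python echo command) instead of the real TreeTagger binary; run_tagger and ECHO_CMD are hypothetical names.

```python
import subprocess
import sys

# Stand-in for the TreeTagger command (assumption): a child Python process
# that copies its stdin bytes to stdout unchanged.
ECHO_CMD = [
    sys.executable,
    "-c",
    "import sys; sys.stdout.buffer.write(sys.stdin.buffer.read())",
]


def run_tagger(text, cmd=ECHO_CMD):
    """Send text to cmd as UTF-8 bytes and return its stdout decoded as UTF-8."""
    p = subprocess.Popen(cmd, stdin=subprocess.PIPE,
                         stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    # Without text mode, communicate() takes and returns bytes.
    (stdout, stderr) = p.communicate(text.encode("utf-8"))
    return stdout.decode("utf-8")


print(run_tagger("Dies ist ein kurzer Satz zum Testen."))
```

To use this pattern with TreeTagger, only the command list would change to the local tagger command; the encode and decode steps stay the same.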

niklasben commented 8 years ago

The changed line doesn't work for me; I still get the same error message. With Python 3, I get "TreeTagger parameter file invalid: german-utf8.par", and the same happens with the English version. Anyway, I will keep trying to get it to work.

niklasben commented 7 years ago

Sorry it took so long. Today I tried to reproduce this on another client.


Long story short: I wasn't able to reproduce the error on this machine, so I'd say it is caused by a configuration problem on the other client. Therefore, I will close the issue.

I am going to write a larger test script when I have the time. Until then, here is a short script and its terminal output:

>>> import pprint
>>> import treetaggerwrapper
>>> tagger = treetaggerwrapper.TreeTagger(TAGLANG='de')
>>> tags = tagger.tag_text(u"Dies ist ein kurzer Satz zum Testen.")
>>> pprint.pprint(tags)
[u'Dies\tPDS\tdies',
 u'ist\tVAFIN\tsein',
 u'ein\tART\teine',
 u'kurzer\tADJA\tkurz',
 u'Satz\tNN\tSatz',
 u'zum\tAPPRART\tzu',
 u'Testen\tNN\tTesten',
 u'.\t$.\t.']
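Each string returned by tag_text in this output holds one token as word, part-of-speech tag, and lemma, separated by tabs. A minimal standard-library sketch for splitting such lines into tuples follows; Token and parse_tags are hypothetical names, and lines without exactly three fields (which this short example does not contain) are simply skipped.

```python
from collections import namedtuple

# Hypothetical record type for one tagged token.
Token = namedtuple("Token", ["word", "pos", "lemma"])


def parse_tags(lines):
    """Split tab-separated word/POS/lemma lines into Token tuples."""
    tokens = []
    for line in lines:
        parts = line.split("\t")
        if len(parts) == 3:
            tokens.append(Token(*parts))
        # Lines without exactly three fields are skipped in this sketch.
    return tokens


# The tag_text output shown above.
tags = [u'Dies\tPDS\tdies', u'ist\tVAFIN\tsein', u'ein\tART\teine',
        u'kurzer\tADJA\tkurz', u'Satz\tNN\tSatz', u'zum\tAPPRART\tzu',
        u'Testen\tNN\tTesten', u'.\t$.\t.']

for token in parse_tags(tags):
    print(token.word, token.pos, token.lemma)
```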