miotto / treetagger-python

A Python module for interfacing with the Treetagger by Helmut Schmid.

a problem with tag() return type #23

Open ghost opened 5 years ago

ghost commented 5 years ago

Hi, you might recognize the code from your README. I'm sorry to bother you if my question is stupid, and thank you for the work you have shared with us.

Here is the code I use:

from treetagger import TreeTagger
tt = TreeTagger(path_to_treetagger=r'C:\TreeTagger')

tmp = tt.tag('What is the airspeed of an unladen swallow?')
print(tmp)

With or without the tmp variable and the print, I get the same return value: [['Usage: tag-english file {file}']].

I should get something different; in your example you get a proper answer. I ran some tests and added some print calls inside the tag function, and everything there seems to go right. I don't understand the problem and would be grateful for your help.

The tt.get_installed_lang() method works fine, and so does TreeTagger on its own when I call it with a .txt file.

Many thanks again for your work.

PS: Sorry for any mistakes in this text; English is not my mother tongue.

ghost commented 5 years ago

In case it helps, here is the run from cmd (I'm working on Windows), with TEST.txt containing "What is the airspeed of an unladen swallow?":

C:\TreeTagger>tag-english TEST.txt
reading parameters ...
tagging ...
What WP what
is VBZ be
the DT the
airspeed NN airspeed
of IN of
an DT an
unladen JJ unladen
swallow NN swallow
? SENT ?
finished.
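For comparison with the README output, command-line output like the above can be turned into the same list of [word, tag, lemma] triples. A minimal sketch, assuming TreeTagger writes each tagged token as a tab-separated word, tag, and lemma line and that progress messages such as "reading parameters ..." may be mixed in; parse_tagger_output is a hypothetical helper, not part of treetagger.py.

```python
def parse_tagger_output(text):
    """Convert TreeTagger output into [word, tag, lemma] triples.

    Assumes tagged lines are tab-separated. Lines without exactly three
    tab-separated fields (e.g. "reading parameters ..." or "finished.")
    are skipped.
    """
    triples = []
    for line in text.splitlines():
        fields = line.split("\t")
        if len(fields) == 3:
            triples.append(fields)
    return triples


sample = "reading parameters ...\nWhat\tWP\twhat\nis\tVBZ\tbe\nfinished.\n"
print(parse_tagger_output(sample))
# [['What', 'WP', 'what'], ['is', 'VBZ', 'be']]
```

A bare usage message contains no tabs, so this parser returns an empty list for it rather than [['Usage: tag-english file {file}']].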

miotto commented 5 years ago

Hi Izgeg,

please have a look at the following code fragment.

from treetagger import TreeTagger
tt = TreeTagger(path_to_treetagger='/path/to/your/TreeTagger/', language='german')
tt.tag('Das Haus hat einen großen hübschen Garten.')

You can pass a second parameter, language, when instantiating the TreeTagger class. Any of the values returned by get_installed_lang() can be used there; the code fragment above uses 'german' for a German sentence.

Does this answer the question?

Cheers
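A small guard along these lines can check the language before tagging. It is a sketch that assumes get_installed_lang() returns a list of language names such as 'german'; choose_language is a hypothetical helper, not part of treetagger.py.

```python
def choose_language(installed, preferred="english"):
    """Return `preferred` if it appears in `installed`, else raise.

    `installed` is assumed to be the list returned by
    TreeTagger.get_installed_lang(). Failing early avoids tagging with
    a language whose parameter file is missing.
    """
    if preferred in installed:
        return preferred
    raise ValueError(
        f"language {preferred!r} is not installed; available: {sorted(installed)}"
    )


# Usage (requires a working TreeTagger installation; path is an example):
# from treetagger import TreeTagger
# probe = TreeTagger(path_to_treetagger=r'C:\TreeTagger')
# lang = choose_language(probe.get_installed_lang(), 'german')
# tt = TreeTagger(path_to_treetagger=r'C:\TreeTagger', language=lang)
```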

ghost commented 5 years ago

Hi again,

I'm sorry if I wasn't clear on what I meant.

The problem is simple: I do not get the expected return value.

from treetagger import TreeTagger
tt = TreeTagger(path_to_treetagger=r'C:\TreeTagger')
print(tt.tag("What is the airspeed of an unladen swallow?"))

For this code, which you gave in the README, you get the following return value:

[['What', 'WP', 'what'],
['is', 'VBZ', 'be'],
['the', 'DT', 'the'],
['airspeed', 'NN', 'airspeed'],
['of', 'IN', 'of'],
['an', 'DT', 'an'],
['unladen', 'JJ', '<unknown>'],
['swallow', 'NN', 'swallow'],
['?', 'SENT', '?']]

When I run the same code on my computer, I get:

[['Usage: tag-english file {file}']]

However, TreeTagger itself works properly on my computer: from cmd I get the expected output, i.e. the proper tagging of the sentence:

C:\TreeTagger>tag-english TEST.txt
reading parameters ...
tagging ...
What WP what
is VBZ be
the DT the
airspeed NN airspeed
of IN of
an DT an
unladen JJ unladen
swallow NN swallow
? SENT ?
finished.

So I believe there might be a problem in the treetagger.py file. If not, I would appreciate some help using your files properly.

Many thanks again, and sorry for my English if anything is unclear.
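Rather than silently printing the usage message, a caller can detect it. A hedged sketch: check_tag_result is a hypothetical helper, and treating a single one-element row that starts with "Usage:" as an error is an assumption based on the output reported above.

```python
def check_tag_result(result):
    """Return `result` unchanged, or raise if it is TreeTagger's usage message.

    In the reports above, tag() returned [['Usage: tag-english file {file}']]:
    one row with one field. That message suggests the tag-english script
    did not receive the file argument it expects.
    """
    if len(result) == 1 and len(result[0]) == 1 and result[0][0].startswith("Usage:"):
        raise RuntimeError("TreeTagger printed its usage message: " + result[0][0])
    return result


# Usage (hypothetical):
# tags = check_tag_result(tt.tag("What is the airspeed of an unladen swallow?"))
```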

miotto commented 5 years ago

Please try the treetagger.py file from the windows_test branch (commit 704f7e9). I changed how the TreeTagger program is called. I don't have Windows, so I can't test it there.

ghost commented 5 years ago
NLTK was unable to find the TreeTagger bin!
Traceback (most recent call last):
  File ".\test2.py", line 4, in <module>
    print(str(tt.tag('What is the airspeed of an unladen swallow?')))
  File "C:\Path\To\\treetagger.py", line 160, in tag
    p = Popen([self._treetagger_bin],
AttributeError: 'TreeTagger' object has no attribute '_treetagger_bin'

Hi, this is the error I get when I use that branch. I tried some modifications to get it working under Windows, but couldn't make it work.
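The AttributeError, printed right after "NLTK was unable to find the TreeTagger bin!", suggests the wrapper never located a TreeTagger script, so _treetagger_bin was never set. A sketch for checking this up front; the directory layout it probes (bin/tag-<language>.bat on Windows, cmd/tree-tagger-<language> on Linux/macOS) is an assumption to verify against your installation, and find_tagger_script is hypothetical.

```python
import os


def find_tagger_script(home, language):
    """Return the path of the tagging script for `language` under `home`, or None.

    Assumed layout (check your installation):
      Windows:      <home>/bin/tag-<language>.bat
      Linux/macOS:  <home>/cmd/tree-tagger-<language>
    """
    candidates = [
        os.path.join(home, "bin", f"tag-{language}.bat"),
        os.path.join(home, "cmd", f"tree-tagger-{language}"),
    ]
    for path in candidates:
        if os.path.isfile(path):
            return path
    return None


# Usage (hypothetical install location):
# home = os.environ.get("TREETAGGER_HOME", "C:/TreeTagger")
# script = find_tagger_script(home, "english")
# if script is None:
#     raise FileNotFoundError(f"no tagging script for 'english' under {home}")
```

Failing with a clear FileNotFoundError at this point is easier to diagnose than an AttributeError raised later inside tag().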

miotto commented 5 years ago

I can't help you further because I don't have a Windows computer. You could try it under Linux instead, e.g. in VirtualBox under Windows; installation instructions can be found online.

simog-dev commented 3 years ago

Same problem for me, also Windows user. Any solution?

miotto commented 3 years ago

Have you been able to import the TreeTagger class into Python as follows?

from treetagger import TreeTagger

Have you been able to create a new instance?

tt = TreeTagger(path_to_treetagger='/path/to/tree-tagger')

If so, what does the following command output? Does the path point to the TreeTagger executable?

tt.get_treetagger_path()

simog-dev commented 3 years ago

Everything seems to work fine.

If I print the result of tt.get_treetagger_path(), I get:

Environment variable 'TREETAGGER_HOME' is C:/TreeTagger/
Path to TreeTagger is C:/TreeTagger/
None

But print(tt.tag('What is the airspeed of an unladen swallow?')) gives [['Usage: tag-english file {file}']].

The full code is the following:

from treetagger import TreeTagger
tt = TreeTagger(path_to_treetagger='C:/TreeTagger/', language='english')
#print(tt.get_installed_lang())
print(tt.get_treetagger_path())
print(tt.tag('What is the airspeed of an unladen swallow?'))

From the command line, everything works!

simog-dev commented 3 years ago

Looking at the tag function in treetagger.py, the problem seems to come from the line (stdout, stderr) = p.communicate(str(_input).encode('utf-8')). There, stdout receives the usage message that ends up as [['Usage: tag-english file {file}']], as if the text passed in were not a valid argument.
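Since tag-english ran fine from cmd with a file argument, one workaround is to do the same from Python: write the text to a temporary file and pass its path, instead of piping it through stdin with p.communicate(). A minimal sketch, not the fix that was committed to treetagger.py; the script location and the UTF-8 encoding are assumptions, and tag_via_file is a hypothetical helper.

```python
import os
import subprocess
import tempfile


def tag_via_file(text, command):
    """Tag `text` by writing it to a temporary file and passing the file's path.

    `command` is the argument list placed before the file name, e.g. the
    path to tag-english.bat (location assumed). This mirrors the working
    command line `tag-english TEST.txt` rather than piping text via stdin.
    """
    fd, path = tempfile.mkstemp(suffix=".txt")
    try:
        # Close the file before running the tagger so Windows does not lock it.
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            f.write(text)
        proc = subprocess.run(command + [path], capture_output=True, check=True)
    finally:
        os.remove(path)
    stdout = proc.stdout.decode("utf-8")  # assumes UTF-8 parameter files
    # Keep only word<TAB>tag<TAB>lemma lines; anything else is skipped.
    return [line.split("\t") for line in stdout.splitlines() if line.count("\t") == 2]
```

For example, tag_via_file('What is the airspeed of an unladen swallow?', [r'C:\TreeTagger\bin\tag-english.bat']), with the path adjusted to your installation.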

miotto commented 3 years ago

Apparently the TreeTagger program now has to be executed differently. I have changed the code; please test it.

You can also run the Python doctests. To do this, set the environment variable in the Windows command line:

SET TREETAGGER_HOME=C:\TreeTagger

and then run:

python treetagger.py -v