Open ghost opened 5 years ago
And, if this helps even a bit, here is the execution from the cmd (I'm working on Windows) with TreeTagger and TEST.txt containing "What is the airspeed of an unladen swallow?".
C:\TreeTagger>tag-english TEST.txt
reading parameters ...
tagging ...
What WP what
is VBZ be
the DT the
airspeed NN airspeed
of IN of
an DT an
unladen JJ unladen
swallow NN swallow
? SENT ?
finished.
Hi Izgeg,
please have a look at the following code fragment.
from treetagger import TreeTagger
tt = TreeTagger(path_to_treetagger='/path/to/your/TreeTagger/', language='german')
tt.tag('Das Haus hat einen großen hübschen Garten.')
You can pass the language as a second parameter when instantiating the TreeTagger class. Any value returned by the function get_installed_lang() can be used there. The code fragment above, for example, tags a German sentence.
Does this answer the question?
Cheers
Hi again,
I'm sorry if I wasn't clear about what I meant.
The problem is pretty simple: I do not get the expected return value.
from treetagger import TreeTagger
tt = TreeTagger(path_to_treetagger='C:\TreeTagger')
print(tt.tag("What is the airspeed of an unladen swallow?"))
For this code, which you give in the README, you get the following result:
[['What', 'WP', 'what'],
['is', 'VBZ', 'be'],
['the', 'DT', 'the'],
['airspeed', 'NN', 'airspeed'],
['of', 'IN', 'of'],
['an', 'DT', 'an'],
['unladen', 'JJ', '<unknown>'],
['swallow', 'NN', 'swallow'],
['?', 'SENT', '?']]
For the same code, tested on my computer, I get:
[['Usage: tag-english file {file}']]
But TreeTagger works properly on my computer from the cmd: I get what I should get, i.e. the proper tokenization of the sentence.
C:\TreeTagger>tag-english TEST.txt
reading parameters ...
tagging ...
What WP what
is VBZ be
the DT the
airspeed NN airspeed
of IN of
an DT an
unladen JJ unladen
swallow NN swallow
? SENT ?
finished.
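For comparison with the list that tt.tag() should return, the console output above can be turned into the same [token, tag, lemma] shape. This is only a minimal sketch: it assumes the three columns are tab-separated (the tabs show up as spaces in the paste above), and parse_tagger_output is a hypothetical helper, not a function from treetagger.py.

```python
def parse_tagger_output(raw: str) -> list[list[str]]:
    """Convert TreeTagger console output into [token, tag, lemma] rows."""
    rows = []
    for line in raw.splitlines():
        fields = line.strip().split("\t")
        # Status lines such as "reading parameters ..." contain no tabs,
        # so they do not split into three fields and are skipped.
        if len(fields) == 3:
            rows.append(fields)
    return rows


# Sample text modelled on the cmd output above (tab-separated by assumption).
sample = (
    "reading parameters ...\n"
    "tagging ...\n"
    "What\tWP\twhat\n"
    "is\tVBZ\tbe\n"
    "?\tSENT\t?\n"
    "finished.\n"
)
print(parse_tagger_output(sample))
```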
So I believe there might be a problem in the treetagger.py file. If not, I would appreciate your help in using your files properly.
Many thanks again, and sorry for my bad English if anything is unclear.
Please try the treetagger.py file from the branch windows_test (704f7e9). I changed how the TreeTagger program is called.
I don't have Windows, so I can't test it under Windows.
NLTK was unable to find the TreeTagger bin!
Traceback (most recent call last):
  File ".\test2.py", line 4, in <module>
    print(str(tt.tag('What is the airspeed of an unladen swallow?')))
  File "C:\Path\To\\treetagger.py", line 160, in tag
    p = Popen([self._treetagger_bin],
AttributeError: 'TreeTagger' object has no attribute '_treetagger_bin'
Hi, this is the error I get when I use the branch. I tried some modifications to give you something that works under Windows, but couldn't make it work.
I can't help any further because I don't have a Windows computer. You could try it under Linux. Instructions for installing Linux under Windows, e.g. in VirtualBox, can be found on the Internet.
Same problem for me, also Windows user. Any solution?
Have you been able to import the TreeTagger program into Python as follows?
from treetagger import TreeTagger
Have you been able to create a new instance?
tt = TreeTagger(path_to_treetagger='/path/to/treet-tagger')
If so, what is the output of the following command? Does the path point to the TreeTagger executable?
tt.get_treetagger_path()
Everything seems to work fine.
If I print the result of
tt.get_treetagger_path()
I get
Environment variable 'TREETAGGER_HOME' is C:/TreeTagger/
Path to TreeTagger is C:/TreeTagger/
None
but when I print the result of
print(tt.tag('What is the airspeed of an unladen swallow?'))
I get
[['Usage: tag-english file {file}']]
The full code is the following:
from treetagger import TreeTagger
tt = TreeTagger(path_to_treetagger= 'C:/TreeTagger/',language='english')
#print(tt.get_installed_lang())
print(tt.get_treetagger_path())
print(tt.tag('What is the airspeed of an unladen swallow?'))
From the command line, everything works!
Looking at the tag function in treetagger.py, the problem seems to arise at the line
(stdout, stderr) = p.communicate(str(_input).encode('utf-8'))
There the stdout variable gets the value [['Usage: tag-english file {file}']],
as if the string passed were not a valid argument.
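The usage message suggests that the Windows script expects a file name as an argument, which matches the cmd call tag-english TEST.txt, whereas the tag function sends the text on standard input. One possible workaround, sketched here and not taken from treetagger.py, is to write the text to a temporary file and pass its path. run_with_file_argument and the stand-in echo_file command are hypothetical names; the stand-in only echoes the file back so that the sketch runs without TreeTagger installed.

```python
import os
import subprocess
import sys
import tempfile


def run_with_file_argument(command: list[str], text: str) -> str:
    """Write text to a temporary file and pass its path as the last argument."""
    # delete=False lets the child process reopen the file on Windows;
    # the file is removed explicitly once the command has finished.
    with tempfile.NamedTemporaryFile(
        "w", suffix=".txt", encoding="utf-8", delete=False
    ) as handle:
        handle.write(text)
        path = handle.name
    try:
        result = subprocess.run(command + [path], capture_output=True, check=False)
        return result.stdout.decode("utf-8", errors="replace")
    finally:
        os.remove(path)


# Stand-in for the tagger script: a Python one-liner that prints the file
# it is given. This is an assumption for illustration, not TreeTagger itself.
echo_file = [
    sys.executable,
    "-c",
    "import sys; sys.stdout.write(open(sys.argv[1], encoding='utf-8').read())",
]
print(run_with_file_argument(echo_file, "What is the airspeed of an unladen swallow?"))
```

With a real installation, the command list would point at the tagging script instead of echo_file; its exact location depends on the local setup and would need to be checked.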
Apparently the TreeTagger program must now be executed differently. The code has been changed; please test it.
You can also run the Python doctest. To do this, set the environment variable in the Windows command line
SET TREETAGGER_HOME=C:\TreeTagger
and then execute the following
python treetagger.py -v
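Before running the doctest, it may help to confirm that the environment variable is visible to Python and that the expected files exist. The sketch below is an assumption-laden check, not part of treetagger.py: check_treetagger_home is a hypothetical helper, and the file names bin\tree-tagger.exe and bin\tag-english.bat are assumed from a typical Windows installation and may differ on your machine.

```python
import os
from pathlib import Path


def check_treetagger_home(env=None) -> dict[str, bool]:
    """Report which assumed TreeTagger files exist under TREETAGGER_HOME."""
    env = os.environ if env is None else env
    home = env.get("TREETAGGER_HOME")
    if not home:
        # Variable not set (or empty): nothing to check.
        return {}
    bin_dir = Path(home) / "bin"
    # Assumed file names; adjust them to match the local installation.
    names = ["tree-tagger.exe", "tag-english.bat"]
    return {name: (bin_dir / name).is_file() for name in names}


print(check_treetagger_home())
```

An empty dictionary means the variable is not set in the current session; a False entry means the corresponding file was not found at the assumed location.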
Hi, you might recognize the code from your README. I'm sorry to bother you if my question is stupid, and I thank you for the work you have provided for us.
Here is the code I use:
from treetagger import TreeTagger
tt = TreeTagger(path_to_treetagger='C:\TreeTagger')
tmp = tt.tag('What is the airspeed of an unladen swallow?')
print(tmp)
With or without the tmp and the print, I get the same return value, which is "[['Usage: tag-english file {file}']]".
I should get something different; in your example you get a proper answer. I ran some tests and added some prints in the tag function, and they all look right. I don't understand the problem and would be grateful if you could help me.
The tt.get_installed_lang() method works just fine, and so does TreeTagger on its own when I call it with a .txt file.
Many thanks again for your work.
PS: I'm sorry for the many mistakes you will find in this text; English is not my mother tongue.