ufal / udpipe

UDPipe: Trainable pipeline for tokenizing, tagging, lemmatizing and parsing Universal Treebanks and other CoNLL-U files
Mozilla Public License 2.0

joint_with_parsing setting requires a trained tagger #141

Closed by jerrybonnell 3 years ago

jerrybonnell commented 3 years ago

Greetings, we are working on a Japanese UD project and are trying to tokenize some raw texts using the experimental sentence segmentation setting --tokenizer=joint_with_parsing. The model we have trained using UDPipe does not have a tagger, but according to the documentation, this setting seems to rely only on a model that contains a trained tokenizer and parser.

From the docs...

joint_with_parsing: an experimental mode performing sentence segmentation jointly using the tokenizer and the parser...

However, when trying this incantation on a model with a trained tokenizer and parser but no tagger, we are presented with the following error ...

$ udpipe --tokenizer=joint_with_parsing model raw_text_file 
Loading UDPipe model: done.
An error occurred during UDPipe execution: No tagger defined for the UDPipe model!

... are we doing something wrong in our incantation, or must we train a tagger in order to use the joint_with_parsing sentence segmentation setting? If the latter, perhaps this can be made clearer in the documentation.

Many thanks!

foxik commented 3 years ago

Hi,

thanks for the message. That was just an oversight: the tokenizer called the tagger unconditionally. The tagging result was not even used (it was thrown away), so I just removed the call.
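To illustrate the kind of oversight described here, below is a minimal toy sketch in Python. It is not UDPipe's code: `ToyModel`, `score_before_fix`, `score_after_fix`, `segment`, and the length-based parser cost are all hypothetical names and assumptions made up for illustration, and UDPipe's real joint segmentation is more involved. The sketch only shows how a tagging call whose result is discarded can still make segmentation fail on a model without a tagger, and how dropping the call leaves the result unchanged.

```python
from typing import Callable, List, Optional

Sentence = List[str]


class ModelError(RuntimeError):
    """Raised when a required model component is missing."""


class ToyModel:
    """Hypothetical stand-in for a model whose tagger and parser are optional."""

    def __init__(
        self,
        tagger: Optional[Callable[[Sentence], List[str]]] = None,
        parser: Optional[Callable[[Sentence], float]] = None,
    ) -> None:
        self.tagger = tagger
        self.parser = parser

    def tag(self, sentence: Sentence) -> List[str]:
        if self.tagger is None:
            # Same wording as the error quoted in this issue.
            raise ModelError("No tagger defined for the UDPipe model!")
        return self.tagger(sentence)

    def parse_cost(self, sentence: Sentence) -> float:
        if self.parser is None:
            raise ModelError("no parser available (hypothetical message)")
        return self.parser(sentence)


def score_before_fix(model: ToyModel, sentence: Sentence) -> float:
    # The tagging result is thrown away, yet the call still fails
    # when the model has no tagger.
    model.tag(sentence)
    return model.parse_cost(sentence)


def score_after_fix(model: ToyModel, sentence: Sentence) -> float:
    # Only the parser is consulted, so no tagger is required.
    return model.parse_cost(sentence)


def segment(
    model: ToyModel,
    tokens: Sentence,
    score: Callable[[ToyModel, Sentence], float],
) -> List[Sentence]:
    """Toy joint segmentation: keep the tokens as one sentence or split them
    once, choosing whichever option has the lowest total parser cost."""
    candidates = [[tokens]] + [
        [tokens[:i], tokens[i:]] for i in range(1, len(tokens))
    ]
    return min(candidates, key=lambda segs: sum(score(model, s) for s in segs))


if __name__ == "__main__":
    # Assumed toy parser: cheapest for sentences of exactly three tokens.
    model = ToyModel(parser=lambda s: abs(len(s) - 3))
    tokens = list("abcdef")
    try:
        segment(model, tokens, score_before_fix)
    except ModelError as err:
        print("before fix:", err)
    print("after fix:", segment(model, tokens, score_after_fix))
```

In this toy setup, a model built with only a parser fails under `score_before_fix` and segments normally under `score_after_fix`, which mirrors the behavior change described above.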

I plan to release UDPipe 1.3 in the 1 series eventually, so there will be a release with this change, but given the current situation, it might take a while.

Cheers!