Change external lexicon after training using Python

ufal / udpipe

UDPipe: Trainable pipeline for tokenizing, tagging, lemmatizing and parsing Universal Treebanks and other CoNLL-U files

Mozilla Public License 2.0


Change external lexicon after training using Python #78

Closed GMarzinotto closed 6 years ago

GMarzinotto commented 6 years ago

I have been using UDPipe and so far it works like a charm.

But when I use the CLI to parse sentences spread across several files, UDPipe reloads the model for each file. Since I have many small files, this slows the process down a lot.

To prevent this from happening I created a Python script using the wrapper provided here. That way I load the model once and then use it to parse all the files I want.
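The load-once pattern described above can be sketched as follows. This is not the script mentioned in the post, only a minimal illustration that assumes the `ufal.udpipe` bindings are installed; the helper names `load_udpipe_processor` and `process_files` are made up for this example, and the choice of the `tokenize` input format with default tagger and parser options is an assumption.

```python
from pathlib import Path
from typing import Callable, Dict, Iterable


def load_udpipe_processor(model_path: str) -> Callable[[str], str]:
    """Load a UDPipe model once and return a plain-text -> CoNLL-U function."""
    # Imported lazily so the rest of this module works without ufal.udpipe.
    from ufal.udpipe import Model, Pipeline, ProcessingError

    model = Model.load(model_path)  # returns None if the model cannot be loaded
    if model is None:
        raise RuntimeError(f"Cannot load UDPipe model from {model_path!r}")

    # Assumed setup: tokenize raw text, default tagger and parser, CoNLL-U output.
    pipeline = Pipeline(model, "tokenize", Pipeline.DEFAULT, Pipeline.DEFAULT, "conllu")

    def process(text: str) -> str:
        error = ProcessingError()
        result = pipeline.process(text, error)
        if error.occurred():
            raise RuntimeError(error.message)
        return result

    return process


def process_files(paths: Iterable, process: Callable[[str], str]) -> Dict[str, str]:
    """Apply an already-loaded processor to every file, reusing the same model."""
    return {str(p): process(Path(p).read_text(encoding="utf-8")) for p in paths}
```

Usage would look like `process_files(Path("inputs").glob("*.txt"), load_udpipe_processor("model.udpipe"))`, where both the directory and the model file name are placeholders. The model is loaded once, however many files are processed.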

The problem is that the Python wrapper is not fully documented. I would like to know how to use all the parameters that are available in the CLI.

For instance, how can I provide an external lexicon file for inference using the Python wrapper?

Thank you very much for your help, Gabriel M

foxik commented 6 years ago

If I recall correctly, you cannot really pass an external lexicon file for inference, even to the binary (or if you somehow can, tell me how :-)

The UDPipe API documentation is available at http://ufal.mff.cuni.cz/udpipe/api-reference. All bindings have exactly these methods, but renamed to use camelCase and with simpler types -- this simplified API is described at http://ufal.mff.cuni.cz/udpipe/api-reference#cpp_bindings_api.

Also, UDPipe comes with a REST server (udpipe_server), which can be used to load the model once and then process any number of inputs without a delay.
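A client for such a server could look roughly like the sketch below. It assumes the server exposes a `/process` endpoint that accepts form-encoded `data`, `tokenizer`, `tagger` and `parser` parameters and returns JSON with a `result` field, as the public UDPipe REST service does; check the REST API documentation before relying on this. The base URL and the function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request


def build_process_request(base_url: str, text: str) -> urllib.request.Request:
    """Build a POST request asking the server to tokenize, tag and parse text."""
    # Assumed parameter names; empty values request the default options.
    params = {"data": text, "tokenizer": "", "tagger": "", "parser": ""}
    body = urllib.parse.urlencode(params).encode("utf-8")
    return urllib.request.Request(base_url.rstrip("/") + "/process", data=body)


def extract_result(response_body: bytes) -> str:
    """Pull the processed output (CoNLL-U by default) out of the JSON response."""
    return json.loads(response_body.decode("utf-8"))["result"]


def parse_text(base_url: str, text: str) -> str:
    """Send one text to a running server; the model stays loaded server-side."""
    with urllib.request.urlopen(build_process_request(base_url, text)) as response:
        return extract_result(response.read())
```

With a server already running, `parse_text("http://localhost:8001", "Hello world.")` would return the parsed text; the port here is only a placeholder. Because the server keeps the model in memory, each small file costs one HTTP request instead of one model load.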

ioan2 commented 6 years ago

Hi all, I forked UDPipe some months ago to add exactly this functionality. You can find it here: https://github.com/ioan2/udpipe

Johannes

foxik commented 6 years ago

@ioan2 :-) A new version of UDPipe will be released in a couple of months (with much more powerful models); an external dictionary for inference will also be supported (optionally specifying only some columns), as will external word embeddings.

GMarzinotto commented 6 years ago

I apologize, I got confused as I was working with the forked version!

However, thank you for the documentation and the information about the REST server! I think the REST server is just what I need!

I'll be closing the issue.