ufal / udpipe

UDPipe: Trainable pipeline for tokenizing, tagging, lemmatizing and parsing Universal Treebanks and other CoNLL-U files
Mozilla Public License 2.0

Issues loading the server and applying udpipe2_client.py #181

Closed hezaoke closed 10 months ago

hezaoke commented 10 months ago

Hi! I would like to start the server and then run udpipe2_client.py to parse my input file with the English GUM model. However, when I tried to start the server, I could not get the following command to work (I cloned udpipe from GitHub today):

python3 udpipe2_server.py \
    --logfile udpipe_server.log \
    --threads 4 \
    --batch_size 32 \
    --preload_models all \
    8001 \
    en_gum-ud-2.12-230717 \
    "en_gum-ud-2.12-230717:/path/to/en_gum-ud-2.12-230717.model:https://example.com"

error message: udpipe2_server.py: error: the following arguments are required: port, default_model, models

Any advice?

Thank you in advance for your help!

Alan

hezaoke commented 10 months ago

Never mind, I found the solution by simplifying the arguments, and I have also run udpipe2_client.py.

However, I found that udpipe2_client.py accepts only CoNLL-U files as input and cannot process a plain-text file. I thought it could take raw text and generate a CoNLL-U file. Should I look for something else, or am I missing a setting? Or is it that UDPipe 2 does not process raw data from scratch, but instead takes the output of UDPipe 1 and revises its tagging and parsing results?

If you need to see the command line I used to run udpipe2_client.py, please let me know.

I've changed the title in the hope that I can include this follow-up question here.

foxik commented 10 months ago

Dear @hezaoke,

I am glad you found the correct command line 👍

To process plain text files with udpipe2_client.py, the key is to include the --tokenizer= option. It makes the service treat the input as plain text and tokenize it, exactly as described in the REST API documentation (https://lindat.mff.cuni.cz/services/udpipe/api-reference.php#process); udpipe2_client.py just forwards the specified request to the service.
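Because the client only forwards a request, the same effect can be sketched by calling the service directly. The following is a minimal illustration, not the client's actual implementation. It makes several assumptions: a local udpipe2_server.py is listening on port 8001 (the port from the command earlier in this thread), the server exposes a /process method like the public API, it accepts a URL-encoded POST body, and the JSON response carries the output under a "result" key as the API reference describes. The helper names build_process_request and process_text are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumption: local server on port 8001 exposing the same /process method
# as the public REST API. Adjust to your own service URL.
SERVICE_URL = "http://localhost:8001"


def build_process_request(text, model, service=SERVICE_URL):
    """Build a POST request that asks the service to tokenize, tag and parse plain text."""
    fields = {
        "data": text,
        "model": model,
        # Present but empty, mirroring --tokenizer= on the client: the mere
        # presence of the parameter marks the input as plain text.
        "tokenizer": "",
        "tagger": "",
        "parser": "",
    }
    body = urllib.parse.urlencode(fields).encode("utf-8")
    return urllib.request.Request(service + "/process", data=body, method="POST")


def process_text(text, model, service=SERVICE_URL):
    """Send the request and return the processed output (CoNLL-U by default)."""
    request = build_process_request(text, model, service)
    with urllib.request.urlopen(request) as response:
        payload = json.load(response)
    # Assumption: the output is stored under the "result" key of the JSON response.
    return payload["result"]
```

With a server running, `print(process_text("Hello world.", "en_gum-ud-2.12-230717"))` should print the parsed sentences. Leaving out the "tokenizer" field corresponds to the behaviour reported above, where the input is expected to be CoNLL-U already.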

Cheers!

hezaoke commented 10 months ago

Thank you very much for the clue. It now works.
