jwijffels / udpipe.models.ud.2.0

udpipe-ud-2.0-170801- models made available at http://hdl.handle.net/11234/1-2364

Running train_all.sh outputs 'Missing udpipe' despite successful UDPipe installation #1

Closed jbrry closed 6 years ago

jbrry commented 6 years ago

Hi there,

I am trying to run the reproducible models 2.0. I have downloaded the udpipe-ud-2.0-170801-reproducible_training.zip file and in the README.TXT it states:

To train the udpipe-ud-2.0-170801 models: 1) run get.sh in ud-2.0 directory. The script downloads the UD 2.0 data, renames directories, puts testing data in the correct places, and resplits train + dev (such that train data is at least 9 times dev data).

2) run train_all.sh in models-ud-2.0 directory. (Note the training script sequentially trains individual treebanks -- if you can, run the training in a distributed manner.)

I have no problem running the get.sh script in step 1). However, when I try to run train_all.sh it outputs "Missing udpipe". If I view the train_all.sh script, the code is as follows:

#!/bin/bash

 [ -x udpipe ] || { echo Missing udpipe >&2; exit 1; }

ls="$@"
[ -z "$ls" ] && ls=`awk '{print $1}' params_parser`
for l in $ls; do
  mkdir -p $l
  ./train.sh $l >& $l/$l.log
done

I assume the problem is that the UDPipe executable is not being picked up from my $PATH? I have used UDPipe before and copied the executable to /usr/local/bin, and I have no problems running UDPipe models from anywhere in my filesystem, so it seems unusual that the script cannot find it. I have also since added the executable to my ~/.bashrc file, and I am still not able to run this script.

Apologies if I should be directing this question to the official UDPipe GitHub repo. I just thought that you might be able to point out where my problem is coming from, as it seems like a small issue.

Many thanks, James

jwijffels commented 6 years ago

I think you should point this to the UDPipe repo. This repo is just a copy that makes it easy to use these models with the R package. But your problem looks like the udpipe executable is not in the directory from which you call train_all.sh. The check [ -x udpipe ] looks for an executable file named udpipe in the current directory; it does not search $PATH. Are you planning to build new UDPipe models? If you are and you plan to use R, you can also use the udpipe R package. Example training scripts for R are available at https://github.com/bnosac/udpipe.models.ud. If you plan to use the executable from UDPipe, just proceed as you are doing and try to fix your path problem.
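To illustrate the point about the check at the top of train_all.sh, here is a minimal bash sketch. The function names check_local and link_udpipe are hypothetical helpers, not part of the UDPipe distribution, and the sketch assumes that a symlink in the working directory is enough for the training scripts (train.sh is not shown in this thread, so that is unverified).

```shell
#!/bin/bash
# Sketch: why `[ -x udpipe ]` can fail even when udpipe is on $PATH.
# check_local and link_udpipe are hypothetical helper names.

# check_local DIR: repeat the test from train_all.sh as if run from DIR.
# `[ -x udpipe ]` resolves the relative path "udpipe" against the
# current directory; $PATH is never consulted.
check_local() {
  ( cd "$1" && [ -x udpipe ] )
}

# link_udpipe DIR: symlink the udpipe found on $PATH into DIR so that
# the relative-path test succeeds there.
link_udpipe() {
  local bin
  bin=$(command -v udpipe) || { echo "udpipe not on PATH" >&2; return 1; }
  ln -sf "$bin" "$1/udpipe"
}
```

After sourcing these functions, running `link_udpipe .` inside models-ud-2.0 and then `./train_all.sh` should get past the "Missing udpipe" message, provided udpipe really is on $PATH. Copying the binary into that directory would work equally well.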

jbrry commented 6 years ago

Thank you very much for your response and please excuse me for opening an issue on this repo as it's not related to the R package!

I agree, and I will try to fix this path problem. At the moment, I'm trying to run the UDPipe models as a baseline to compare parsing accuracy using POS tags generated by UDPipe vs POS tags from a different POS tagger. I was using the models in the reproducible training file as practice to see how the whole pipeline works. Thank you for suggesting the R package! Right now, I have no problems running UDPipe models on a single language, but I am looking for a way to run them on all languages and on the train/dev sets. The reproducible training file seemed like a good place to start because it includes the bash scripts. I should be able to see how the whole system works once I get the bash scripting/path issue figured out! Thanks
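A loop over all languages could look roughly like the bash sketch below. It is only an illustration under stated assumptions: the function name tag_all, the directory layout, and the LANG-dev.conllu naming scheme are all hypothetical, and the --tag and --input=conllu options should be checked against the UDPipe manual for the installed version.

```shell
#!/bin/bash
# Sketch: tag the dev set of every treebank with its own model.
# tag_all, the directory layout and the file names are hypothetical.

# tag_all UDPIPE MODEL_DIR DATA_DIR OUT_DIR
tag_all() {
  local udpipe=$1 models=$2 data=$3 out=$4 model lang input
  mkdir -p "$out"
  for model in "$models"/*.udpipe; do
    [ -e "$model" ] || continue          # glob matched nothing
    lang=$(basename "$model" .udpipe)
    input="$data/$lang-dev.conllu"       # assumed naming scheme
    [ -f "$input" ] || { echo "Skipping $lang: no $input" >&2; continue; }
    # Run the tagger on CoNLL-U input and capture stdout in OUT_DIR.
    "$udpipe" --tag --input=conllu "$model" "$input" \
      > "$out/$lang-dev.tagged.conllu"
  done
}
```

A call such as `tag_all ./udpipe models data tagged` would then leave one tagged file per language in the tagged directory and skip languages whose dev file is missing. Passing the executable explicitly sidesteps the current-directory versus $PATH question.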

jwijffels commented 6 years ago

Which POS tagger are you comparing against? FYI, I recently compared UDPipe and spaCy here: https://github.com/jwijffels/udpipe-spacy-comparison

jbrry commented 6 years ago

I was going to try the BiLSTM of Plank et al. (2016): https://github.com/bplank/bilstm-aux. I would like to try a system which uses token and character level representations, which might help improve accuracy for morphologically rich languages.

I noticed that in the last CoNLL shared task on multilingual parsing, the Stanford team used their own POS tagger, which had some improvements over the baseline system, so I was looking at exploring other methods to increase parsing accuracy. Thanks very much for the comparison. It seems that the UDPipe system is more powerful than the spaCy tagger on pretty much all tests. Hopefully, I will have both the Plank and UDPipe models run soon so I can get a clearer idea of the relative performance between the two, but from what I gather both seem to perform very well.

jwijffels commented 6 years ago

Thank you for the information. Feel free to let me know if you have something to share on the comparison with BiLSTM.

jbrry commented 6 years ago

Thanks very much for your help and certainly, will do!