ArrayIndexOutOfBoundsException with parse_fn.sh

I got the error above in two cases: first when the input file contained empty lines (so I removed them), and again immediately after the message "ERROR: sentence length mismatches token number in Stanford annotation". The second case may have something to do with one of the words in that sentence being "voila" with an accented letter "a".
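For anyone hitting the first case, a minimal sketch of stripping empty lines before running the parser. The file names are hypothetical, and treating whitespace-only lines as empty is an assumption about what trips the pipeline:

```shell
# Hypothetical sample input so the sketch runs on its own.
printf 'First sentence .\n\n   \nSecond sentence .\n' > sample_input.txt

# Keep only lines that contain at least one non-whitespace character.
grep -v '^[[:space:]]*$' sample_input.txt > sample_input.clean.txt

# Show how many lines survived.
wc -l < sample_input.clean.txt
```

The cleaned file, not the original, would then be passed to parse_fn.sh.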

I have not been able to replicate the latter problem. Can you give the full sentence?

Is there a flag I can pass so that the pipeline will silently ignore such errors?

There is no flag, but all errors are written to STDERR, so you can ignore them by redirecting to /dev/null, like this: "sh ... 2> /dev/null".
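A small sketch of how the redirection behaves. `fake_parser` is a hypothetical stand-in, since the actual parse_fn.sh arguments are not shown in this thread; it only mimics a command that writes results to STDOUT and errors to STDERR:

```shell
# Hypothetical stand-in for the real parser call.
fake_parser() {
  echo "parsed output"
  echo "ERROR: sentence length mismatches token number in Stanford annotation" >&2
}

# Discard error messages entirely, as suggested above.
fake_parser > out.txt 2> /dev/null

# Alternative: keep the messages in a log file so the affected
# sentences can still be looked up later.
fake_parser > out.txt 2> errors.log
```

Note that redirecting STDERR only hides the messages; it does not change what the process does when the error occurs.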

On the same note, I have 23M sentences to label. Do you think it's better to split them into N files and run N parse_fn.sh processes, or should I stick with my current single file of 23M sentences?

Personally, I would split the file into 23 (or more) smaller files. Otherwise, the process might be running for a month(?).
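A rough sketch of that splitting approach, using a tiny sample file. The file names and the `run_parser` function are hypothetical placeholders; the real parse_fn.sh call would go where `run_parser` is defined, and how many processes can run at once depends on the available cores and memory:

```shell
# Tiny sample standing in for the 23M-sentence file.
awk 'BEGIN { for (i = 1; i <= 10; i++) print "sentence", i }' > sentences.txt

# Split into chunks of 4 lines each (chunk_aa, chunk_ab, ...).
# For the real file, 1000000 lines per chunk would give 23 chunks.
split -l 4 sentences.txt chunk_

# Hypothetical stand-in for the real parser call; here it only counts lines.
run_parser() {
  wc -l < "$1" > "$1.out"
}

# Start one background process per chunk, then wait for all of them.
for f in chunk_??; do
  run_parser "$f" 2> "$f.err" &
done
wait
```

Each chunk gets its own output and error file, so a failure in one chunk can be rerun without touching the others.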

Cheers, Michael

microth / PathLSTM, issue #22