Closed genbei closed 3 years ago
Hi, --ref is the human reference translation, and --baseref is the base forms (lemmas) of that reference translation. You need a morphological analyser for the target language to get those. If you don't have one, the tool simulates the base forms by taking the first 4 letters of each word. The machine translation output is given by -H (directly) and -b (base forms of the MT output, or the first 4 letters). In your case, you can run hjerson.py -R dev.pe -H dev.mt and the tool will truncate the words to 4 letters to provide the missing "base" files. It's a good approximation if you do not have a morphological analyser for your language, but if you do, I advise generating the base forms and using them, because lemmas match inflected forms more reliably than a 4-letter prefix.
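To see what the 4-letter approximation does to your data, here is a minimal sketch that writes truncated "base" files you could inspect or pass via -B and -b. The helper names `truncate_words` and `write_base_file` and the `.base` output names are hypothetical, not part of hjerson.py, and the tool's own truncation may differ in details such as trailing whitespace.

```python
from pathlib import Path


def truncate_words(line, n=4):
    """Cut every whitespace-separated word in the line to its first n characters."""
    return " ".join(word[:n] for word in line.split())


def write_base_file(src_path, dst_path, n=4):
    """Write a pseudo-base-form copy of src_path, one truncated line per input line.

    Hypothetical helper: it only mimics the 4-letter approximation described above.
    """
    lines = Path(src_path).read_text(encoding="utf-8").splitlines()
    truncated = "".join(truncate_words(line, n) + "\n" for line in lines)
    Path(dst_path).write_text(truncated, encoding="utf-8")


if __name__ == "__main__":
    print(truncate_words("the translations were corrected manually"))
    # prints: the tran were corr manu
```

With dev.pe and dev.mt on hand, `write_base_file("dev.pe", "dev.pe.base")` and `write_base_file("dev.mt", "dev.mt.base")` would produce files with the same number of lines as the inputs, which is what the -B and -b options need.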
The script won't run with python hjerson.py -R dev.pe -H dev.mt, as shown in the figure.
Only the following usage message appears:
`
hjerson.py
-R, --ref reference
-H, --hyp hypothesis
-B, --baseref reference.base
-b, --basehyp hypothesis.base
optional inputs:
-A, --addref reference.additional
-a, --addhyp hypothesis.additional
optional outputs:
-s, --sent file.sent write sentence error rates
-m, --html file.html write error categories in a html file
-c, --cats file.cats write error categories in a text file
`
That's strange. I've just checked it on my computer and it works: it backs off to four letters.
The following function enables this back-off:

    def take_four_letters(line):
        bline = ""
        words = line.strip().split()
        for w in words:
            bline += w[:4] + " "
        return bline

And this checks whether base forms were given or not:

    if not (args.reference_base or args.hypothesis_base):
        baserline = take_four_letters(rline)
        basehline = take_four_letters(hline)
    else:
        baserline = args.reference_base.readline()
        basehline = args.hypothesis_base.readline()

Maybe running it with python3 will help?
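The back-off described here can be tried in isolation with a small self-contained sketch. It reuses `take_four_letters` as quoted; the wrapper `base_lines` and its parameters are hypothetical and not part of hjerson.py. One assumption to note: with the quoted check, passing only one of -B or -b would reach the `else` branch and call `readline()` on a missing file, so this sketch rejects that case explicitly.

```python
import io


def take_four_letters(line):
    """Approximate base forms by keeping the first 4 letters of each word."""
    bline = ""
    for w in line.strip().split():
        bline += w[:4] + " "
    return bline


def base_lines(rline, hline, ref_base=None, hyp_base=None):
    """Return (reference base line, hypothesis base line).

    Hypothetical wrapper: falls back to 4-letter truncation when no base
    files are given, and reads one line from each base file otherwise.
    """
    if ref_base is None and hyp_base is None:
        return take_four_letters(rline), take_four_letters(hline)
    if ref_base is None or hyp_base is None:
        raise ValueError("provide both base files (-B and -b) or neither")
    return ref_base.readline(), hyp_base.readline()


if __name__ == "__main__":
    ref = "the corrected translation\n"
    hyp = "the correct translations\n"
    # No base files: both lines are truncated to 4 letters per word.
    print(base_lines(ref, hyp))
    # Base files given (simulated here with in-memory files): lines are read as-is.
    print(base_lines(ref, hyp,
                     io.StringIO("the correct translation\n"),
                     io.StringIO("the correct translation\n")))
```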
OK, it runs normally now. Maybe there was something wrong with the code I downloaded before. Thank you very much for your answer.
You're welcome :)
Hello, I have a question. What's the difference between --ref reference and --baseref reference.base? I thought that, given a machine translation and a manually modified translation, I could calculate the error classification. Also, regarding the specific usage of the script, such as python hjerson.py -R data/dev.pe -H data/dev.mt -B TEXT -b TEXT -s -m: are all four files (-R, -H, -B, -b) necessary? I only have two files on hand: dev.mt and dev.pe. Looking forward to your reply.