bleu (Python Package)

A Python wrapper for the standard BLEU evaluation of Natural Language Generation (NLG).

GitHub project: https://github.com/zhijing-jin/bleu.
PyPI package: pip install bleu

Installation

Requirement: Python 3

Option 1: Install pip package

pip install --upgrade bleu

Option 2: Build from source

pip install --upgrade git+https://github.com/zhijing-jin/bleu.git

How to Run

The most standard way to calculate BLEU is Moses' script for detokenized BLEU (multi-bleu-detok.perl). This package provides simple Python calls to it.

Function 1: Calculate the BLEU for lists

If you want to check only one hypothesis (a list of sentences):

>>> from bleu import list_bleu
>>> ref = ['it is a white cat .',
...        'wow , this dog is huge .']
>>> ref1 = ['This cat is white .',
...         'wow , this is a huge dog .']
>>> hyp = ['it is a white kitten .',
...        'wowww , the dog is huge !']
>>> hyp1 = ["it 's a white kitten .",
...         'wow , this dog is huge !']
>>> list_bleu([ref], hyp)
34.99
>>> list_bleu([ref, ref1], hyp1)
57.91
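
The calls above pass the references as a list of lists: one inner list per reference set, each aligned sentence by sentence with the hypothesis. A minimal sketch of a shape check to run before scoring follows. `check_parallel` is a hypothetical helper, not part of the bleu package, and the equal-length requirement is an assumption inferred from the examples.

```python
def check_parallel(refs, hyp):
    """Raise ValueError unless every reference list matches hyp in length.

    Hypothetical helper (not part of the bleu package). It assumes refs is a
    list of reference lists and hyp is a list of sentences, as in the
    examples above.
    """
    if isinstance(refs, str) or isinstance(hyp, str):
        raise ValueError("refs and hyp must be lists of sentences, not strings")
    for i, ref in enumerate(refs):
        if isinstance(ref, str):
            # A common slip: passing ref instead of [ref].
            raise ValueError(
                f"refs[{i}] is a string; wrap the reference list as [ref]"
            )
        if len(ref) != len(hyp):
            raise ValueError(
                f"refs[{i}] has {len(ref)} sentences, hyp has {len(hyp)}"
            )


ref = ['it is a white cat .', 'wow , this dog is huge .']
hyp = ['it is a white kitten .', 'wowww , the dog is huge !']

check_parallel([ref], hyp)  # passes silently
# list_bleu([ref], hyp) can then be called as shown above
```

The check catches the easy mistake of passing `ref` where `[ref]` is expected before any scoring is attempted.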

If you want to check multiple hypotheses (several lists of sentences):

>>> from bleu import multi_list_bleu
>>> multi_list_bleu([ref, ref1], [hyp, hyp1])
[34.99, 57.91]
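
When comparing several systems, it can be handy to pair each score with a label. The sketch below assumes the returned scores follow the order of the hypotheses passed in, as the example output suggests. `rank_systems` and the labels are hypothetical names used for illustration only.

```python
def rank_systems(names, scores):
    """Pair each system name with its score and sort best-first.

    Hypothetical helper, not part of the bleu package.
    """
    if len(names) != len(scores):
        raise ValueError("need exactly one name per score")
    return sorted(zip(names, scores), key=lambda pair: pair[1], reverse=True)


# Scores copied from the multi_list_bleu example above.
scores = [34.99, 57.91]
ranking = rank_systems(['hyp', 'hyp1'], scores)
print(ranking)  # [('hyp1', 57.91), ('hyp', 34.99)]
```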

detok=False: Tokenized BLEU (computed by multi-bleu.perl) is not recommended, but if you need it, pass detok=False:

>>> list_bleu([ref], hyp, detok=False)
39.76
# or if you want to test multiple hypotheses
>>> multi_list_bleu([ref, ref1], [hyp, hyp1], detok=False)
[39.76, 47.47]

verbose=True: If you hit unexpected errors, pass verbose=True to inspect the intermediate steps.

Function 2: Calculate the BLEU for files

If you want to check only one hypothesis file:

# if you already have the following files
>>> from bleu import file_bleu
>>> hyp_file = 'data/hyp0.txt'
>>> ref_files = ['data/ref0.txt', 'data/ref1.txt']
>>> file_bleu(ref_files, hyp_file)
34.99
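
If the files do not exist yet, they can be produced from Python lists. The sketch below assumes the expected layout is plain UTF-8 text with one sentence per line, which matches the "#lines in each file" check shown in the verbose output. `write_lines` is a hypothetical helper, and a temporary directory stands in for data/.

```python
import os
import tempfile


def write_lines(path, sentences):
    """Write one sentence per line (UTF-8) and return the path.

    Hypothetical helper; the one-sentence-per-line layout is an assumption.
    """
    with open(path, 'w', encoding='utf-8') as f:
        for sentence in sentences:
            f.write(sentence.strip() + '\n')
    return path


ref = ['it is a white cat .', 'wow , this dog is huge .']
ref1 = ['This cat is white .', 'wow , this is a huge dog .']
hyp = ['it is a white kitten .', 'wowww , the dog is huge !']

data_dir = tempfile.mkdtemp()  # stand-in for the data/ directory above
ref_files = [
    write_lines(os.path.join(data_dir, 'ref0.txt'), ref),
    write_lines(os.path.join(data_dir, 'ref1.txt'), ref1),
]
hyp_file = write_lines(os.path.join(data_dir, 'hyp0.txt'), hyp)
# file_bleu(ref_files, hyp_file) can then be called as shown above
```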

If you want to check multiple hypothesis files:

>>> from bleu import multi_file_bleu
>>> hyp_file1 = 'data/hyp1.txt'
>>> bleus = multi_file_bleu(ref_files, [hyp_file, hyp_file1])
>>> bleus
[34.99, 57.91]

detok=False: Set it if you want to calculate the (not recommended) tokenized BLEU.

verbose=True: Set it if you want to inspect how the bleu calculations are made:

>>> bleu = file_bleu(ref_files, hyp_file, verbose=True)
[Info] Valid Reference Files: ['data/ref0.txt', 'data/ref1.txt']
[Info] Valid Hypothesis Files: ['data/hyp0.txt']
[Info] #lines in each file: 2
[cmd] perl detokenizer.perl -l en < data/ref0.txt > data/ref0.detok.txt 2>/dev/null
[cmd] perl detokenizer.perl -l en < data/ref1.txt > data/ref1.detok.txt 2>/dev/null
[cmd] perl detokenizer.perl -l en < data/hyp0.txt > data/hyp0.detok.txt 2>/dev/null
[cmd] perl multi-bleu-detok.perl data/ref0.detok.txt data/ref1.detok.txt < data/hyp0.detok.txt
2-ref bleu for data/hyp0.detok.txt: 34.99
>>> bleu
34.99

Function 3: Detokenize files

>>> from bleu import detok_files
>>> detok_ref_files = detok_files(ref_files, tmp_dir='./data', file_prefix='ref_dtk', verbose=True)
[cmd] perl ./TMP_DIR/detokenizer.perl -l en < data/ref0.txt > data/ref_dtk0.txt 2>/dev/null
[cmd] perl ./TMP_DIR/detokenizer.perl -l en < data/ref1.txt > data/ref_dtk1.txt 2>/dev/null
>>> detok_ref_files
['data/ref_dtk0.txt', 'data/ref_dtk1.txt']

In Case of Unexpected Outputs

Inspect the Python file bleu.py and adapt it to your setup.
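
Before editing bleu.py, it may be worth ruling out common setup problems. The sketch below is a hypothetical pre-flight check, not part of the package: it assumes perl must be on the PATH (the verbose log shows perl commands being run) and that all reference and hypothesis files should have the same number of lines.

```python
import os
import shutil


def preflight(ref_files, hyp_file):
    """Return a list of human-readable problems; an empty list means none found.

    Hypothetical helper, not part of the bleu package.
    """
    problems = []
    # The verbose log shows the package shelling out to perl.
    if shutil.which('perl') is None:
        problems.append('perl not found on PATH')
    counts = {}
    for path in [*ref_files, hyp_file]:
        if not os.path.isfile(path):
            problems.append(f'missing file: {path}')
            continue
        with open(path, encoding='utf-8') as f:
            counts[path] = sum(1 for _ in f)
    # Every file that exists should contain the same number of lines.
    if len(set(counts.values())) > 1:
        problems.append(f'line counts differ: {counts}')
    return problems


# Paths from the file_bleu example; adjust them to your own files.
for problem in preflight(['data/ref0.txt', 'data/ref1.txt'], 'data/hyp0.txt'):
    print('[check]', problem)
```

An empty result does not guarantee a correct score; it only rules out the setup issues listed in the function.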

Contact

If you have more questions, feel free to check out the common Q&A, or raise a new GitHub issue.

In case of really urgent needs, contact the author Zhijing Jin (Miss).
