scramblingbalam / F2016_EECS595_NLP

Programming assignments for Natural Language Processing

log probability of Ngrams #8

Open scramblingbalam opened 7 years ago

scramblingbalam commented 7 years ago

Calculate the uni-, bi-, and trigram log probabilities of the data in “Brown_train.txt” by implementing the calc_probabilities() function. This assignment always uses log base 2. Don’t forget to add the appropriate sentence start and end symbols: use “*” as the start symbol and “STOP” as the end symbol (these are defined as the constants START_SYMBOL and STOP_SYMBOL in the skeleton code). You may use NLTK to help, but it is not required.
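One possible way to approach this is sketched below, without NLTK. Only the name calc_probabilities() and the two symbol constants come from the assignment text. The argument type (an iterable of whitespace-tokenized sentence strings), the return shape (three dicts keyed by word tuples), the `ngrams` helper, and the padding convention (n-1 start symbols per sentence, with STOP counted as a unigram and the start symbol not) are all assumptions, so check them against the skeleton code.

```python
import math
from collections import Counter

START_SYMBOL = "*"
STOP_SYMBOL = "STOP"


def ngrams(tokens, n):
    """Hypothetical helper: yield every n-gram of `tokens` as a tuple."""
    return zip(*(tokens[i:] for i in range(n)))


def calc_probabilities(sentences):
    """Return (unigram, bigram, trigram) dicts of base-2 log probabilities.

    Assumed input: an iterable of sentences, each a whitespace-tokenized string.
    """
    counts = {1: Counter(), 2: Counter(), 3: Counter()}
    for sentence in sentences:
        words = sentence.split() + [STOP_SYMBOL]
        for n in (1, 2, 3):
            # Assumed convention: pad with n-1 start symbols, so the
            # start symbol never appears as a unigram of its own.
            padded = [START_SYMBOL] * (n - 1) + words
            counts[n].update(ngrams(padded, n))

    results = []
    for n in (1, 2, 3):
        # Maximum-likelihood estimate: normalize each n-gram count by the
        # total count of n-grams that share its (n-1)-word context.
        context_totals = Counter()
        for gram, count in counts[n].items():
            context_totals[gram[:-1]] += count
        results.append({
            gram: math.log2(count / context_totals[gram[:-1]])
            for gram, count in counts[n].items()
        })
    return tuple(results)
```

For example, `calc_probabilities(["a b", "a c"])` gives the bigram `("a", "b")` a log probability of -1.0, because "a" is followed by "b" in one of its two occurrences.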

scramblingbalam commented 7 years ago

The code writes the log probabilities to the file “output/A1.txt”. Here are a few example log probabilities of uni-, bi-, and trigrams to check your results against:

UNIGRAM captain -14.2809819899
UNIGRAM captain's -17.0883369119
UNIGRAM captaincy -19.4102650068
BIGRAM and religion -12.9316608989
BIGRAM and religious -11.3466983981
BIGRAM and religiously -13.9316608989
TRIGRAM and not a -4.02974734339
TRIGRAM and not by -4.61470984412
TRIGRAM and not come -5.61470984412
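To compare your own numbers with these samples, a small helper along the following lines may be useful. It is only a sketch: the line layout (label, space-separated words, value) is inferred from the examples above, the function names are hypothetical, and the sort order is an assumption, not something the assignment states. The sample values carry 12 significant digits, so comparing with a tolerance is safer than comparing strings.

```python
import math


def format_ngram_lines(log_probs, label):
    """Render {word_tuple: log_prob} as lines like 'BIGRAM and religion -12.93'.

    Sorting by n-gram is an assumption; the skeleton may order lines differently.
    """
    return [
        f"{label} {' '.join(gram)} {value}"
        for gram, value in sorted(log_probs.items())
    ]


def matches_reference(value, reference, tol=1e-9):
    """True if a computed log probability agrees with a printed sample value."""
    return math.isclose(value, reference, abs_tol=tol)
```

As a sanity check, the three trigram samples are consistent with maximum-likelihood estimates of 3/49, 2/49, and 1/49, so “and not by” and “and not come” differ by exactly one bit.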