Open nilaykumar opened 1 year ago
Hi Nilay, thanks for your interest in my work!
Yes, you are correct that the log of a probability should always be non-positive! Unfortunately, I did not create the model; I data-mined it, so I can't speak to why some of the values are positive.
This project was simply for the benefit of my own Japanese study with the intent of rating the reading difficulty of ebooks. However, if you have any improvements to what I've started here feel free to let me know!
Hey Scott --
First of all: thanks for your work on this port. The web version of the tool seems to be down, so this repo was actually the only place I could find code/data files for Sato et al.'s project. This isn't really a question about the code, so I apologize if this is a bit off-topic. I just figured you might have an idea of how the computation is set up.
In the `calculate_likelihoods` function of `nagoyaobi.py`, the likelihood of a given character `key` of the input text being in the ith grade level is computed as `text[key][i] = text[key][0] * self.model[key][i]`. This is in line with the summand in equation (8) of Sato-Matsuyoshi-Kondoh's paper, so no problems there. The second factor in this product should be $\log P(z \mid G_i)$, which, being the log of a probability, should be non-positive. The values in the `self.model` dictionary, however, are pulled from `Obi2-T13.model`, which has a number of positive values. So it's not clear to me that the code (even in the Ruby code) is doing exactly what is described in the paper. Am I misunderstanding how the computation is supposed to go?
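For reference, here is a minimal sketch of the accumulation I understand equation (8) to describe: for each grade level $G_i$, sum $\mathrm{count}(z) \cdot \log P(z \mid G_i)$ over the characters $z$ of the text. This is not the repo's actual code; the `model` dictionary below is a toy stand-in for the data loaded from `Obi2-T13.model`, and the function name is hypothetical.

```python
import math

# Toy stand-in for the character model; in the real project these values
# would be read from Obi2-T13.model. Each entry maps a character to its
# per-grade log probability [log P(z | G_1), log P(z | G_2)].
model = {
    "日": [math.log(0.02), math.log(0.05)],
    "本": [math.log(0.01), math.log(0.04)],
}

def grade_log_likelihoods(text, model, n_grades=2):
    """Sum count(z) * log P(z | G_i) over the characters z of `text`."""
    counts = {}
    for ch in text:
        counts[ch] = counts.get(ch, 0) + 1

    scores = [0.0] * n_grades
    for ch, count in counts.items():
        if ch not in model:
            continue  # unseen characters could instead be smoothed
        for i in range(n_grades):
            scores[i] += count * model[ch][i]
    return scores

scores = grade_log_likelihoods("日本日", model)
# Since each score is a sum of logs of probabilities, every score must be
# non-positive -- which is exactly why positive entries in the model file
# look suspicious.
```

If the model file truly stores $\log P(z \mid G_i)$, every resulting score here is necessarily non-positive; positive entries would only make sense if the file stores something else, e.g. a likelihood ratio or an unnormalized weight.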