For select_content():
Added parameters with default values. NOTE: these still need to be added to text_summarizer.py; I will do that later when no one else is touching the file. ALSO NOTE: I will change the default values for these parameters after tuning on the training data.
Added a docstring explaining what each of the parameters is
Changed the functions it calls so that each is passed the parameters it needs
For _build_sim_matrix():
Fixed a bug where sentences were being added for cosine similarity without a threshold check
Added the ability to use cosine similarity OR normalized generative probability; the latter adds only the top K nodes' similarities
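A minimal sketch of the two filters described above, for reviewers who want to see the intended behaviour in isolation. The helper names filter_by_threshold and keep_top_k are hypothetical and are not the code in this change; whether the real threshold check is inclusive (>=) or strict (>) is also an assumption here.

```python
def filter_by_threshold(row, threshold):
    """Zero out scores below the threshold (assumed cosine-similarity path)."""
    # Assumption: scores equal to the threshold are kept.
    return [score if score >= threshold else 0.0 for score in row]


def keep_top_k(row, k):
    """Keep only the k largest scores in a row and zero out the rest
    (assumed normalized-generative-probability path)."""
    ranked = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
    keep = set(ranked[:k])
    return [score if j in keep else 0.0 for j, score in enumerate(row)]


cosine_row = filter_by_threshold([0.05, 0.3, 0.1], threshold=0.1)
gen_prob_row = keep_top_k([0.2, 0.5, 0.1, 0.4], k=2)
```

Either filter leaves the row length unchanged, so the matrix stays square and the dropped entries are simply zero.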
For _build_bias_vec():
Added ability to use relevance OR generative probability OR cosine similarity for the bias weighting.
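One way to picture the switchable bias weighting is a builder that takes the scoring function as an argument. This is only a sketch: build_bias_vec, score_fn and the toy overlap scorer are hypothetical names, and normalizing the scores to sum to 1 (with a uniform fallback when every score is 0) is an assumption, not something confirmed by this change.

```python
def build_bias_vec(sentences, query, score_fn):
    """Score every sentence against the query and normalize to sum to 1."""
    scores = [score_fn(sentence, query) for sentence in sentences]
    total = sum(scores)
    if total == 0:
        # Assumption: fall back to a uniform bias when nothing matches.
        return [1.0 / len(sentences)] * len(sentences)
    return [score / total for score in scores]


def overlap(sentence, query):
    """Toy scorer standing in for relevance, generative probability or cosine."""
    return len(set(sentence) & set(query))


bias = build_bias_vec([["a", "b"], ["b", "c"], ["d"]], ["b", "c"], overlap)
```

Swapping the weighting method then only means passing a different score_fn.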
New formula functions:
_calc_relevance(): calculates bias relevance using the formula from Otterbacher (2005)
_calc_smoothed_mle(), _calc_gen_prob(), _calc_norm_gen_prob(): formulas used to calculate generative and normalized generative probabilities, from Otterbacher (2009)
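For reference, a rough sketch of these formulas as I understand them from the cited papers. The function names, argument lists and the smoothing weight lam=0.5 are illustrative assumptions and may differ from the code in this change; check the papers before relying on the details.

```python
import math
from collections import Counter


def relevance(sentence, query, idf):
    """Sum over query words of log(tf_sentence + 1) * log(tf_query + 1) * idf."""
    s_tf, q_tf = Counter(sentence), Counter(query)
    return sum(
        math.log(s_tf[w] + 1) * math.log(q_tf[w] + 1) * idf.get(w, 0.0)
        for w in q_tf
    )


def smoothed_mle(word, context_tf, context_len, corpus_tf, corpus_len, lam=0.5):
    """Interpolate the in-context MLE with the corpus-wide MLE."""
    p_context = context_tf.get(word, 0) / context_len if context_len else 0.0
    p_corpus = corpus_tf.get(word, 0) / corpus_len if corpus_len else 0.0
    return (1 - lam) * p_context + lam * p_corpus


def gen_prob(target, context, corpus_tf, corpus_len, lam=0.5):
    """Probability of generating `target` from the model of `context`."""
    context_tf = Counter(context)
    prob = 1.0
    for word, tf in Counter(target).items():
        p = smoothed_mle(word, context_tf, len(context), corpus_tf, corpus_len, lam)
        prob *= p ** tf
    return prob


def norm_gen_prob(target, context, corpus_tf, corpus_len, lam=0.5):
    """Geometric mean per token, so long sentences are not penalized."""
    if not target:
        return 0.0
    return gen_prob(target, context, corpus_tf, corpus_len, lam) ** (1.0 / len(target))


corpus_tf = Counter(["a", "b", "a", "c"])
p_single = norm_gen_prob(["a"], ["a", "b"], corpus_tf, 4)
p_double = norm_gen_prob(["a", "a"], ["a", "b"], corpus_tf, 4)
```

In this toy case p_single and p_double come out equal, which is the point of the length normalization.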
Other Notes:
_cosine_similarity() now runs on the raw_counts version of tf-idf. This does not change its output at all: cosine similarity divides by the length of each vector, so rescaling a vector by a positive constant (which is what the previous normalization did) has no effect on the result.
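A quick sanity check of that claim, using made-up vectors. The cosine_similarity and l2_normalize helpers below are stand-ins written for this note, not the project's own functions, and the example assumes the earlier normalization was a per-vector rescaling.

```python
import math


def cosine_similarity(a, b):
    """Plain cosine similarity between two equal-length dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def l2_normalize(vec):
    """Scale a vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


raw_a = [3.0, 0.0, 1.5]
raw_b = [1.0, 2.0, 0.5]

raw_score = cosine_similarity(raw_a, raw_b)
norm_score = cosine_similarity(l2_normalize(raw_a), l2_normalize(raw_b))
same = math.isclose(raw_score, norm_score)
```

Here same should be True up to floating-point rounding, which is why the switch to raw counts is safe for this function.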