kevinlu1248 / pyate

PYthon Automated Term Extraction
https://kevinlu1248.github.io/pyate/
MIT License
305 stars, 37 forks

log in C-value #15

Closed: BinHeRunning closed this issue 4 years ago

BinHeRunning commented 4 years ago

https://github.com/kevinlu1248/pyate/blob/4a61dbf42b85bf87c0f373da5b3af48747de8d23/src/pyate/cvalues.py#L71

The original paper uses log2 to calculate the C-value, but this code uses log.

kevinlu1248 commented 4 years ago

Thanks for the fix and the PR at https://github.com/kevinlu1248/pyate/pull/17. With 0 as the threshold, it wouldn't make a difference, since changing the log base only scales the result by a constant multiplier (if my calculations are correct), but staying true to the algorithm might prevent issues in the future.
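To illustrate the constant-multiplier point, here is a minimal sketch. It is not pyate's actual code: `c_value` is a hypothetical helper that follows the C-value definition from the original paper, and the candidate lengths and frequencies are made-up values. It computes scores with both log bases and checks that they differ by the same factor (ln 2) and that a threshold of 0 keeps the same candidates either way.

```python
import math

def c_value(term_len, freq, nested_freqs=(), log=math.log2):
    """Hypothetical helper (not pyate's API): C-value of one candidate term.

    term_len     -- number of words in the candidate
    freq         -- frequency of the candidate in the corpus
    nested_freqs -- frequencies of the longer candidates that contain it
    log          -- logarithm function to use (math.log2 in the paper)
    """
    weight = log(term_len)
    if not nested_freqs:
        return weight * freq
    # Subtract the average frequency of the containing candidates.
    return weight * (freq - sum(nested_freqs) / len(nested_freqs))

# Made-up (term_len, freq, nested_freqs) triples, for illustration only.
candidates = [(2, 5, ()), (3, 4, (2, 1)), (2, 1, (3, 3))]

base2 = [c_value(n, f, nested, log=math.log2) for n, f, nested in candidates]
base_e = [c_value(n, f, nested, log=math.log) for n, f, nested in candidates]

# Every score differs by the same factor, ln(2), because ln(x) = ln(2) * log2(x).
ratios = [e / b for e, b in zip(base_e, base2)]

# A positive factor never changes the sign, so a threshold of 0 is unaffected.
same_filter = [(e > 0) == (b > 0) for e, b in zip(base_e, base2)]

print(ratios)
print(same_filter)
```

With any nonzero threshold the two bases would stop being interchangeable, because the threshold itself would have to be rescaled by the same factor.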