sleepycat closed this 7 years ago
I did a little further research. There is a Python implementation that appears to get it right. Its output, with a little formatting for clarity:
mike@bullseye:~/projects/cloned/RAKE$ python rake.py
[
('minimal generating sets', 8.666666666666666),
('linear diophantine equations', 8.5),
('minimal supporting set', 7.666666666666666),
('minimal set', 4.666666666666666),
('linear constraints', 4.5),
('upper bounds', 4.0),
('natural numbers', 4.0),
('nonstrict inequations', 4.0)
]
[
('minimal generating sets', 8.666666666666666),
('linear diophantine equations', 8.5),
('minimal supporting set', 7.666666666666666),
('minimal set', 4.666666666666666),
('linear constraints', 4.5),
('upper bounds', 4.0),
('natural numbers', 4.0),
('nonstrict inequations', 4.0),
('strict inequations', 4.0),
('mixed types', 3.666666666666667),
('considered types', 3.166666666666667),
('set', 2.0),
('types', 1.6666666666666667),
('considered', 1.5),
('constructing', 1.0),
('solutions', 1.0),
('solving', 1.0),
('system', 1.0),
('compatibility', 1.0),
('systems', 1.0),
('criteria', 1.0),
('construction', 1.0),
('algorithms', 1.0),
('components', 1.0)
]
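For context, scores like those in the listing are what the degree-to-frequency word scoring described in the RAKE paper produces: each word scores deg(word) / freq(word), and a phrase scores the sum of its words. Below is a minimal sketch of that scoring step only, assuming the candidate phrases have already been split into words. The function name `score_phrases` and the toy input are made up for illustration; this is not the code from rake.py or node-rake.

```python
from collections import defaultdict


def score_phrases(phrases):
    """Score candidate phrases; `phrases` is a list of word lists (hypothetical helper)."""
    freq = defaultdict(int)    # how many times each word occurs across all phrases
    degree = defaultdict(int)  # total length of the phrases each occurrence appears in
    for words in phrases:
        for word in words:
            freq[word] += 1
            degree[word] += len(words)  # co-occurrences, counting the word itself
    word_score = {word: degree[word] / freq[word] for word in freq}
    # A phrase's score is the sum of its member word scores.
    return {" ".join(words): sum(word_score[w] for w in words) for words in phrases}


# Toy input (assumption, not the paper's abstract): "set" appears alone and inside a phrase.
scores = score_phrases([["minimal", "set"], ["set"], ["linear", "constraints"]])
for phrase, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(phrase, score)
```

Words that mostly appear inside longer phrases get a higher degree relative to their frequency, which is why multi-word phrases lead the listing while single words sit near 1.0.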
The entire problem is with the regex used in generatePhrases. I wrote it quickly and released the library. I was about to look into that regex. It tops my priority list now!
By the way, thanks for the research! :+1: :)
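As a rough illustration of the kind of splitting that regex needs to do, here is a sketch in Python that treats punctuation as a hard phrase boundary before splitting on stopwords, so an input such as "'committal, theory" cannot come out as one keyword. The stopword list is a tiny placeholder, stripping the leading apostrophe is my own assumption, and `candidate_phrases` is a hypothetical name; this is not node-rake's generatePhrases.

```python
import re

# Tiny illustrative stopword list (assumption); a real run would use a full list.
STOPWORDS = {"a", "and", "of", "the", "is", "in", "for"}

# Punctuation that always ends a candidate phrase.
PUNCTUATION = re.compile(r'[.,;:!?()\[\]"]+')


def candidate_phrases(text):
    """Split text into candidate phrases at punctuation first, then at stopwords."""
    phrases = []
    for fragment in PUNCTUATION.split(text.lower()):
        current = []
        for word in fragment.split():
            word = word.strip("'")  # assumption: drop stray quote marks around a word
            if not word or word in STOPWORDS:
                if current:
                    phrases.append(" ".join(current))
                current = []
            else:
                current.append(word)
        if current:  # flush at the end of every fragment, so phrases never span punctuation
            phrases.append(" ".join(current))
    return phrases


print(candidate_phrases("'committal, theory"))
```

Splitting on punctuation in an outer loop keeps the stopword logic simple and guarantees that words on either side of a comma are never joined.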
Hey @waseem18! In one of the tests the text includes "'committal, theory", and I noticed that node-rake listed one of the keywords as "'committal theory". This makes me think something is not quite right in the algorithm. Maybe a regex needs some adjustment. To dig a little deeper I looked at the paper (I haven't read the whole thing yet), and it actually has some example text and lists the output on pages 161-162. I thought that would make a good test:
This test is currently failing with the following output:
Any thoughts on what could be causing such a difference?