waseem18 / node-rake

A NodeJS implementation of the Rapid Automatic Keyword Extraction algorithm.
http://www.thabraze.me/node-rake/
MIT License

Trim leading/trailing spaces from phrases #5

Closed by sleepycat 7 years ago

sleepycat commented 7 years ago

This looked like a minor change, but it has a significant impact on the results returned, changing the order of the phrases as well as their whitespace:

Before:

    [ ' popular topic models ',
      ' Latent Dirichlet Allocation',
      'LDA stands ',
      ' initially proposed ',
      ' Blei Ng ',
      ' generative model ',
      ' unobserved groups ',
      ' 2003',
      ' mentioned ',
      ' similar',
      ' sets ',
      ' observations ',
      ' explained ',
      ' Jordan ',
      ' explain ',
      ' parts ',
      ' data ',
      ' Wikipedia ' ]

After:

    [ 'Latent Dirichlet Allocation',
      'popular topic models',
      'initially proposed',
      'Blei Ng',
      'generative model',
      'LDA stands',
      'unobserved groups',
      'Wikipedia',
      'mentioned',
      '2003',
      'sets',
      'observations',
      'explained',
      'Jordan',
      'explain',
      'parts',
      'data',
      'similar' ]

To my eye, this is a little more in line with my expectations given the test text, but I don't actually have any knowledge of the underlying algorithm yet, so it's just as likely that this change has broken things.

If you feel this doesn't make sense based on your understanding of the algorithm, let me know if there is a better approach. I'm happy to make changes.
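One possible explanation for the reordering, sketched below, is that stray spaces turn into empty-string "words" when a phrase is split on a single space, and those phantom words then feed into the RAKE word scores (degree divided by frequency). This is only an illustration under that assumption: `scorePhrases` is a hypothetical helper written for this comment, not node-rake's actual code, and the real implementation may tokenize differently.

```javascript
// Hypothetical sketch, NOT node-rake's implementation.
// RAKE-style scoring: each word scores degree / frequency, and a phrase
// scores the sum of its word scores.
function scorePhrases(phrases) {
  const freq = new Map();   // how many times each word occurs
  const degree = new Map(); // sum of the lengths of the phrases it occurs in

  for (const phrase of phrases) {
    // Assumed naive split: leading/trailing spaces yield '' tokens.
    const words = phrase.split(' ');
    for (const word of words) {
      freq.set(word, (freq.get(word) || 0) + 1);
      degree.set(word, (degree.get(word) || 0) + words.length);
    }
  }

  const scores = new Map();
  for (const phrase of phrases) {
    let total = 0;
    for (const word of phrase.split(' ')) {
      total += degree.get(word) / freq.get(word);
    }
    scores.set(phrase, total);
  }
  return scores;
}

// Three phrases taken from the "Before" output above.
const raw = [' generative model ', ' Latent Dirichlet Allocation', ' data '];
const trimmed = raw.map((p) => p.trim());

const rawScores = scorePhrases(raw);
const trimmedScores = scorePhrases(trimmed);

console.log(rawScores);
console.log(trimmedScores);
```

With trimmed input, each phrase's score depends only on its real words, so the three-word phrase is clearly ahead of the two-word one. With the untrimmed input, the empty-string token is shared by every phrase and inflates all of the scores, which narrows the gap between them. If node-rake behaves similarly, that would account for phrases changing position once they are trimmed.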

waseem18 commented 7 years ago

This is really good @sleepycat :+1: