msrb / cvejob

!!! MOVED TO https://github.com/fabric8-analytics/cvejob
Apache License 2.0

Choosing better stopwords #3

Closed abs51295 closed 6 years ago

abs51295 commented 6 years ago

I looked at the stop words that are currently being used. I would suggest the following enhancements:

  1. Use the standard stop-words from nltk-corpus (stopwords.words('english')).
  2. Also add vulnerability types as stop words, since they cannot be package names.
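
For illustration, the first suggestion could look roughly like the sketch below. It is not the cvejob implementation: the helper names (`load_english_stopwords`, `candidate_tokens`), the tokenizing regex, and the small fallback set are assumptions made for this example. The only NLTK call used is `stopwords.words('english')`, which requires the `stopwords` corpus to be downloaded beforehand (for instance with `nltk.download('stopwords')`).

```python
import re

# Tiny illustrative fallback, used only when NLTK or its stopwords
# corpus is not available. Hypothetical, not taken from cvejob.
FALLBACK_STOPWORDS = {
    "a", "an", "the", "in", "of", "and", "to", "is",
    "that", "for", "with", "by", "on", "it", "this", "are",
}


def load_english_stopwords():
    """Return NLTK's English stop words, or the fallback set if unavailable."""
    try:
        from nltk.corpus import stopwords
        return set(stopwords.words('english'))
    except (ImportError, LookupError):
        # ImportError: NLTK is not installed.
        # LookupError: the 'stopwords' corpus has not been downloaded.
        return set(FALLBACK_STOPWORDS)


def candidate_tokens(description, stop_words):
    """Lowercase a CVE description, tokenize it, and drop stop words."""
    tokens = re.findall(r"[a-z0-9][a-z0-9_.-]*", description.lower())
    return [token for token in tokens if token not in stop_words]


if __name__ == "__main__":
    text = "A flaw in the parser of libxml2 allows remote attackers to crash it"
    print(candidate_tokens(text, load_english_stopwords()))
```

The remaining tokens are the candidates for package names; common words such as "the" or "of" no longer compete with them.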

@msrb WDYT?

msrb commented 6 years ago

+1, great idea @abs51295, thanks :)

abs51295 commented 6 years ago

Cool, sending a PR ;)

abs51295 commented 6 years ago

Oops @msrb, unfortunately, there are packages with vulnerability names. I just gave one example, but there are many more, so I'm not adding them as stop words for now :)
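
The problem can be sketched as a simple check: before a vulnerability term is added as a stop word, see whether it collides with a known package name. Everything in this sketch is hypothetical, including the function name `safe_extra_stopwords` and the sample terms and package names; it only illustrates the collision, not how cvejob handles it.

```python
def safe_extra_stopwords(vulnerability_terms, known_packages):
    """Keep only the terms that do not collide with a known package name."""
    known = {name.lower() for name in known_packages}
    return {term.lower() for term in vulnerability_terms if term.lower() not in known}


if __name__ == "__main__":
    # Hypothetical sample data, chosen only to show a collision.
    terms = {"xss", "overflow", "injection"}
    packages = ["xss", "requests", "lodash"]
    # "xss" is dropped because a package with that name exists in the sample.
    print(sorted(safe_extra_stopwords(terms, packages)))
```

A term that is filtered out here would otherwise hide a real package from the name matching, which is why blindly adding vulnerability types as stop words is risky.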

msrb commented 6 years ago

> there are packages with vulnerability names

Good catch @abs51295 :)