nasa-jpl-memex / sce

Sparkler Crawl Environment - a packaged, dockerized version of http://github.com/USCDataScience/sparkler.git
http://irds.usc.edu/sparkler/
Apache License 2.0

Documentation on how to construct a keyword list needs to be added to wiki or somewhere accessible #39

Open rduerr opened 5 years ago

rduerr commented 5 years ago

Users expecting to use the SCE for a particular purpose may need to regenerate a model from scratch during an initial exploration phase. For example, when Ketil reviewed the 600 URLs, he ended up having to define for himself detailed descriptions of how to score a document. I have also had to do that, as my initial guess for what was the "right stuff" turned out to be inadequate (not specific enough). My second pass at a rule set is:

+ Green rules:

A document that defines permafrost-related terminology and likely also contains information about the history of the term. Examples: permafrost-related review papers; comprehensive dictionaries; Wikipedia articles with historical discussions of terms, etc.

! Orange rules: 

Documents that are about permafrost-related terms but do not have historical information. Examples: newspaper articles, etc.

- Red rules: 

Documents using permafrost terms but which are about businesses, games, etc. Totally irrelevant stuff.
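
For anyone writing their own rule set, here is a rough sketch of how the three rules above could be written down as a small decision function, so that reviewers score documents consistently. This is only an illustration and not part of SCE or Sparkler; the names `Label` and `label_document` and the two yes/no judgments are assumptions made up for the example.

```python
from enum import Enum


class Label(Enum):
    """Hypothetical labels mirroring the green/orange/red rules."""
    GREEN = "green"
    ORANGE = "orange"
    RED = "red"


def label_document(about_permafrost_terms: bool, has_term_history: bool) -> Label:
    """Map two reviewer judgments onto the three-color rule set.

    about_permafrost_terms: the document is actually about permafrost-related
        terminology, not just using the words (businesses, games, etc.).
    has_term_history: the document likely contains information about the
        history of the term.
    """
    if not about_permafrost_terms:
        # Red rule: uses permafrost terms but is about something irrelevant.
        return Label.RED
    if has_term_history:
        # Green rule: about the terms and includes historical information.
        return Label.GREEN
    # Orange rule: about the terms but without historical information.
    return Label.ORANGE
```

A reviewer would still make the two judgments by hand; the function only records how those judgments combine into a color.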

NOTE: Anyone have a better way of colorizing markdown text?

rduerr commented 5 years ago

@wmburke I think this one is yours..... but I can't assign you or anyone!