In any document, the entire narrative hinges on a few keywords. However, although the narrative is about these keywords, they do not necessarily appear in the text at a high frequency.
Therefore, it is wrong to assume that the words with the highest counts are automatically the keywords of the document.
For example, here is the keyword list for the 721-page compendium.
We already know that the entire document is about EC, EIA and Notifications, and that the agencies are called MoEF, SEAC and SEIAA. Listing those words as keywords therefore serves no purpose: they occur throughout the document, so highlighting every occurrence adds no value.
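One way to catch these ubiquitous terms automatically is to stop counting raw occurrences and instead measure, for each word, the fraction of pages it appears on. The sketch below assumes the document is already split into a list of page strings, and it uses a hypothetical `max_page_fraction` threshold (0.8 here). Words that appear on more pages than that, such as EC or MoEF in the compendium, are candidates to drop from the suggested keywords. The tokenizer and the threshold are both assumptions, not the software's current behavior.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Hypothetical tokenizer: runs of letters, case preserved so acronyms like "EIA" stay intact.
    return re.findall(r"[A-Za-z]+", text)


def ubiquitous_words(pages: list[str], max_page_fraction: float = 0.8) -> set[str]:
    """Return words that occur on more than max_page_fraction of the pages (assumed threshold)."""
    if not pages:
        return set()
    page_counts: Counter[str] = Counter()
    for page in pages:
        # Count each word at most once per page, so one repetitive page cannot dominate.
        page_counts.update(set(tokenize(page)))
    n_pages = len(pages)
    return {word for word, count in page_counts.items() if count / n_pages > max_page_fraction}


if __name__ == "__main__":
    pages = [
        "EC granted under EIA Notification",
        "SEIAA reviewed the EC",
        "EC appeal to MoEF",
    ]
    print(ubiquitous_words(pages))  # {'EC'}: it is the only word present on every page
```

Counting pages rather than occurrences is a deliberate choice. A word that is merely frequent on a few pages may still be a useful keyword, whereas a word present on almost every page carries no navigational value.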
Desired behavior:
It is OK to start with a few words with the highest count, but let the user edit the list and specify their own keywords.
Alternatively, display the word cloud separately, and let the user define the keywords manually.
(It is difficult to infer the keywords of an article through text analysis alone.)
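The desired behavior could be sketched as two steps: seed a suggestion list with the most frequent words, then apply the user's edits. In the sketch below, `suggest_keywords` and `apply_user_edits` are hypothetical names, the optional `stopwords` set stands in for whatever exclusion list the software might keep, and the tokenizer is an assumption. None of this is taken from the existing code.

```python
import re
from collections import Counter


def suggest_keywords(text: str, top_n: int = 10, stopwords: frozenset[str] = frozenset()) -> list[str]:
    """Seed list: the top_n most frequent words, skipping any (lowercased) stopwords."""
    words = re.findall(r"[A-Za-z]+", text)  # assumed tokenizer: runs of letters
    counts = Counter(w for w in words if w.lower() not in stopwords)
    # most_common keeps first-seen order among words with equal counts.
    return [word for word, _ in counts.most_common(top_n)]


def apply_user_edits(suggested: list[str], remove: set[str], add: list[str]) -> list[str]:
    """Drop the words the user rejected, then append the user's own keywords without duplicates."""
    result = [w for w in suggested if w not in remove]
    for word in add:
        if word not in result:
            result.append(word)
    return result


if __name__ == "__main__":
    seed = suggest_keywords("EC EC EC EIA EIA wetland MoEF", top_n=3)
    print(seed)  # ['EC', 'EIA', 'wetland']
    final = apply_user_edits(seed, remove={"EC", "EIA"}, add=["mangrove", "wetland"])
    print(final)  # ['wetland', 'mangrove']
```

Because the user's list is final, the frequency count only needs to supply a reasonable starting point. The same `apply_user_edits` step would also work if the seed came from a word cloud instead of a frequency count.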
Filing this for later. I'm getting a bit overwhelmed here, so my primary focus will be the PDF/OCR process. Then there's the big fat trouble with Google Scholar in the sniffer, which, to me, is priority number 2.