janastu / IIHS-TNUSSP-feed

This repository contains all the info regarding use cases and tests performed on newsrack.in for iihs.

Chennai team: Version management of concepts, category and taxonomy #25

Open salus-sage opened 7 years ago

salus-sage commented 7 years ago

A new version of the concepts, tnussp_tamil_robinr, has been created under the iihs_chennai user. https://docs.google.com/document/d/1OUYh5V6SzyZZMYiZf4daDjTYPu6oFjLEdyAypNBwzwo/edit

salus-sage commented 7 years ago

I have fixed the issue. I would like to explain the error so that, if you encounter it in the future, you will know how to debug it. This is what NewsRack reported:

[File tnussp_robinrE: Line Number: 12]: ERROR: Encountered token with
[File tnussp_robinrE: Line Number: 13]: A concept has to be defined as: <concept> = keyword1, keyword2, .. , keywordn
ERROR parsing file tnussp_robinrE: java.lang.ArrayStoreException

The message "ERROR: Encountered token with" suggests that NewsRack does not like the phrase containing the word "with" that you added to your category. I removed it and it works now.
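To catch this kind of problem before uploading a file, a small pre-check along the following lines might help. This is only a sketch in Python, not part of NewsRack: the function name check_concepts and the SUSPECT_WORDS list are hypothetical, the expected line form is taken from the error message above, and "with" is listed only because it caused trouble here. It expects just the concept definition lines, without the surrounding "def concepts" and "end" lines.

```python
import re

# Form taken from the NewsRack error message: <concept> = keyword1, keyword2, ...
CONCEPT_LINE = re.compile(r"^<([^<>]+)>\s*=\s*(.+)$")

# Hypothetical list: "with" is the only word known to have caused an error here.
SUSPECT_WORDS = {"with"}


def check_concepts(lines):
    """Return (line_number, message) pairs for lines that look problematic."""
    problems = []
    for number, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line:
            continue  # blank lines are ignored
        match = CONCEPT_LINE.match(line)
        if match is None:
            problems.append(
                (number, "not of the form <concept> = keyword1, keyword2")
            )
            continue
        # Check every comma-separated keyword for a suspect word.
        for keyword in match.group(2).split(","):
            words = keyword.strip().strip('"').lower().split()
            hits = SUSPECT_WORDS.intersection(words)
            if hits:
                problems.append(
                    (number, f"keyword {keyword.strip()!r} contains {sorted(hits)}")
                )
    return problems


if __name__ == "__main__":
    sample = [
        "<river> = river",
        "<housing> = housing with toilets, slum",
        "river linking",
    ]
    for number, message in check_concepts(sample):
        print(f"line {number}: {message}")
```

The check only reports line numbers and a reason. Whether a flagged keyword really fails in NewsRack still has to be confirmed by uploading the file.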

As I said earlier, please have a look at the user guide. I would like to quote one example from it that applies to this situation:

6.1 Proximity operator (~n)

Sometimes, two words/phrases you want to match might occur in different combinations. For example, interlinking of rivers can occur in text in the following forms: "river linking", "linking rivers", "linking of rivers", "linking of many rivers", "river inter-linking", "interlinking of rivers", and possibly a few others. One way to do this is to define a concept that anticipates all these different phrase forms and records them in a single concept as follows:

define concepts
<interlinking> = river linking, linking rivers, linking of rivers, river interlinking, interlinking of rivers
end
def topic Interlinking = filter {rss.feeds} with interlinking

While this will work, this is both cumbersome and might not necessarily capture everything. Another way to handle this situation is to use the proximity operator. For example, consider the following example:

def concepts
<river> = river
<interlinking> = interlinking, linking
end
def topic Interlinking = filter {rss.feeds} with (river ~2 interlinking)

The filtering rule river ~2 interlinking will trigger whenever the concepts river and interlinking are separated by at most two words. This is more robust.

One more example. Consider the concepts:

<president> = president, "mr."
<obama> = obama, obaama

With these concepts, the filtering rule (president ~1 obama) tries to match concepts president and obama separated by at most 1 word. So, this will match "President Barack Obama" as well as "Mr. Obama" as well as "Mr. Barack Hussein Obaama". So, more generally, the filtering rule a ~N b will look for concepts 'a' and 'b' in text separated by at most N words.
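To make the "at most N words" rule concrete, here is a minimal Python sketch of how such a proximity check could work. It is not NewsRack's implementation: the function names tokenize, find_spans and proximity_match are made up for illustration, keywords are matched as exact lowercase words with no stemming (so "rivers" does not match "river"), and matching in either order is an assumption based on the guide's river examples.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def find_spans(tokens, keywords):
    """Return (start, end) index pairs where any keyword phrase occurs.

    The end index is exclusive, so a one-word match at position 3 is (3, 4).
    """
    spans = []
    for keyword in keywords:
        phrase = tokenize(keyword)
        if not phrase:
            continue
        for i in range(len(tokens) - len(phrase) + 1):
            if tokens[i:i + len(phrase)] == phrase:
                spans.append((i, i + len(phrase)))
    return spans


def proximity_match(text, concept_a, concept_b, n):
    """True if a keyword of concept_a and a keyword of concept_b occur
    with at most n words between them, in either order (an assumption)."""
    tokens = tokenize(text)
    spans_a = find_spans(tokens, concept_a)
    spans_b = find_spans(tokens, concept_b)
    for a_start, a_end in spans_a:
        for b_start, b_end in spans_b:
            if a_end <= b_start:
                gap = b_start - a_end  # words between a and b
            elif b_end <= a_start:
                gap = a_start - b_end  # words between b and a
            else:
                continue  # overlapping matches are not counted
            if gap <= n:
                return True
    return False


if __name__ == "__main__":
    president = ["president", "mr."]
    obama = ["obama", "obaama"]
    print(proximity_match("President Barack Obama", president, obama, 1))
    print(proximity_match("Mr. Obama", president, obama, 1))
    print(proximity_match("Mr. Barack Hussein Obaama", president, obama, 1))
```

Under this literal reading, "President Barack Obama" and "Mr. Obama" match with ~1, while "Mr. Barack Hussein Obaama" has two words in between and only matches with ~2 in this sketch. NewsRack itself may count words differently, so it is worth testing the rule on real feed text.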