write script to extract roots

GoogleCodeExporter commented 9 years ago

Purpose of this task:
to automate searching the text by Inuktitut roots

When reviewing task, please focus on:
- the Groovy script works in GATE
- words with a common root are colour coded
- words that are most likely functional are excluded (short words with <2 letters)

After the review, please add a xxx to yyy wiki page.

After the review, the expected next step is to:
- figure out the equivalent of "&" (the lateral fricative) in the Romanized Inuktitut used in InukMagazine

Original issue reported on code.google.com by hisako...@gmail.com on 9 Nov 2011 at 11:02

GoogleCodeExporter commented 9 years ago

Wrote a Groovy script that extracts all roots and writes a JAPE grammar. Two 
kinds of roots are excluded: roots with fewer than three letters, because they 
are most likely function words, and roots containing "&" (the lateral 
fricative), because InukMagazine uses different writing conventions for 
Romanized Inuktitut. The source (list of roots) is taken from Inuktitut 
Computing, converted from HTML to txt:  
http://www.inuktitutcomputing.ca/DataBase/en/index.html  
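
The filtering step described above can be sketched roughly as follows. This is only an illustration in Java, not the Groovy script itself, which is not attached to this thread. The class name `RootFilter`, its methods, and the sample strings are hypothetical; the only rules taken from the comment are "at least three letters" and "no `&`".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: the names here are illustrative, not from the original script.
final class RootFilter {
    // Roots shorter than this are treated as likely function words.
    static final int MIN_LETTERS = 3;

    // Keep a root only if it has at least three characters and no "&".
    static boolean keep(String root) {
        String r = root.trim();
        return r.length() >= MIN_LETTERS && !r.contains("&");
    }

    // Return the trimmed roots that pass keep(), in their original order.
    static List<String> filter(List<String> roots) {
        List<String> kept = new ArrayList<>();
        for (String root : roots) {
            if (keep(root)) {
                kept.add(root.trim());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Placeholder strings, not entries from the Inuktitut Computing list.
        System.out.println(filter(Arrays.asList("ab", "abc", "ab&c", " abcd ")));
        // prints [abc, abcd]
    }
}
```

Counting characters with `length()` assumes each letter of a Romanized root is a single character.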

The JAPE grammar contains two types of rules: 1) extract words that contain a 
known root; 2) extract words that share a common root. 
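
For illustration, a generator for the two rule types might look something like the sketch below, written in Java rather than the original Groovy. Everything beyond the annotation name `LexicographyKnown` is an assumption: the class name `JapeGrammarWriter`, the phase name, the `control = all` option, the per-root annotation types named `Root_<root>`, and matching roots as word-initial prefixes of `Token.string` with JAPE's `=~` regex operator. It also assumes roots contain only plain letters, so no regex escaping is done.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical generator: the grammar layout is an assumption, not the original script's output.
final class JapeGrammarWriter {
    static String grammar(List<String> roots) {
        StringBuilder sb = new StringBuilder();
        sb.append("Phase: Roots\n");
        sb.append("Input: Token\n");
        // Assumed option: let every matching rule fire on the same token.
        sb.append("Options: control = all\n\n");

        // Rule type 1: one rule for any token that starts with a known root.
        sb.append("Rule: LexicographyKnown\n");
        sb.append("({Token.string =~ \"^(")
          .append(String.join("|", roots))
          .append(")\"}):word\n");
        sb.append("-->\n");
        sb.append(":word.LexicographyKnown = {rule = \"LexicographyKnown\"}\n");

        // Rule type 2: one rule per root, each with its own (assumed) annotation type.
        for (String root : roots) {
            sb.append("\nRule: Root_").append(root).append("\n");
            sb.append("({Token.string =~ \"^").append(root).append("\"}):word\n");
            sb.append("-->\n");
            sb.append(":word.Root_").append(root)
              .append(" = {root = \"").append(root).append("\"}\n");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Placeholder roots, not entries from the real root list.
        System.out.print(grammar(Arrays.asList("abc", "xyz")));
    }
}
```

Giving each root its own annotation type is a design choice of this sketch only; whether the original grammar does the same is not stated in this thread.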

Ran the grammar in GATE; it found ~30% of the words per paragraph. Only 
LexicographyKnown (rule type 1) was colour coded; words were not colour coded 
by individual root.  

Please investigate why only LexicographyKnown is colour coded.

Original comment by hisako...@gmail.com on 9 Nov 2011 at 11:22

GoogleCodeExporter commented 9 years ago

Original comment by a...@ilanguage.ca on 10 Nov 2011 at 12:35

Changed state: Started

GoogleCodeExporter commented 9 years ago

Original comment by a...@ilanguage.ca on 10 Nov 2011 at 12:36

Added labels: Milestone-InuktitutCorpus, Priority-Medium

GoogleCodeExporter commented 9 years ago

Original comment by a...@ilanguage.ca on 25 Nov 2011 at 10:26

Added labels: Type-Implementation
