semanticvoid / dygest

Automatically exported from code.google.com/p/dygest

investigate WordNet and related APIs #1

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
Install the WordNet library and experiment with the local APIs. Need to look at the input and output from WordNet and what can be used for lexical chaining.

Original issue reported on code.google.com by andy.kis...@gmail.com on 10 Jun 2009 at 5:16

GoogleCodeExporter commented 9 years ago
Installed WordNet and its Python API via NLTK (http://www.nltk.org/).

Original comment by andy.kis...@gmail.com on 10 Jun 2009 at 6:52

GoogleCodeExporter commented 9 years ago
Documentation of the NLTK WordNet API:
http://nltk.googlecode.com/svn/trunk/doc/howto/wordnet.html

Original comment by andy.kis...@gmail.com on 10 Jun 2009 at 8:15

GoogleCodeExporter commented 9 years ago

Original comment by an...@semanticvoid.com on 10 Jun 2009 at 4:58

GoogleCodeExporter commented 9 years ago
Steps for building lexical chains (in terms of the API):

Input: set of candidate words

For each word, do:
[1] Get the synsets for the word,
e.g.: >>> wn.synsets('person')
[Synset('person.n.01'), Synset('person.n.02'), Synset('person.n.03')]

[2] Add the synsets to different graphs (interpretations).

[3] For each synset node added, create a link to the other synset nodes in the graph, and set the weight of each link to the similarity between the two nodes.

[4] Repeat for the next word.
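The steps above can be sketched in Python. This is a minimal sketch, not the project's implementation: `Interpretation`, `build_interpretations`, and the `max_interpretations` pruning limit are hypothetical names. The sense lookup and the similarity function are passed in so the sketch runs without WordNet data; with NLTK they could be `wn.synsets` and a synset similarity method such as `a.path_similarity(b)`. Expanding every graph with every sense grows exponentially, so the sketch keeps only the best-scoring graphs after each word; that pruning is an assumption and is not part of the steps listed.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Hashable, List, Optional, Sequence, Tuple

Sense = Hashable  # e.g. an NLTK Synset, or a plain string in the toy example


@dataclass
class Interpretation:
    """One graph (interpretation): a single chosen sense per word seen so far."""
    nodes: List[Sense] = field(default_factory=list)
    # Weighted links between node positions: (i, j) -> similarity.
    edges: Dict[Tuple[int, int], float] = field(default_factory=dict)

    def score(self) -> float:
        return sum(self.edges.values())


def build_interpretations(
    words: Sequence[str],
    get_senses: Callable[[str], Sequence[Sense]],
    similarity: Callable[[Sense, Sense], Optional[float]],
    max_interpretations: int = 50,  # hypothetical pruning limit (assumption)
) -> List[Interpretation]:
    interpretations = [Interpretation()]
    for word in words:
        senses = get_senses(word)  # step [1]: synsets for the word
        if not senses:
            continue  # word not in the sense inventory: skip it
        expanded: List[Interpretation] = []
        for interp in interpretations:
            for sense in senses:  # step [2]: one new graph per (graph, sense) pair
                new = Interpretation(list(interp.nodes), dict(interp.edges))
                new_pos = len(new.nodes)
                new.nodes.append(sense)
                for pos, other in enumerate(interp.nodes):  # step [3]: link to existing nodes
                    weight = similarity(sense, other)
                    if weight:  # skips 0.0 and None (e.g. no path between the senses)
                        new.edges[(pos, new_pos)] = weight
                expanded.append(new)
        # Assumption: keep only the best-scoring graphs so the search stays tractable.
        expanded.sort(key=Interpretation.score, reverse=True)
        interpretations = expanded[:max_interpretations]  # step [4]: move to the next word
    return interpretations


if __name__ == "__main__":
    # Toy, made-up sense inventory and similarity table.
    senses = {"bank": ["bank.river", "bank.money"], "money": ["money.cash"]}
    links = {frozenset({"bank.money", "money.cash"}): 0.5}
    best = build_interpretations(
        ["bank", "money"],
        get_senses=lambda w: senses.get(w, []),
        similarity=lambda a, b: links.get(frozenset({a, b}), 0.0),
    )[0]
    print(best.nodes)  # ['bank.money', 'money.cash']
```

In the toy run, the "money" reading of "bank" wins because it is the only sense that links to the following word.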

TODO:
Need to investigate the various similarity measures:
1. Path Similarity
2. Leacock-Chodorow Similarity
3. Wu-Palmer Similarity
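Path similarity is not described later in this thread. NLTK's WordNet documentation scores it from the shortest path connecting the two senses in the is-a (hypernym/hyponym) taxonomy, as 1 / (shortest path length + 1), so identical senses score 1. Below is a minimal sketch of that formula over a hypothetical toy taxonomy (`TOY_HYPERNYMS` and the helper names are made up for illustration). It assumes the taxonomy is a tree, where the undirected shortest path always passes through the lowest common subsumer; real WordNet has multiple inheritance, so this is a simplification.

```python
from collections import deque
from typing import Dict, List, Optional, Set

# Hypothetical toy taxonomy (a tree): synset -> its hypernyms.
TOY_HYPERNYMS: Dict[str, List[str]] = {
    "animal": ["entity"],
    "dog": ["animal"],
    "cat": ["animal"],
    "car": ["entity"],
}


def _undirected(taxonomy: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    """Treat every hypernym link as an undirected edge."""
    graph: Dict[str, Set[str]] = {}
    for child, parents in taxonomy.items():
        for parent in parents:
            graph.setdefault(child, set()).add(parent)
            graph.setdefault(parent, set()).add(child)
    return graph


def shortest_path_length(taxonomy: Dict[str, List[str]], a: str, b: str) -> Optional[int]:
    """Edge-counting BFS distance between a and b; None if they are not connected."""
    if a == b:
        return 0
    graph = _undirected(taxonomy)
    seen = {a}
    queue = deque([(a, 0)])
    while queue:
        node, dist = queue.popleft()
        for nxt in graph.get(node, set()):
            if nxt == b:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


def path_similarity(taxonomy: Dict[str, List[str]], a: str, b: str) -> Optional[float]:
    """1 / (shortest path length + 1): 1.0 for identical senses, smaller as paths grow."""
    dist = shortest_path_length(taxonomy, a, b)
    return None if dist is None else 1.0 / (dist + 1)


if __name__ == "__main__":
    print(path_similarity(TOY_HYPERNYMS, "dog", "cat"))  # 2 edges -> 1/3
    print(path_similarity(TOY_HYPERNYMS, "dog", "car"))  # 3 edges -> 0.25
```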

Original comment by andy.kis...@gmail.com on 11 Jun 2009 at 5:48

GoogleCodeExporter commented 9 years ago
Leacock-Chodorow:
The relatedness measure proposed by Leacock and Chodorow is -log(length / (2 * D)), where length is the length of the shortest path between the two synsets (using node-counting) and D is the maximum depth of the taxonomy.
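A minimal sketch of the Leacock-Chodorow formula as stated above. `lch_similarity` and `nodes_from_edges` are hypothetical helpers; they take the path length and D directly, so finding the path and the taxonomy depth is left to the caller. The natural logarithm is assumed. A node-counted length is the number of edges plus one, so identical synsets have length 1.

```python
import math


def lch_similarity(path_length: int, max_depth: int) -> float:
    """-log(length / (2 * D)) with a node-counted path length and taxonomy depth D."""
    if path_length < 1 or max_depth < 1:
        raise ValueError("node-counted path length and depth D must both be >= 1")
    return -math.log(path_length / (2 * max_depth))


def nodes_from_edges(edge_distance: int) -> int:
    """Convert an edge-counted distance into the node-counted length the formula expects."""
    return edge_distance + 1


if __name__ == "__main__":
    # Identical synsets in a taxonomy with D = 18: -log(1/36) = log(36), about 3.58.
    print(lch_similarity(nodes_from_edges(0), 18))
    # The same path length scores lower in a shallower taxonomy (smaller D).
    print(lch_similarity(5, 18), lch_similarity(5, 14))
```

Because D appears in the denominator, the same path can score differently depending on which taxonomy the synsets are in, which is why the root-node handling matters.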

The fact that the lch measure takes into account the depth of the taxonomy in which the synsets are found means that the behavior of the measure is profoundly affected by the presence or absence of a unique root node. If there is a unique root node, then there are only two taxonomies: one for nouns and one for verbs. All nouns, then, will be in the same taxonomy, and all verbs will be in the same taxonomy. D for the noun taxonomy will be somewhere around 18, depending upon the version of WordNet, and for verbs it will be 14. If the root node is not being used, however, then there are nine different noun taxonomies and over 560 different verb taxonomies, each with a different value for D.

If the root node is not being used, then it is possible for synsets to belong to more than one taxonomy. For example, the synset containing turtledove#n#2 belongs to two taxonomies: one rooted at group#n#1 and one rooted at entity#n#1. In such a case, the relatedness is computed by finding the LCS (lowest common subsumer) that results in the shortest path between the synsets. The value of D, then, is the maximum depth of the taxonomy in which the LCS is found. If the LCS belongs to more than one taxonomy, then the taxonomy with the greatest maximum depth is selected (i.e., the largest value for D).
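The rule for synsets that sit in several taxonomies can be sketched as a selection step before applying the formula. Everything here is hypothetical: `LcsCandidate`, `choose_lcs`, and `lch_multi_taxonomy` are made-up names, the candidate values are invented rather than taken from WordNet, and ties between equally short paths (which the description does not address) are resolved by taking the first candidate.

```python
import math
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class LcsCandidate:
    name: str
    path_length: int                  # node-counted path s1 -> LCS -> s2
    taxonomy_depths: Tuple[int, ...]  # max depth D of every taxonomy containing this LCS


def choose_lcs(candidates: Sequence[LcsCandidate]) -> Tuple[LcsCandidate, int]:
    """Pick the LCS giving the shortest path, then the deepest taxonomy it belongs to."""
    if not candidates:
        raise ValueError("the synsets share no subsumer")
    best = min(candidates, key=lambda c: c.path_length)  # first candidate wins on ties
    return best, max(best.taxonomy_depths)


def lch_multi_taxonomy(candidates: Sequence[LcsCandidate]) -> float:
    """Leacock-Chodorow score using the LCS and D chosen by the rule above."""
    best, depth = choose_lcs(candidates)
    return -math.log(best.path_length / (2 * depth))


if __name__ == "__main__":
    # Invented numbers, for illustration only.
    cands = [LcsCandidate("lcs_a", 4, (8,)), LcsCandidate("lcs_b", 6, (10, 14))]
    best, depth = choose_lcs(cands)
    print(best.name, depth)  # lcs_a 8
```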

Wu-Palmer Similarity:
The Wu & Palmer measure calculates relatedness by considering the depths of the two synsets in the WordNet taxonomies, along with the depth of the LCS. The formula is score = 2*depth(lcs) / (depth(s1) + depth(s2)). This means that 0 < score <= 1. The score can never be zero because the depth of the LCS is never zero (the depth of the root of a taxonomy is one). The score is one if the two input synsets are the same.
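A minimal sketch of the Wu & Palmer formula as described above, over a hypothetical single-inheritance toy taxonomy (`TOY_PARENT` and the helper names are made up). Depth counts the root as 1, matching the description, so the score stays in (0, 1]. Real WordNet synsets can have several hypernym paths, which this sketch does not handle.

```python
from typing import Dict, List, Optional

# Hypothetical toy taxonomy: synset -> its single parent (None for the root).
TOY_PARENT: Dict[str, Optional[str]] = {
    "entity": None,
    "animal": "entity",
    "dog": "animal",
    "cat": "animal",
    "car": "entity",
}


def ancestors(node: str) -> List[str]:
    """The node itself first, then its parents up to the root."""
    chain: List[str] = []
    current: Optional[str] = node
    while current is not None:
        chain.append(current)
        current = TOY_PARENT[current]
    return chain


def depth(node: str) -> int:
    """Root has depth 1."""
    return len(ancestors(node))


def lcs(a: str, b: str) -> Optional[str]:
    """Lowest common subsumer: walking upward from a, the first node shared with b."""
    b_ancestors = set(ancestors(b))
    for node in ancestors(a):
        if node in b_ancestors:
            return node
    return None


def wup_similarity(a: str, b: str) -> Optional[float]:
    """score = 2 * depth(lcs) / (depth(a) + depth(b))."""
    common = lcs(a, b)
    if common is None:
        return None
    return 2 * depth(common) / (depth(a) + depth(b))


if __name__ == "__main__":
    print(wup_similarity("dog", "cat"))  # LCS animal: 2*2 / (3+3)
    print(wup_similarity("dog", "car"))  # LCS entity: 2*1 / (3+2) = 0.4
```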

Original comment by andy.kis...@gmail.com on 13 Jun 2009 at 7:17

GoogleCodeExporter commented 9 years ago
Java API: http://wordnet.princeton.edu/links#Java

Original comment by an...@semanticvoid.com on 14 Jun 2009 at 6:16

GoogleCodeExporter commented 9 years ago
Java API doc: http://lyle.smu.edu/~tspell/jaws/index.html

Original comment by an...@semanticvoid.com on 14 Jun 2009 at 6:20

GoogleCodeExporter commented 9 years ago
Java WordNet similarity API: http://nlp.shef.ac.uk/result/software.html

Original comment by an...@semanticvoid.com on 18 Jun 2009 at 10:18

GoogleCodeExporter commented 9 years ago
JavSimLibrary - WordNet similarity measures for the JWI API -
http://laurent-mazuel.dnsalias.net/jsl/

Original comment by an...@semanticvoid.com on 20 Jun 2009 at 3:10