Open GoogleCodeExporter opened 9 years ago
Installed WordNet and the Python API (NLTK) [http://www.nltk.org/]
Original comment by andy.kis...@gmail.com
on 10 Jun 2009 at 6:52
Documentation of the NLTK WordNet API:
http://nltk.googlecode.com/svn/trunk/doc/howto/wordnet.html
Original comment by andy.kis...@gmail.com
on 10 Jun 2009 at 8:15
Original comment by an...@semanticvoid.com
on 10 Jun 2009 at 4:58
Steps for building a lexical chain (in terms of the API):
Input: set of candidate words
For each word, do:
[1] Get the synsets for the word
e.g.: >>> wn.synsets('person')
[Synset('person.n.01'), Synset('person.n.02'), Synset('person.n.03')]
[2] Add the synsets to different graphs (interpretations).
[3] For each synset node added, create a link to the other synset nodes in the graph. Set the weight of each link to the similarity between the two nodes.
[4] Repeat for the next word.
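A rough sketch of the steps above, in case it helps to see them as code. It is a simplification: it keeps a single graph instead of one graph per interpretation, and the names build_sense_graph, get_senses and similarity are made up for illustration. The sense lookup and the similarity measure are passed in as functions so the measure can be swapped later.

```python
def build_sense_graph(words, get_senses, similarity):
    """Build one weighted graph over the senses of the candidate words.

    get_senses(word)  -> list of sense objects (e.g. wn.synsets from NLTK)
    similarity(a, b)  -> float, or None when no score is available
    Returns (nodes, edges); edges maps (older_node, newer_node) -> weight.
    """
    nodes = []
    edges = {}
    for word in words:
        for sense in get_senses(word):          # step 1: senses of the word
            if sense in nodes:
                continue                        # sense already added by an earlier word
            for other in nodes:                 # step 3: link to existing nodes
                score = similarity(sense, other)
                if score is not None:
                    edges[(other, sense)] = score
            nodes.append(sense)                 # step 2: add the node
    return nodes, edges


# Possible use with NLTK (assumes the WordNet corpus is installed):
#   from nltk.corpus import wordnet as wn
#   nodes, edges = build_sense_graph(
#       ['person', 'teacher'],
#       wn.synsets,
#       lambda a, b: a.path_similarity(b),
#   )
```

The None check is there because a measure may not return a score for every pair of synsets.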
TODO:
Need to investigate the various similarity measures:
1. Path Similarity
2. Leacock-Chodorow Similarity
3. Wu-Palmer Similarity
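For comparing the three measures side by side, something like the following might be a starting point. The helper name compare_measures is made up; path_similarity, lch_similarity and wup_similarity are the method names on NLTK Synset objects. The NLTK calls are shown only as comments because they need the WordNet corpus installed.

```python
def compare_measures(s1, s2):
    """Return the three candidate scores for a pair of NLTK-style synsets."""
    return {
        "path": s1.path_similarity(s2),
        "lch": s1.lch_similarity(s2),
        "wup": s1.wup_similarity(s2),
    }


# Possible use with NLTK (assumes the WordNet corpus is installed):
#   from nltk.corpus import wordnet as wn
#   compare_measures(wn.synset('dog.n.01'), wn.synset('cat.n.01'))
```

As far as I can tell, lch_similarity expects both synsets to have the same part of speech, so mixed noun/verb pairs would need separate handling.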
Original comment by andy.kis...@gmail.com
on 11 Jun 2009 at 5:48
Leacock-Chodorow:
The relatedness measure proposed by Leacock and Chodorow is -log(length / (2 * D)), where length is the length of the shortest path between the two synsets (using node-counting) and D is the maximum depth of the taxonomy.
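To make the formula concrete, here is a small sketch that evaluates it directly. The function name lch_score and its parameter names are made up for illustration; the natural logarithm is an assumption, since the formula does not state the base.

```python
import math


def lch_score(path_length, max_depth):
    """-log(length / (2 * D)) with node-counted path length and taxonomy depth D."""
    if path_length < 1 or max_depth < 1:
        raise ValueError("path_length and max_depth must be at least 1")
    return -math.log(path_length / (2.0 * max_depth))


# Identical synsets have a node-counted path length of 1.
best = lch_score(1, 18)    # D = 18 is the approximate noun depth quoted in this thread
worse = lch_score(10, 18)  # a longer path gives a lower score
```

With D = 18 the best possible score is log(36), roughly 3.58, and the score drops as the path gets longer.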
The fact that the lch measure takes into account the depth of the taxonomy in which the synsets are found means that the behavior of the measure is profoundly affected by the presence or absence of a unique root node. If there is a unique root node, then there are only two taxonomies: one for nouns and one for verbs. All nouns, then, will be in the same taxonomy and all verbs will be in the same taxonomy. D for the noun taxonomy will be somewhere around 18, depending upon the version of WordNet, and for verbs, it will be 14. If the root node is not being used, however, then there are nine different noun taxonomies and over 560 different verb taxonomies, each with a different value for D.
If the root node is not being used, then it is possible for synsets to belong to more than one taxonomy. For example, the synset containing turtledove#n#2 belongs to two taxonomies: one rooted at group#n#1 and one rooted at entity#n#1. In such a case, the relatedness is computed by finding the LCS that results in the shortest path between the synsets. The value of D, then, is the maximum depth of the taxonomy in which the LCS is found. If the LCS belongs to more than one taxonomy, then the taxonomy with the greatest maximum depth is selected (i.e., the largest value for D).
Wu-Palmer Similarity:
The Wu & Palmer measure calculates relatedness by considering the depths of the two synsets in the WordNet taxonomies, along with the depth of the LCS. The formula is score = 2*depth(lcs) / (depth(s1) + depth(s2)). This means that 0 < score <= 1. The score can never be zero because the depth of the LCS is never zero (the depth of the root of a taxonomy is one). The score is one if the two input synsets are the same.
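The Wu-Palmer formula can be checked the same way. This is only a sketch of the arithmetic; wup_score and its parameter names are made up, and depths are counted with the root at depth one, as described above.

```python
def wup_score(depth_lcs, depth_s1, depth_s2):
    """2 * depth(lcs) / (depth(s1) + depth(s2)), with the root at depth 1."""
    if depth_lcs < 1 or depth_s1 < depth_lcs or depth_s2 < depth_lcs:
        raise ValueError("the LCS cannot be deeper than either synset")
    return 2.0 * depth_lcs / (depth_s1 + depth_s2)


same = wup_score(4, 4, 4)     # identical synsets: the LCS is the synset itself
distant = wup_score(1, 5, 5)  # two depth-5 synsets that only share the root
```

Here same is 1.0 and distant is 0.2, which matches the 0 < score <= 1 bound.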
Original comment by andy.kis...@gmail.com
on 13 Jun 2009 at 7:17
Java API: http://wordnet.princeton.edu/links#Java
Original comment by an...@semanticvoid.com
on 14 Jun 2009 at 6:16
Java API doc: http://lyle.smu.edu/~tspell/jaws/index.html
Original comment by an...@semanticvoid.com
on 14 Jun 2009 at 6:20
Java Wordnet similarity API: http://nlp.shef.ac.uk/result/software.html
Original comment by an...@semanticvoid.com
on 18 Jun 2009 at 10:18
JavSimLibrary - Wordnet Similarity Measures for JWI api -
http://laurent-mazuel.dnsalias.net/jsl/
Original comment by an...@semanticvoid.com
on 20 Jun 2009 at 3:10
Original issue reported on code.google.com by
andy.kis...@gmail.com
on 10 Jun 2009 at 5:16