Apply Francis Bond's patch below and update wn30.ttl.
Script used to detect synsets that link to the same target synset through more than one relation:
from nltk.corpus import wordnet as pwn

# relations with domains
# relations = ['also_sees', 'attributes', 'causes', 'entailments',
#              'hypernyms', 'hyponyms', 'instance_hypernyms', 'instance_hyponyms',
#              'member_holonyms', 'member_meronyms', 'part_holonyms',
#              'part_meronyms', 'region_domains', 'similar_tos',
#              'substance_holonyms', 'substance_meronyms', 'topic_domains',
#              'usage_domains']

# relations without domains
relations = ['also_sees', 'attributes', 'causes', 'entailments',
             'hypernyms', 'hyponyms', 'instance_hypernyms', 'instance_hyponyms',
             'member_holonyms', 'member_meronyms', 'part_holonyms',
             'part_meronyms', 'similar_tos', 'substance_holonyms',
             'substance_meronyms']

for s in pwn.all_synsets():
    ttt = []  # everything linked to s, as (target synset, relation) pairs
    for r in relations:
        tt = getattr(s, r)()  # e.g. s.hypernyms()
        ttt += [(t, r) for t in tt]
    # check for duplicates in just the target synsets
    justt = [t for (t, r) in ttt]
    if len(justt) > len(set(justt)):
        print("{}:\n{}\n\n".format(str(s),
                                   "\n".join(["{}\t{}".format(t.name(), r)
                                              for (t, r) in sorted(ttt)])))
Just to follow up on this: currently, if you exclude domains, there are 5 entries in PWN 3.0 where two relations link the same pair of synsets. All of them are arguably unnecessary, and one is a known bug. These are all fixed in PWN 3.1. We will add a test for this in the Open Multilingual Wordnet.
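A minimal sketch of what such a test could check, written against a plain dictionary instead of NLTK or the OMW code base. The `links` layout (source synset name mapped to a list of (target, relation) pairs) and the name `find_duplicate_targets` are hypothetical; the sample data is the `inattentive.a.01` links and the non-domain links of `knock_on.n.01` from this message.

```python
from collections import Counter


def find_duplicate_targets(links):
    """Map each source synset to its links whose target occurs under more than one relation.

    Hypothetical layout: `links` maps a synset name to a list of (target name, relation) pairs.
    """
    report = {}
    for source, pairs in links.items():
        counts = Counter(target for target, _ in pairs)
        dupes = sorted(pair for pair in pairs if counts[pair[0]] > 1)
        if dupes:
            report[source] = dupes
    return report


# Links copied from this message, with domain relations excluded.
links = {
    "inattentive.a.01": [("forgetful.s.03", "also_sees"), ("forgetful.s.03", "similar_tos")],
    "knock_on.n.01": [("play.n.03", "hypernyms"), ("rugby.n.01", "part_holonyms")],
}

report = find_duplicate_targets(links)
print(report)
```

A real test would load the patched data into this shape and assert that the report is empty.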
In three cases there is both an 'also_see' and a 'similar_to', and we should just keep the 'similar_to'.
Synset('inattentive.a.01'):
forgetful.s.03	also_sees
forgetful.s.03	similar_tos
Synset('chromatic.a.03'):
chestnut.s.01	also_sees
chestnut.s.01	similar_tos
Synset('fertile.a.01'):
conceptive.s.01	also_sees
conceptive.s.01	similar_tos
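The rule for these three cases can be sketched as a filter over (source, target, relation) triples: an `also_sees` link is dropped whenever a `similar_tos` link joins the same two synsets. The triple layout and the function name are assumptions for illustration; this is not the format of wn30.ttl.

```python
def drop_also_sees_shadowed(triples):
    """Remove also_sees links that duplicate a similar_tos link between the same synsets.

    Hypothetical layout: each triple is (source name, target name, relation).
    """
    similar = {(s, t) for s, t, r in triples if r == "similar_tos"}
    return [(s, t, r) for s, t, r in triples
            if not (r == "also_sees" and (s, t) in similar)]


# The three cases reported above.
triples = [
    ("inattentive.a.01", "forgetful.s.03", "also_sees"),
    ("inattentive.a.01", "forgetful.s.03", "similar_tos"),
    ("chromatic.a.03", "chestnut.s.01", "also_sees"),
    ("chromatic.a.03", "chestnut.s.01", "similar_tos"),
    ("fertile.a.01", "conceptive.s.01", "also_sees"),
    ("fertile.a.01", "conceptive.s.01", "similar_tos"),
]

for triple in drop_also_sees_shadowed(triples):
    print(triple)
```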
In one case we have both an 'entailment' and a 'hyponym' (the inverse of 'hypernym'), and we should just keep the 'hyponym'.
Synset('breathe.v.01'):
inhale.v.02	entailments
inhale.v.02	hyponyms
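For this case, a sketch under the same assumed triple layout: drop the `entailments` link when a `hyponyms` link joins the same synsets, then check that every kept hyponym/hypernym link still has its inverse. The inverse triple (`inhale.v.02` listing `breathe.v.01` among its `hypernyms`) is included on the assumption that PWN stores these pointers in pairs; the `INVERSE` table is illustrative and covers only the two relations used here.

```python
# Inverse pairs for the relations used in this example (illustrative, not exhaustive).
INVERSE = {"hyponyms": "hypernyms", "hypernyms": "hyponyms"}


def drop_entailments_shadowed(triples):
    """Remove entailments that duplicate a hyponyms link between the same synsets."""
    hypo = {(s, t) for s, t, r in triples if r == "hyponyms"}
    return [(s, t, r) for s, t, r in triples
            if not (r == "entailments" and (s, t) in hypo)]


def missing_inverses(triples):
    """List links whose inverse (target -> source) triple is absent."""
    present = set(triples)
    return [(s, t, r) for s, t, r in triples
            if r in INVERSE and (t, s, INVERSE[r]) not in present]


triples = [
    ("breathe.v.01", "inhale.v.02", "entailments"),
    ("breathe.v.01", "inhale.v.02", "hyponyms"),
    ("inhale.v.02", "breathe.v.01", "hypernyms"),  # assumed inverse pointer
]

fixed = drop_entailments_shadowed(triples)
print(fixed)
print(missing_inverses(fixed))  # an empty list means every kept link is paired
```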
And the bug: inhibit.v.04 is both a 'hypernym' and a 'hyponym' of 'restrain' (restrain.v.01), so restrain ends up as its own hypernym.
Synset('restrain.v.01'):
inhibit.v.04	hypernyms
inhibit.v.04	hyponyms
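The restrain/inhibit bug is a cycle in the hypernym graph. A minimal sketch of a check for it, using depth-first search over hypernym edges stored in a plain dictionary (a hypothetical layout; the edges are the ones implied by the entries in this message, with each `hyponyms` link read as the reverse `hypernyms` edge). The same search also finds cycles longer than this two-synset one.

```python
def find_hypernym_cycle(hypernyms):
    """Return one cycle as a list of synset names, or None if there is none.

    Hypothetical layout: `hypernyms` maps a synset name to the names of its hypernyms.
    """
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited / on the current path / finished
    color = {}
    path = []

    def visit(node):
        color[node] = GREY
        path.append(node)
        for parent in hypernyms.get(node, []):
            state = color.get(parent, WHITE)
            if state == GREY:  # back edge: parent is already on the current path
                return path[path.index(parent):] + [parent]
            if state == WHITE:
                cycle = visit(parent)
                if cycle:
                    return cycle
        path.pop()
        color[node] = BLACK
        return None

    for node in list(hypernyms):
        if color.get(node, WHITE) == WHITE:
            cycle = visit(node)
            if cycle:
                return cycle
    return None


hypernyms = {
    "restrain.v.01": ["inhibit.v.04"],
    "inhibit.v.04": ["restrain.v.01"],  # restrain.v.01 also lists inhibit.v.04 as a hyponym
    "inhale.v.02": ["breathe.v.01"],
    "bioterrorism.n.01": ["terrorism.n.01"],
}

print(find_hypernym_cycle(hypernyms))
```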
If you also include domain relations, there are quite a few more (61), for example:
Synset('knock_on.n.01'):
play.n.03	hypernyms
rugby.n.01	part_holonyms
rugby.n.01	topic_domains
Synset('ball_game.n.01'):
baseball.n.01	hyponyms
baseball.n.01	topic_domains
field_game.n.01	hypernyms
Synset('bioterrorism.n.01'):
terrorism.n.01	hypernyms
terrorism.n.01	topic_domains
I attach the full list of synsets with duplicates (including domains).
P.S. The script used to detect these is included above.