own-pt / openWordnet-PT

OpenWordnet-PT: an open access wordnet for Portuguese
http://openwordnet-pt.org

Duplicated links #184

Closed arademaker closed 3 years ago

arademaker commented 3 years ago

Does this affect us? If so, how?

https://github.com/globalwordnet/english-wordnet/issues/436

fredsonaguiar commented 3 years ago

I checked our files and didn't find any duplicated triples. I also checked the old files morphosemantic-links-en.nt.gz, morphosemantic-links-pt.nt.gz, wordnet-en.nt.gz and own-pt.nt.gz at commit df754c2e4ee72127553147f16d0d2fedd6b0a9fb, taking advantage of the N-Triples format (one triple per line) to look for repeated lines:

cat unzipped/* | sort | uniq -c | egrep "      [2-9]" > duplicated.nt
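One caveat about the pattern above: GNU `uniq -c` right-aligns the count in a 7-character field, so six spaces followed by `[2-9]` matches only counts from 2 to 9, and a triple repeated 10 or more times would be missed. A minimal sketch of a variant that compares the count numerically instead, assuming a POSIX shell with `sort`, `uniq` and `awk` available; the function name `duplicated_lines` is hypothetical and not part of the repository:

```shell
# Hypothetical helper: read lines on standard input and print every line
# that occurs more than once, prefixed by its occurrence count.
duplicated_lines() {
    # LC_ALL=C forces byte-wise comparison, so IRIs with accented
    # characters are grouped the same way under any locale.
    # awk keeps only the rows whose first field (the count) exceeds 1.
    LC_ALL=C sort | LC_ALL=C uniq -c | awk '$1 > 1'
}

# Example usage (assumes the dumps were unzipped into ./unzipped):
#   cat unzipped/* | duplicated_lines > duplicated.txt
```

An empty output means no line is repeated. If the counts are not needed, `sort | uniq -d` prints each repeated line once, which keeps the output valid N-Triples.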

In those old files I found only some duplicated Word type definitions, which are no longer present in the new files at commit f285c4a5b48b3150188ae1dcb25d0000eabcd06b. For instance:

[...]
<https://w3id.org/own-pt/wn30-pt/instances/word-Reino_das_Ilhas_de_Antígua_e_Barbuda> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <https://w3id.org/own-pt/wn30/schema/Word> .
<https://w3id.org/own-pt/wn30-pt/instances/word-relação_social> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <https://w3id.org/own-pt/wn30/schema/Word> .
<https://w3id.org/own-pt/wn30-pt/instances/word-separar_se> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <https://w3id.org/own-pt/wn30/schema/Word> .
<https://w3id.org/own-pt/wn30-pt/instances/word-unir_se> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <https://w3id.org/own-pt/wn30/schema/Word> .

I suppose someone took care of it when parsing the WNDB files.