Closed arademaker closed 3 years ago
I checked our files but didn't find any duplicated triples, even in the old files morphosemantic-links-en.nt.gz, morphosemantic-links-pt.nt.gz, wordnet-en.nt.gz and own-pt.nt.gz at df754c2e4ee72127553147f16d0d2fedd6b0a9fb. I took advantage of the N-Triples format (one triple per line) to check:
cat unzipped/* | sort | uniq -c | egrep " [2-9]" > duplicated.nt
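One caveat about the pattern above: `" [2-9]"` matches a space followed by a digit anywhere in the line, so it can also hit a triple whose content contains such a sequence, and it misses counts that start with 1, such as 10 or 11. A minimal sketch of a stricter variant follows. The function name `find_duplicated_triples` is hypothetical, and it assumes the files are already decompressed into a single directory. It only detects byte-identical lines, so the same triple serialized with different whitespace would not be reported.

```shell
#!/bin/sh
# Hypothetical helper (not part of the repository): print every line that
# occurs more than once across all files in a directory, prefixed by its count.
# Assumes decompressed N-Triples files, one triple per line.
find_duplicated_triples() {
    dir="$1"
    # LC_ALL=C forces byte-wise ordering, so identical lines end up adjacent
    # regardless of the locale (relevant for the accented Portuguese IRIs).
    # awk compares the count column numerically, so counts of 10+ are kept
    # and digits inside the triple itself are ignored.
    cat "$dir"/* | LC_ALL=C sort | LC_ALL=C uniq -c | awk '$1 > 1'
}
```

Usage, with the same directory name as in the command above: `find_duplicated_triples unzipped > duplicated.nt`.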
I only found some duplicated Word type definitions, which are not present in the new files at commit f285c4a5b48b3150188ae1dcb25d0000eabcd06b. For instance:
[...]
<https://w3id.org/own-pt/wn30-pt/instances/word-Reino_das_Ilhas_de_Antígua_e_Barbuda> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <https://w3id.org/own-pt/wn30/schema/Word> .
<https://w3id.org/own-pt/wn30-pt/instances/word-relação_social> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <https://w3id.org/own-pt/wn30/schema/Word> .
<https://w3id.org/own-pt/wn30-pt/instances/word-separar_se> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <https://w3id.org/own-pt/wn30/schema/Word> .
<https://w3id.org/own-pt/wn30-pt/instances/word-unir_se> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <https://w3id.org/own-pt/wn30/schema/Word> .
I suppose someone took care of it during the parsing of the WNDB files.
Does this affect us? How?
https://github.com/globalwordnet/english-wordnet/issues/436