own-pt / glosstag

Semantically Tagged PWN glosses

Error in original data causes information loss #17

Closed: hmuniz closed this issue 4 years ago

hmuniz commented 4 years ago
<synset id="r00003483" ofs="00003483" pos="r">
  <terms>
   <term>basically</term>
   <term>fundamentally</term>
   <term>essentially</term>
  </terms>
  <keys>
   <sk>basically%4:02:00::</sk>
   <sk>fundamentally%4:02:00::</sk>
   <sk>essentially%4:02:01::</sk>
  </keys>
  <gloss desc="orig">
   <orig>in essence; at bottom or by one's (or its) very nature; "He is basically dishonest"; "the argument was essentially a technical one"; "for all his bluster he is in essence a shy person"</orig>
  </gloss>
  <gloss desc="text">
   <text>in essence ; at bottom or by one's ( or its ) very nature ; “ He is basically dishonest ” ; “ the argument was essentially a technical one ” ; “ for all his bluster he is in essence a shy person ”</text>
  </gloss>
  <gloss desc="wsd">
   <def id="r00003483_d">
    <wf id="r00003483_wf1" lemma="in" pos="IN" tag="ignore">in</wf>
    <wf id="r00003483_wf2" lemma="essence%1" pos="NN" sep="" tag="un">essence</wf>
    <wf id="r00003483_wf3" pos=":" tag="ignore" type="punc">;</wf>
    <cf coll="a" id="r00003483_wf4" lemma="at" pos="IN" tag="ignore">
     <glob coll="a" glob="man" id="r00003483_coll.a" lemma="at_bottom%4" tag="man">
      <id coll="b" id="r00003483_id.6" lemma="at bottom" sk="at_bottom%4:02:00::"/>
    </glob>at</cf>
    <cf coll="a" id="r00003483_wf5" lemma="bottom%1|bottom%2|bottom%3" pos="NN" tag="un">bottom</cf>
    <wf id="r00003483_wf6" lemma="or" pos="CC" tag="ignore">or</wf>
    <wf id="r00003483_wf7" lemma="by" pos="IN" tag="ignore">by</wf>
....
(:ofs "00003483" :pos "r" :keys (("essentially%4:02:01::" . "essentially")
                 ("fundamentally%4:02:00::" . "fundamentally")
                 ("basically%4:02:00::" . "basically"))
      :gloss "in essence; at bottom or by one's (or its) very nature; 
         \"He is basically dishonest\"; \"the argument was essentially a technical one\";  
        \"for all his bluster he is in essence a shy person\""
      :tokens ((:kind :def :action :open)
           (:kind :wf :form "in" :lemma "in" :pos "IN" :tag "ignore")
           (:kind :wf :form "essence" :lemma "essence%1" :pos "NN" :tag "un" :sep "")
           (:kind :wf :form ";" :pos ":" :tag "ignore" :type "punc")
           (:kind (:glob . "a") :lemma "at_bottom%4" :tag "man" :glob "man")
           (:kind (:cf "a") :form "at" :lemma "at" :pos "IN" :tag "ignore")
           (:kind (:cf "a") :form "bottom" :lemma "bottom%1|bottom%2|bottom%3" 
                   :pos "NN" :tag "un")
           (:kind :wf :form "or" :lemma "or" :pos "CC" :tag "ignore")
           (:kind :wf :form "by" :lemma "by" :pos "IN" :tag "ignore")
...

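The glob in the plist version is not annotated: the sense key at_bottom%4:02:00:: carried by its <id> element does not appear in the :glob token. Is that expected, or is it a bug in the conversion?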
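To make the missing information concrete, here is a minimal Python sketch (hypothetical, not the repository's converter) that reads the glob from the XML excerpt above and builds a plist-like token that keeps the sense keys of its `<id>` children. The helper name `glob_token` and the extra `sk` field are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET

# The <cf>/<glob> excerpt from the XML above (surrounding <def> omitted).
FRAGMENT = """
<cf coll="a" id="r00003483_wf4" lemma="at" pos="IN" tag="ignore">
  <glob coll="a" glob="man" id="r00003483_coll.a" lemma="at_bottom%4" tag="man">
    <id coll="b" id="r00003483_id.6" lemma="at bottom" sk="at_bottom%4:02:00::"/>
  </glob>at</cf>
"""

def glob_token(glob: ET.Element) -> dict:
    """Build a plist-like token for a <glob>, keeping its <id> sense keys.

    Hypothetical helper: field names mirror the plist above; "sk" is an
    assumed extra field, not necessarily what the real converter emits.
    """
    token = {
        "kind": ("glob", glob.get("coll")),
        "lemma": glob.get("lemma"),
        "tag": glob.get("tag"),
        "glob": glob.get("glob"),
    }
    # Collect the sense key of every direct <id> child so none is dropped.
    token["sk"] = [ident.get("sk") for ident in glob.findall("id")]
    return token

root = ET.fromstring(FRAGMENT.strip())
print(glob_token(root.find("glob")))
```

Collecting every `<id>` child directly, rather than matching them through some other attribute, means a sense key present in the XML cannot silently vanish from the token.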

arademaker commented 4 years ago

Could this be related to the problem with spaces vs. underscores (at_bottom vs. at bottom)?
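One quick way to test the spaces-vs-underscores hypothesis is to compare the two lemmas after normalization. A minimal sketch, assuming the glob lemma always has the form `lemma%N` (here `%4`, the adverb synset type) and that underscores stand for spaces, as they do in WordNet sense keys; `same_lemma` is a hypothetical helper:

```python
def same_lemma(glob_lemma: str, id_lemma: str) -> bool:
    """Compare a glob lemma such as 'at_bottom%4' with an <id> lemma such as 'at bottom'."""
    base = glob_lemma.split("%", 1)[0]          # drop the '%4' synset-type suffix
    return base == id_lemma.replace(" ", "_")   # sense keys write spaces as '_'

print(same_lemma("at_bottom%4", "at bottom"))  # True
```

If this returns True for the affected entries, the lemma spelling alone does not distinguish them, which points the search elsewhere.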

arademaker commented 4 years ago

@hmuniz isolated the issue. It seems to be related to a mismatch, in the original XML files, between the coll attribute of the <id> element and the coll attribute of its enclosing <glob> (here id/@coll="b" but glob/@coll="a").
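A check along these lines could flag every affected glob in the original XML before conversion. This is a sketch assuming each `<id>` child is expected to share its parent `<glob>`'s `coll` value; `coll_mismatches` is a hypothetical helper, not part of the repository, and the sample is the excerpt from the first comment:

```python
import xml.etree.ElementTree as ET

def coll_mismatches(root: ET.Element) -> list[tuple[str, str, str, str]]:
    """Return (glob id, glob coll, id id, id coll) for each <id> whose coll differs from its <glob>."""
    found = []
    for glob in root.iter("glob"):            # every <glob> at any depth
        for ident in glob.findall("id"):      # its direct <id> children
            if ident.get("coll") != glob.get("coll"):
                found.append((glob.get("id"), glob.get("coll"),
                              ident.get("id"), ident.get("coll")))
    return found

SAMPLE = """<cf coll="a" id="r00003483_wf4" lemma="at" pos="IN" tag="ignore">
  <glob coll="a" glob="man" id="r00003483_coll.a" lemma="at_bottom%4" tag="man">
    <id coll="b" id="r00003483_id.6" lemma="at bottom" sk="at_bottom%4:02:00::"/>
  </glob>at</cf>"""

print(coll_mismatches(ET.fromstring(SAMPLE)))
# [('r00003483_coll.a', 'a', 'r00003483_id.6', 'b')]
```

Running it over a whole gloss file (e.g. the root returned by `ET.parse(path).getroot()`) would list every glob at risk of losing its sense key.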

arademaker commented 4 years ago

Solved in 5525ae3.