ixa-ehu / ixa-pipe-pos

IXA pipes Part of Speech tagger and Lemmatizer (http://ixa2.si.ehu.es/ixa-pipes)
Apache License 2.0

Would it be possible to include "case" information in the ixa-pipe-pos output? #3

Open Maddalen opened 8 years ago

Maddalen commented 8 years ago

For the NAF format, it could be added to the already used 'morphofeat' property, or a separate 'case' property could be included.

For example, the output for the Basque word "neurrien" would be the following:

<NAF xml:lang="eu" version="v1.naf">
  <nafHeader>
    <linguisticProcessors layer="text">
      <lp name="ixa-pipe-tok-eu" beginTimestamp="2016-02-29T12:41:48+0100" endTimestamp="2016-02-29T12:41:48+0100" version="1.8.4-a8477645be9385838f746c3650f30f3cc24e3cb3" hostname="maddalen-OptiPlex-780" />
    </linguisticProcessors>
    <linguisticProcessors layer="terms">
      <lp name="ixa-pipe-pos-eu-pos-perceptron-ud" beginTimestamp="2016-02-29T12:41:50+0100" endTimestamp="2016-02-29T12:41:50+0100" version="1.5.0-98d47e55e2212248b89088eece43f149e32be30e" hostname="maddalen-OptiPlex-780" />
    </linguisticProcessors>
  </nafHeader>
  <text>
    <wf id="w1" offset="0" length="8" sent="1" para="1">neurrien</wf>
  </text>
  <terms>
    <!--neurrien-->
    <term id="t1" type="open" lemma="neurri" pos="N" morphofeat="NC0GP000" case="IZE ARR BIZ- GEN NUMP MUGM ZERO @&lt;IZLG @IZLG&gt;">
      <span>
        <target id="w1" />
      </span>
    </term>
  </terms>
</NAF>
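To illustrate how a downstream consumer might read the proposed attribute, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. The `case` attribute is only the proposal from this issue, not something ixa-pipe-pos is known to emit, and the helper name `read_terms` is hypothetical. The snippet is a trimmed copy of the example above.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the example document; `case` is the attribute proposed
# in this issue, not an attribute ixa-pipe-pos is known to produce.
NAF_SNIPPET = """<NAF xml:lang="eu" version="v1.naf">
  <terms>
    <term id="t1" type="open" lemma="neurri" pos="N" morphofeat="NC0GP000" case="IZE ARR BIZ- GEN NUMP MUGM ZERO @&lt;IZLG @IZLG&gt;">
      <span>
        <target id="w1" />
      </span>
    </term>
  </terms>
</NAF>"""


def read_terms(naf_xml):
    """Return one dict per <term>, with `case` set to None when absent."""
    root = ET.fromstring(naf_xml)
    rows = []
    for term in root.iter("term"):
        rows.append({
            "id": term.get("id"),
            "lemma": term.get("lemma"),
            "pos": term.get("pos"),
            "morphofeat": term.get("morphofeat"),
            "case": term.get("case"),  # proposed attribute, may be missing
        })
    return rows


terms = read_terms(NAF_SNIPPET)
for row in terms:
    print(row["id"], row["lemma"], row["pos"], row["case"])
```

Because `term.get("case")` returns `None` for documents without the attribute, a reader written this way would keep working on existing NAF output.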

Thanks in advance,

Maddalen

ragerri commented 8 years ago

Hi!

Do you need all that stuff (IZE ARR BIZ- GEN NUMP MUGM ZERO @<IZLG @IZLG>), or is IZE ARR GEN enough (noun common genitive, noun common locative, etc.)? Furthermore, what is the meaning of BIZ- in that tag? Also, where is the NC0GP000 coming from?

Maddalen commented 8 years ago

Hi,

First of all, thanks for answering so fast.

Actually, no, I don't need all that stuff. The following information would be enough for me:

- category or part of speech. In the example: IZE ARR
- number (mugatasuna): s, p or mg
- declension (deklinabide kasua). In the example: GEN

The example I used is what ixa-pipe-eustagger outputs for the word "neurrien". I think the BIZ- tag means "bizigabea" (inanimate). I don't know where the NC0GP000 is coming from.
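As a rough sketch of the reduction being asked for, the full tag string could be cut down to the three requested fields (category, number, case) along these lines. The tag inventories are assumptions: only GEN and NUMP appear in this thread, while NUMS, MG and the other case labels are placeholders for illustration, and `reduce_tags` is a hypothetical helper, not part of ixa-pipe-pos or eustagger.

```python
# Assumed tag inventories. Only GEN and NUMP occur in this thread; the
# remaining entries are illustrative placeholders, not a documented tagset.
CASE_TAGS = {"ABS", "ERG", "DAT", "GEN", "INE"}
NUMBER_TAGS = {"NUMS": "s", "NUMP": "p", "MG": "mg"}


def reduce_tags(tag_string):
    """Keep only category, number and case from a space-separated tag string."""
    tokens = tag_string.split()
    # Assumption: the first two tokens are category and subcategory,
    # as in the example ("IZE ARR").
    category = " ".join(tokens[:2])
    number = next((NUMBER_TAGS[t] for t in tokens if t in NUMBER_TAGS), None)
    case = next((t for t in tokens if t in CASE_TAGS), None)
    return {"category": category, "number": number, "case": case}


reduced = reduce_tags("IZE ARR BIZ- GEN NUMP MUGM ZERO @<IZLG @IZLG>")
print(reduced)
```

Tokens that are not in either inventory (BIZ-, MUGM, ZERO and the syntactic function labels) are simply ignored, so the reduced form stays stable even if the full tag string carries extra information.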