own-pt / sensetion.el

Emacs word-sense annotation interface
GNU General Public License v3.0

master: how to annotate `no sense` #154

Open arademaker opened 5 years ago

arademaker commented 5 years ago

While annotating rival, I found that its adjective sense is missing.

But the system does not let me mark it as no sense but surely an adjective. The menu with the 0 option is not displayed when the target PoS has no senses.

odanoburu commented 5 years ago

> no sense but surely an adjective.

how would this be represented in the data?

although it definitely is a problem that you can't pick no sense like this.

arademaker commented 5 years ago

It seems to me that we have two options:

  1. Make `no sense` a command outside the menu of senses, since you are right that we currently have no way to encode a confirmed PoS tag without choosing a sense. In the data, the field pos contains only the automatically assigned PoS tag; the selected senses are the confirmation (or not) of this tag.

  2. Change the data format to something more general, such as having the user confirm the PoS tag directly or indirectly (through the selection of senses). In that case, we have the additional problem of which tagset to use. We can be conservative and add another field that allows values such as (a, n, v, r), or change the tagset to http://moin.delph-in.net/ErgLeTypes, https://talp-upc.gitbook.io/freeling-4-0-user-manual/tagsets/tagset-en or https://www.ling.upenn.edu/courses/Fall_2003/ling001/penn_treebank_pos.html.
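
To make option 2 concrete, here is a minimal sketch of a token record with an extra field for the confirmed PoS. It is not the actual sensetion data format: the names `Token`, `auto_pos`, `confirmed_pos` and `effective_pos`, and the representation of senses as (pos, sense-id) pairs, are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Conservative tagset mentioned in option 2.
WORDNET_POS = {"a", "n", "v", "r"}


@dataclass
class Token:
    form: str
    # Automatically assigned tag (what the pos field holds today).
    auto_pos: str
    # Hypothetical encoding of selected senses as (pos, sense-id) pairs.
    senses: List[Tuple[str, str]] = field(default_factory=list)
    # Hypothetical new field: PoS confirmed directly by the annotator.
    confirmed_pos: Optional[str] = None

    def __post_init__(self) -> None:
        if self.confirmed_pos is not None and self.confirmed_pos not in WORDNET_POS:
            raise ValueError(f"unknown PoS tag: {self.confirmed_pos!r}")


def effective_pos(token: Token) -> Optional[str]:
    """Return the PoS the annotator confirmed, directly or via senses."""
    if token.confirmed_pos is not None:
        return token.confirmed_pos
    # Indirect confirmation: all selected senses agree on one PoS.
    sense_pos = {pos for pos, _ in token.senses}
    if len(sense_pos) == 1:
        return next(iter(sense_pos))
    return None
```

With such a field, the case from this issue becomes `Token("rival", "JJ", confirmed_pos="a")`: an adjective with no sense selected, for which `effective_pos` returns `"a"`. The `"JJ"` value here is only a placeholder for whatever tag was assigned automatically.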

In the glosstag, a large share of the tokens (the single largest group) currently have no value for the pos field:

$ awk '$0 ~ /^[0-9]/ {print $6}' ar.data | sort | uniq -c | sort -nr
868496 _
264353 NN
174798 IN
148860 DT
143368 JJ
136720 :
69350 CC
62924 NNS
47344 NNP
42179 VBN
38710 VBG
35383 VB
34321 RB
21968 TO
20823 VBZ
16004 CD
15549 WDT
14024 )
14024 (
7859 PRP
6215 WP
5270 ,
4468 PRP$
4391 VBP
2482 VBD
2211 MD
2118 WRB
1966 JJR
1089 RP
1071 RBR
1069 JJS
 863 WP$
 363 RBS
 175 PDT
  60 SYM
   5 FW
   4 .
   2 UH
   2 ...
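
For anyone who wants to reproduce this count without awk, a rough Python equivalent could look like the sketch below. It assumes, as the awk command does, that token lines start with a digit and that the sixth whitespace-separated field is pos. The function names `pos_counts` and `missing_share` are made up for this example.

```python
import re
from collections import Counter
from typing import Iterable


def pos_counts(lines: Iterable[str]) -> Counter:
    """Count the values of the sixth field on lines starting with a digit."""
    counts: Counter = Counter()
    for line in lines:
        if not re.match(r"[0-9]", line):
            continue  # same filter as $0 ~ /^[0-9]/
        fields = line.split()
        # Unlike awk, lines with fewer than six fields are skipped
        # instead of being counted as an empty value.
        if len(fields) >= 6:
            counts[fields[5]] += 1
    return counts


def missing_share(counts: Counter, missing: str = "_") -> float:
    """Fraction of counted tokens whose pos value equals `missing`."""
    total = sum(counts.values())
    return counts[missing] / total if total else 0.0
```

Usage would be something like `with open("ar.data") as f: counts = pos_counts(f)`, followed by `counts.most_common()` for the sorted listing and `missing_share(counts)` for the fraction of tokens with `_`.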

odanoburu commented 5 years ago

Option 1 might be solved by #156.