nert-nlp / pastrie

PASTRIE: A Corpus of Prepositions Annotated with Supersense Tags in Reddit International English
Creative Commons Attribution Share Alike 4.0 International

Prepositional supersense annotations on non-preposition targets #5

Closed lgessler closed 2 years ago

lgessler commented 3 years ago

Is it OK for a verb-headed SMWE to have a prepositional supersense? The validator complains about it. Offending SMWE:

21  give    give    VERB    VB  _   10  conj    _   _   2:1 _   give up on  p.Theme p.Theme _   _   _   _
22  up  up  ADP RP  _   21  compound:prt    _   _   2:2 _   _   _   _   _   _   _   _
23  on  on  ADP IN  _   24  case    _   _   2:3 _   _   _   _   _   _   _   _

nschneid commented 3 years ago

No, and per the policy on prepositional verbs I think "on" should be weakly attached to "give up" (so, give_up~on).
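To make the notation concrete, here is a minimal sketch that reads an expression such as give_up~on, assuming `_` joins strongly bound tokens and `~` marks a weak link, as the comment above uses them. `parse_mwe` is a hypothetical helper written for illustration and is not part of the PASTRIE tooling.

```python
from __future__ import annotations


def parse_mwe(expr: str) -> list[list[str]]:
    """Split an MWE expression into strong groups separated by weak links.

    Assumption: '~' separates weakly attached parts and '_' joins the
    tokens inside one strong group (hypothetical helper, illustration only).
    """
    return [group.split("_") for group in expr.split("~")]


# "give up" is one strong unit; "on" is only weakly attached to it.
print(parse_mwe("give_up~on"))  # [['give', 'up'], ['on']]
```

Under this reading, the supersense would belong on "on" as its own adpositional unit rather than on the verb-headed strong expression.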

lgessler commented 3 years ago

A search with the following criteria:

token['ss'].startswith('p.')
and not (token['upos'] in ['ADP', 'ADV', 'SCONJ'] or token['xpos'] in ['IN', 'PRP$', 'POS', 'RB'])
and token['lemma'] not in ['to']
and token['smwe'] != '_'

yields these:

L2402 (spanish-022ee17b-a59d-43d6-65c9-90b2acb26b87-11, #21): prepositional SS on non-ADP token
L2756 (spanish-022ee17b-a59d-43d6-65c9-90b2acb26b87-19, #35): prepositional SS on non-ADP token
L6099 (spanish-1d76d7a0-219d-c07b-1abf-286b47e0643b-01, #17): prepositional SS on non-ADP token
L6141 (spanish-1d76d7a0-219d-c07b-1abf-286b47e0643b-02, #17): prepositional SS on non-ADP token
L7181 (english-9501e917-496d-c27b-7af2-f402a0e624f1-05, #24): prepositional SS on non-ADP token
L7808 (german-fe0717e9-61b4-b18e-7fde-174c299e3274-01, #5): prepositional SS on non-ADP token
L9016 (german-b7b4ea19-f3c6-0ca9-f523-ae25f9908c04-03, #5): prepositional SS on non-ADP token
L10011 (french-6e9d7077-dca1-284f-82f0-1b582c61cbec-01, #5): prepositional SS on non-ADP token
L11727 (spanish-c186d70f-c53c-a2cb-1b83-fd31009b829f-01, #54): prepositional SS on non-ADP token
L12701 (german-6ca8e3b9-d1f6-8dbe-3773-189e87c3356d-03, #1): prepositional SS on non-ADP token
L15867 (german-c2b1c26b-e814-7dc9-af47-9f2936631b3e-06, #25): prepositional SS on non-ADP token
L16909 (english-a5e3a719-6bf5-2dfd-54a8-762b2b12c92c-08, #24): prepositional SS on non-ADP token
L17119 (german-2b68abef-41db-d351-83a0-614fb7222606-03, #21): prepositional SS on non-ADP token
L18605 (english-908e0372-bcbe-9a42-f89d-3fda1e454241-02, #8): prepositional SS on non-ADP token
L23148 (german-a5b56ce5-e05b-d41a-155e-d4cfb0ff376c-04, #34): prepositional SS on non-ADP token
L25292 (german-822a6610-17cb-7e18-eba5-a923d7386d6c-03, #3): prepositional SS on non-ADP token
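The search criteria above can be wrapped in a standalone predicate, sketched here. The dictionary keys (`ss`, `upos`, `xpos`, `lemma`, `smwe`) come from the criteria as posted; how the tokens are loaded from the corpus is not shown in the thread, so the sample list below is hand-built from the "give up on" rows quoted earlier, and `is_suspect` is a hypothetical name.

```python
ADPOSITIONAL_UPOS = {"ADP", "ADV", "SCONJ"}
ADPOSITIONAL_XPOS = {"IN", "PRP$", "POS", "RB"}


def is_suspect(token: dict) -> bool:
    """True for a p.* supersense on a non-adpositional token that heads an SMWE."""
    return (
        token["ss"].startswith("p.")
        and not (
            token["upos"] in ADPOSITIONAL_UPOS
            or token["xpos"] in ADPOSITIONAL_XPOS
        )
        and token["lemma"] not in ["to"]
        and token["smwe"] != "_"
    )


# Hand-copied from the "give up on" rows; only the fields the filter reads.
tokens = [
    {"lemma": "give", "upos": "VERB", "xpos": "VB", "smwe": "2:1", "ss": "p.Theme"},
    {"lemma": "up", "upos": "ADP", "xpos": "RP", "smwe": "2:2", "ss": "_"},
    {"lemma": "on", "upos": "ADP", "xpos": "IN", "smwe": "2:3", "ss": "_"},
]

flagged = [t["lemma"] for t in tokens if is_suspect(t)]
print(flagged)  # ['give']
```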

I'm not sure how many of them actually violate guidelines, but they seem similar to the first example I pointed out. Some examples:

1   I   I   PRON    PRP _   3   nsubj   _   _   _   _   _   _   _   _   _   _   _
2   just    just    ADV RB  _   3   advmod  _   _   _   _   _   _   _   _   _   _   _
3   unlocked    unlock  VERB    VBD _   0   root    _   _   _   _   _   _   _   _   _   _   _
4   this    this    DET DT  _   5   det _   _   _   _   _   _   _   _   _   _   _
5   thanks  thanks  NOUN    NN  _   3   obj _   _   1:1 _   thanks to   Explanation Explanation _   _   _   _
6   to  to  ADP IN  _   10  case    _   _   1:2 _   _   _   _   _   _   _   _
7   the the DET DT  _   10  det _   _   _   _   _   _   _   _   _   _   _
8   Greed   greed   NOUN    NN  _   9   compound    _   _   _   _   _   _   _   _   _   _   _
9   donation    donation    NOUN    NN  _   10  compound    _   _   _   _   _   _   _   _   _   _   _
10  machine machine NOUN    NN  _   5   nmod    _   _   _   _   _   _   _   _   _   _   _
11  .   .   PUNCT   .   _   3   punct   _   _   _   _   _   _   _   _   _   _   _

5   took    take    VERB    VBD _   2   conj    _   _   1:1 _   take part in    Circumstance    Locus   _   _   _   _
6   part    part    NOUN    NN  _   5   obj _   _   1:2 _   _   _   _   _   _   _   _
7   in  in  ADP IN  _   10  case    _   _   1:3 _   _   _   _   _   _   _   _
8   a   a   DET DT  _   10  det _   _   _   _   _   _   _   _   _   _   _
9   civil   civil   ADJ JJ  _   10  amod    _   _   _   _   _   _   _   _   _   _   _
10  war war NOUN    NN  _   5   obl _   _   _   _   _   _   _   _   _   _   _

5   I   I   PRON    PRP _   7   nsubj   _   _   _   _   _   _   _   _   _   _   _
6   ve  have    AUX VBP _   7   aux _   _   _   _   _   _   _   _   _   _   _
7   found   find    VERB    VBN _   0   root    _   _   _   _   _   _   _   _   _   _   _
8   nothing nothing PRON    NN  _   7   obj _   _   1:1 _   nothing but PartPortion PartPortion _   _   _   _
9   but but CCONJ   CC  _   13  cc  _   _   1:2 _   _   _   _   _   _   _   _
10  the the DET DT  _   13  det _   _   _   _   _   _   _   _   _   _   _

5   a   a   DET DT  _   6   det _   _   1:1 _   a lot of    QuantityItem    QuantityItem    _   _   _   _
6   lot lot NOUN    NN  _   4   obj _   _   1:2 _   _   _   _   _   _   _   _
7   of  of  ADP IN  _   8   case    _   _   1:3 _   _   _   _   _   _   _   _

21  wasting waste   VERB    VBG _   3   conj    _   _   _   _   _   _   _   _   _   _   _
22  your    you PRON    PRP$    _   23  nmod:poss   _   _   _   _   _   Gestalt Gestalt _   _   _   _
23  time    time    NOUN    NN  _   21  obj _   _   _   _   _   _   _   _   _   _   _
24  seeing  see VERB    VBG _   21  advcl   _   _   1:1 _   see as  Explanation Explanation _   _   _   _
25  as  as  SCONJ   IN  _   27  mark    _   _   1:2 _   _   _   _   _   _   _   _

16  but but CCONJ   CC  _   19  cc  _   _   _   _   _   _   _   _   _   _   _
17  according   accord  VERB    VBG _   19  case    _   _   1:1 _   accord to   Circumstance    Circumstance    _   _   _   _
18  to  to  ADP IN  _   17  fixed   _   _   1:2 _   _   _   _   _   _   _   _
19  distance    distance    NOUN    NN  _   14  obl _   _   _   _   _   _   _   _   _   _   _

nschneid commented 3 years ago

These are multiword expressions interpreted as adpositional. So they should have P for the lexcat. The lexcat, not the UPOS directly, is what determines which supersenses are valid.
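A rough sketch of what a lexcat-driven check could look like follows. The mapping table is a simplified assumption made for illustration (the actual validator's lexcat inventory is not shown in this thread and is likely larger), `supersense_ok` is a hypothetical name, and supersense labels are written with the `p.` prefix as in the `p.Theme` row above.

```python
# Hypothetical, simplified table: lexcat -> required supersense prefix.
# The real validator's table is not shown in the thread and may differ.
LEXCAT_TO_PREFIX = {"P": "p.", "N": "n.", "V": "v."}


def supersense_ok(lexcat: str, ss: str) -> bool:
    """Validate a supersense against the lexcat, ignoring UPOS entirely."""
    expected = LEXCAT_TO_PREFIX.get(lexcat)
    if expected is None:
        # Lexcats outside the table are assumed to take no supersense.
        return ss == "_"
    return ss.startswith(expected)


# "thanks to" is headed by a NOUN, but as an adpositional MWE its lexcat is P.
print(supersense_ok("P", "p.Explanation"))  # True
print(supersense_ok("N", "p.Explanation"))  # False
```

The point of the design is that the UPOS of the head token never enters the check: a NOUN-headed expression such as "thanks to" passes as long as the expression as a whole carries the adpositional lexcat.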

lgessler commented 3 years ago

Is that also true for "took part"?

nschneid commented 3 years ago

No, that should be took_part~in.
