Open GoogleCodeExporter opened 9 years ago
This might be related to issue 36.
Original comment by eleonor...@gmx.net
on 30 Aug 2011 at 1:10
You are definitely right. What we need is an additional category like 'PTKVZ'
in STTS.
Original comment by wuerz...@gmail.com
on 30 Aug 2011 at 3:24
I modified the .fst files in branches/kmw to include the inflection class
"Ptkl-Vz". Separable verb prefixes may now be defined as
<BaseStem>
  <Lemma>auf</Lemma>
  <Stem>auf</Stem>
  <Pos>OTHER</Pos>
  <Origin>nativ</Origin>
  <InfClass>Ptkl-Vz</InfClass>
</BaseStem>
Which should result in auf<+PTKL><Vz> (where Vz denotes Verbzusatz as in STTS).
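For anyone who wants to add several particles at once, entries of the shape shown above could be generated rather than typed by hand. The following is only a sketch: the helper names `make_particle_entry` and `render` are made up for illustration, and it assumes that every particle uses the same `Pos`, `Origin` and `InfClass` values as the "auf" example.

```python
import xml.etree.ElementTree as ET


def make_particle_entry(particle):
    """Build one <BaseStem> element for a separable verb particle.

    Hypothetical helper: the field values mirror the "auf" entry above.
    """
    entry = ET.Element("BaseStem")
    fields = [
        ("Lemma", particle),
        ("Stem", particle),
        ("Pos", "OTHER"),
        ("Origin", "nativ"),
        ("InfClass", "Ptkl-Vz"),
    ]
    for tag, text in fields:
        ET.SubElement(entry, tag).text = text
    return entry


def render(particles):
    """Serialise one entry per particle, one entry per line."""
    return "\n".join(
        ET.tostring(make_particle_entry(p), encoding="unicode")
        for p in particles
    )


if __name__ == "__main__":
    print(render(["auf", "an", "aus"]))
```

The output is unindented XML, one `<BaseStem>` per line, which would still need to be pasted into the lexicon file by hand.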
Original comment by wuerz...@gmail.com
on 31 Aug 2011 at 9:55
Issue 36 has been merged into this issue.
Original comment by wuerz...@gmail.com
on 31 Aug 2011 at 9:56
Since we cannot yet distinguish separable verb prefixes in prefixes.xml, I
prefer this solution (i.e., hard-coding them in e.g. others.xml) over a
"null morph conversion" in suff_stems.xml.
Original comment by wuerz...@gmail.com
on 31 Aug 2011 at 9:59
What do the classes Pref/X (Pref/V, Pref/Adj and others) in others.xml signify?
Isn't the current <+VPRE> synonymous with PTKVZ?
Original comment by rico.sen...@googlemail.com
on 31 Aug 2011 at 1:32
Possibly. But there is only one entry with
"<InfClass>Pref/V</InfClass>" in others.xml, namely "lieben", which
would not be "PTKVZ" according to the STTS standard. It might be an
erroneous entry, though. What do you suggest?
Original comment by wuerz...@gmail.com
on 31 Aug 2011 at 1:41
I think "lieben" is an error, and Pref/X is intended to signify verb particles:
> ab
ab<+PREP><Dat>
ab<+VPRE>
It is true that there are only four entries in others.xml, two of them probably
wrong. I don't care about whether to use <+VPRE> or <+PTKVZ>, as long as it's
consistent.
Personally, I'd automatically generate the analysis <+VPRE> for all PrefStems
in prefixes.xml that have the tag <Pos>V</Pos>. This generates less overhead
than having to duplicate all 200 entries (you could do the initial duplication
automatically, but this would create extra work whenever you want to modify/add
a prefix).
Automatically analysing all verb prefixes in prefixes.xml as verb particles
will generate a few false positives, like "ver" and "be". To get
rid of them, one could introduce a new tag to distinguish between separable and
non-separable prefixes (or set "<InfClass>" for all separable ones, for
instance).
This solution requires a bit of knowledge about the transducer though, and I
don't currently know how to best implement this.
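Outside the transducer, the idea can be illustrated with a few lines of Python. This is a rough sketch under assumptions, not the proposed implementation: the root element `prefixes`, the sample entries, and the `InfClass` value `Sep` are all hypothetical, and only the `PrefStem` and `<Pos>V</Pos>` names come from the discussion above.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature of prefixes.xml; the real file layout may differ.
SAMPLE = """
<prefixes>
  <PrefStem><Lemma>ab</Lemma><Pos>V</Pos><InfClass>Sep</InfClass></PrefStem>
  <PrefStem><Lemma>ver</Lemma><Pos>V</Pos></PrefStem>
  <PrefStem><Lemma>un</Lemma><Pos>ADJ</Pos></PrefStem>
</prefixes>
"""


def verb_particle_analyses(xml_text, separable_only=False):
    """Return a <+VPRE> analysis for every PrefStem tagged <Pos>V</Pos>.

    With separable_only=True, entries without an <InfClass> child are
    skipped (assumption: <InfClass> marks the separable prefixes).
    """
    root = ET.fromstring(xml_text.strip())
    analyses = []
    for stem in root.iter("PrefStem"):
        if stem.findtext("Pos") != "V":
            continue
        if separable_only and stem.find("InfClass") is None:
            continue
        analyses.append(stem.findtext("Lemma") + "<+VPRE>")
    return analyses


if __name__ == "__main__":
    print(verb_particle_analyses(SAMPLE))
    print(verb_particle_analyses(SAMPLE, separable_only=True))
```

Without the filter, "ver" shows up as a false positive; with `separable_only=True`, only the entry that carries the extra marker is kept.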
Original comment by rico.sen...@googlemail.com
on 31 Aug 2011 at 2:09
I only grepped for "Pref/V". There is also "Pref/Sep". Sorry for missing that
one. I agree with your proposal. It might even be possible to implement that
via a "SuffStem" without touching the transducer. I will look into that.
Original comment by wuerz...@gmail.com
on 31 Aug 2011 at 2:21
Here is my proposal: Separable and non-separable prefixes can be distinguished
by the way the past participle of the corresponding verbs is made up (i.e.
"verschafft" vs. "angeschafft"). This is implemented through the feature
"<no-ge>" in morphisto (cf. deko.fst). The attached file filters the prefixes
for this feature and creates an analysis as "<+VPRE>". If you like that
solution, I will integrate it. Feel free to modify (simplify) the transducer.
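The participle criterion itself can be shown on surface strings. Note that this is not how morphisto does it (it uses the "<no-ge>" feature inside the transducer); the function `is_separable` below is a hypothetical stand-in that simply checks whether "ge" follows the prefix in the past participle.

```python
def is_separable(prefix, participle):
    """Guess separability from a past participle.

    Separable prefixes keep "ge" after the prefix ("an" + "geschafft"),
    non-separable ones do not ("ver" + "schafft"). This is a surface
    heuristic only; see the caveats below.
    """
    if not participle.startswith(prefix):
        raise ValueError("participle does not start with the given prefix")
    return participle[len(prefix):].startswith("ge")


if __name__ == "__main__":
    # (prefix, example participle) pairs taken from the comment above
    for prefix, participle in [("an", "angeschafft"), ("ver", "verschafft")]:
        label = "separable" if is_separable(prefix, participle) else "non-separable"
        print(prefix, participle, label)
```

The heuristic is easy to fool: a stem that itself begins with "ge" (e.g. "begeben") looks separable, and verbs in "-ieren" form their participle without "ge" even after a separable prefix (e.g. "anprobiert"). A feature stored in the lexicon avoids both problems.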
Original comment by wuerz...@gmail.com
on 8 Sep 2011 at 8:28
Looks great! I'm all for integrating it into the trunk.
Original comment by rico.sen...@googlemail.com
on 8 Sep 2011 at 8:41
It works. Concerning the right category, there are three options:
- PTKVZ (from STTS)
- VPRE (from SMOR, currently used)
- PTKL/Vz (from SMOR, in line with other types of particles)
I like the third option. What do you think?
> ein
ein<+VPRE>
ein<+ART><Indef><Masc><Nom><Sg>
ein<+ART><Indef><Neut><Nom><Sg>
ein<+ART><Indef><Neut><Akk><Sg>
> aus
aus<+PTKL><Vz>
aus<+VPRE>
aus<+PREP><Dat>
> durch
durch<+VPRE>
durch<+PREP><Akk>
> ver
no result for ver
> zu
zu<+PTKL><Adj>
zu<+PTKL><zu>
zu<+VPRE>
zu<+PREP><Dat>
> auf
auf<+PTKL><Vz>
auf<+VPRE>
auf<+PREP><Dat>
auf<+PREP><Akk>
> an
an<+PTKL><Vz>
an<+VPRE>
an<+CIRCP>
an<+PREP><Dat>
an<+PREP><Akk>
> weg
weg<+VPRE>
weg<+ADV>
> hin
hin<+VPRE>
hin<+ADV>
Original comment by wuerz...@gmail.com
on 8 Sep 2011 at 4:15
I think <+PTKL><Vz> would be a good choice. The existing code that generates
<+VPRE> analyses should be removed/disabled, so that there's no confusion.
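Until the old rule is removed from the transducer, the duplication could be hidden in post-processing. The sketch below only illustrates the intended result and is not the transducer change itself; `parse_analysis` and `normalise` are hypothetical helper names, and the input is assumed to be analysis lines in the format shown in the output above.

```python
import re

TAG = re.compile(r"<([^<>]+)>")


def parse_analysis(line):
    """Split an analysis such as 'auf<+PREP><Dat>' into lemma and tag list."""
    lemma, bracket, rest = line.partition("<")
    return lemma, TAG.findall(bracket + rest)


def normalise(analyses):
    """Rewrite <+VPRE> as <+PTKL><Vz> and drop the resulting duplicates.

    The order of the remaining analyses is preserved.
    """
    result = []
    for analysis in analyses:
        analysis = analysis.replace("<+VPRE>", "<+PTKL><Vz>")
        if analysis not in result:
            result.append(analysis)
    return result


if __name__ == "__main__":
    raw = ["auf<+PTKL><Vz>", "auf<+VPRE>", "auf<+PREP><Dat>", "auf<+PREP><Akk>"]
    for analysis in normalise(raw):
        print(parse_analysis(analysis))
```

For "auf", this leaves a single particle reading next to the two preposition readings.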
Original comment by rico.sen...@googlemail.com
on 9 Sep 2011 at 9:47
Original issue reported on code.google.com by
eleonor...@gmx.net
on 30 Aug 2011 at 1:08