rsennrich / ParZu

The Zurich Dependency Parser for German
https://pub.cl.uzh.ch/demo/parzu/
GNU General Public License v2.0

Difference between local and web version results #15

Closed (inventorix closed this issue 5 years ago)

inventorix commented 6 years ago

Hello everyone,

I tried some sentences with my local ParZu installation (TüBa statistics included). The results differ from those of the ParZu service on the web page. Do you know the reason for that?

Thanks in advance.

rsennrich commented 6 years ago

After checking, the difference is likely the result of one of the following:

  1. The web demo uses a newer morphology file (20150315) than the one installed by default with ParZu (I have updated the installation script to fix this).
  2. The web demo uses a model for clevertagger (the POS tagger) that is also trained on TüBa.
  3. The web demo has n-best parsing activated: nbestmode = 4

If you spot a difference in POS labelling, the cause is likely reason 2 or 3; otherwise, the difference is likely due to reason 1.
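To apply the rule of thumb above, it helps to list the tokens whose POS tags differ between the local and the web output. The following is a minimal sketch, not part of ParZu. It assumes both outputs are tab-separated CoNLL-style text with the word form in the second column and the POS tag in the fifth column; the function name `pos_differences` and the `pos_col` parameter are hypothetical, so adjust the column index to your actual output format.

```python
def pos_differences(local_conll, web_conll, pos_col=4):
    """Return (token_index, form, local_tag, web_tag) for tokens whose POS tags differ.

    Assumption: tab-separated columns, form in column 1 and POS tag in
    column `pos_col` (both 0-based). Blank lines (sentence breaks) are skipped.
    """
    def rows(text):
        return [line.split("\t") for line in text.splitlines() if line.strip()]

    diffs = []
    for i, (a, b) in enumerate(zip(rows(local_conll), rows(web_conll))):
        # Skip rows that are too short to contain a POS column.
        if len(a) <= pos_col or len(b) <= pos_col:
            continue
        if a[pos_col] != b[pos_col]:
            diffs.append((i, a[1], a[pos_col], b[pos_col]))
    return diffs
```

An empty result suggests the tagging is identical, which by the reasoning above points to the morphology file (reason 1) rather than the tagger model or n-best mode.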

inventorix commented 6 years ago

Thank you for the answer. From the new installation script I can see that the dependencies for clevertagger are installed. Is reinstalling ParZu enough to fix 1) and 2)?

In which file can I change nbestmode?

Regards

rsennrich commented 6 years ago

A reinstall will fix 1).

As to 2), the TüBa model for clevertagger is not available online, but you could train your own by following the instructions in the clevertagger README.

As to 3), this is set in config.ini.
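If you prefer to change the setting from a script rather than by hand, something like the following might do. It is only a sketch: it assumes config.ini contains a plain `nbestmode = <value>` line (the `nbestmode = 4` form quoted earlier in this thread), and the helper name `set_option` is hypothetical. It edits text only, so you would read config.ini, pass its contents in, and write the result back yourself.

```python
import re


def set_option(config_text, key, value):
    """Replace the value on the first `key = value` line; other lines stay untouched.

    Assumption: the option is written as `key = value` on a line of its own.
    Raises KeyError if no such line exists.
    """
    pattern = re.compile(r"^([ \t]*" + re.escape(key) + r"[ \t]*=[ \t]*).*$", re.MULTILINE)
    # A function is used as the replacement so the value is inserted literally.
    new_text, count = pattern.subn(lambda m: m.group(1) + str(value), config_text, count=1)
    if count == 0:
        raise KeyError(key)
    return new_text
```

Opening config.ini in an editor and changing the line directly is of course just as good; check the comments in the file for the meaning of each value.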

inventorix commented 6 years ago

As to 2): in sample_training_file.txt in the clevertagger folder, I cannot tell spaces and tabs apart. I would like to convert my TüBa chunks to this format. Which rules for spaces or tabs should I follow? The TüBa chunks look like this:

So ADV B-NX viel PIAT I-NX Sinn NN I-NX für APPR . B-PX

sample_training_file.txt looks like this:

Und KON als KOUS ich PPER die ART deutsche ADJA Sprache NN vernahm VVFIN

rsennrich commented 6 years ago

I don't have the latest TüBa release to test this myself, but something like cut -f 1-2 -d " " should work. You don't need chunking information.
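If the delimiters in your TüBa file are inconsistent (a mix of spaces and tabs), the same idea can be sketched in Python. This is untested against a real TüBa release and assumes one token per line with whitespace-separated columns (token, POS tag, chunk label) and blank lines between sentences. The function name `keep_token_and_tag` and the `sep` parameter are hypothetical; `sep` defaults to a space to mirror the cut command, so check the clevertagger README for the separator it actually expects.

```python
def keep_token_and_tag(lines, sep=" "):
    """Keep only the first two whitespace-separated columns of each line.

    Blank lines are preserved (assumed to mark sentence boundaries).
    Lines with a single column are rejected rather than silently dropped.
    """
    out = []
    for line in lines:
        fields = line.split()  # splits on any run of spaces or tabs
        if not fields:
            out.append("")
        elif len(fields) >= 2:
            out.append(fields[0] + sep + fields[1])
        else:
            raise ValueError("expected at least two columns: %r" % line)
    return out
```

Unlike cut -d " ", this treats tabs and repeated spaces as a single delimiter, which may matter if the source file is not uniformly formatted.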