redpony / cdec

Decoder, aligner, and model optimizer for statistical machine translation and other structured prediction models based on (mostly) context-free formalisms
http://cdec-decoder.org/
Apache License 2.0

Lattice output broken in compound-split #4

Closed vchahun closed 12 years ago

vchahun commented 12 years ago

Example:

echo "fadenschneidevorrichtung für eine nähmaschine" | ./compound-split.pl --output plf
Executing: ../decoder/cdec -c cdec-de.ini --csplit_preserve_full_word --csplit_output_plf --beam_prune 2.1
Error: --csplit_preserve_full_word should only be used with csplit AND --*_prune!

1-best output works well.

redpony commented 12 years ago

Thanks, Victor. I'll fix this.

redpony commented 12 years ago

Hi Victor, I'm just getting around to this, but I'm not able to reproduce the error:

[cdyer@cab compound-split]$ echo "fadenschneidevorrichtung für eine nähmaschine" | ./compound-split.pl --output plf
(Run with --help for options)
LANGUAGE: de
OUTPUT: plf
Executing: /home/cdyer/cdec/compound-split/../decoder/cdec -c cdec-de.ini --csplit_preserve_full_word --csplit_output_plf --beam_prune 2.1 2> /dev/null
((('faden',-7.15529,1),('fadenschneidevorrichtung',-40.6998,3),),(('schneide',-9.41413,1),),(('vorrichtung',-12.9222,1),),(('für',0,1),),(('eine',0,1),),(('näh',-7.77325,1),('nähmaschine',-13.8472,2),),(('maschine',-7.86775,1),),)

Do you have any changes to your cdec that could be interacting badly?
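
For anyone checking their own lattice output against the one above, here is a minimal Python sketch that reads the PLF line and lists the segmentations it encodes. It assumes the usual PLF reading: the string is a nested tuple literal, each inner tuple is one lattice node, and each arc is (word, score, number of nodes to advance). The helper names `parse_plf` and `paths` are made up for this illustration and are not part of cdec.

```python
import ast

# PLF string copied verbatim from the decoder output above.
PLF = (
    "((('faden',-7.15529,1),('fadenschneidevorrichtung',-40.6998,3),),"
    "(('schneide',-9.41413,1),),(('vorrichtung',-12.9222,1),),"
    "(('für',0,1),),(('eine',0,1),),"
    "(('näh',-7.77325,1),('nähmaschine',-13.8472,2),),"
    "(('maschine',-7.86775,1),),)"
)


def parse_plf(text):
    """Parse a PLF string into a list of nodes, each a list of arcs.

    Assumption: PLF is a valid Python tuple literal, so ast.literal_eval
    can read it without executing any code.
    """
    return [list(node) for node in ast.literal_eval(text)]


def paths(lattice, start=0):
    """Yield the word sequence of every path from node `start` to the end.

    Assumption: the third field of an arc is how many nodes it skips ahead.
    """
    if start == len(lattice):
        yield []
        return
    for word, _score, distance in lattice[start]:
        for rest in paths(lattice, start + distance):
            yield [word] + rest


lattice = parse_plf(PLF)
for words in paths(lattice):
    print(" ".join(words))
```

If the decoder prints the "--csplit_preserve_full_word" error instead of a line like this, `ast.literal_eval` raises an exception, which makes the failure easy to spot in a script.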

vchahun commented 12 years ago

I have updated my Boost and now it works perfectly... one more Boost mystery!

redpony commented 12 years ago

Boost is misery. :(

Also, if you want slightly better segmentation quality for the patent domain, it takes very few annotations to get a good model since there aren't many features. If you give me some representative data, I can do the annotation and show you how to retrain the model.
