kpu / kenlm

KenLM: Faster and Smaller Language Model Queries
http://kheafield.com/code/kenlm/

query outputs final stats to STDOUT #275

Closed: emjotde closed this 4 years ago

emjotde commented 4 years ago

Hi, this summary is appended to STDOUT when running query, even during line-by-line processing:

Perplexity including OOVs:      7179.727316592978
Perplexity excluding OOVs:      7179.727316592978
OOVs:   0
Tokens: 2

It cost me a lot of time to debug this, or even to realize that my lines did not align when using query with GNU parallel. Maybe make these lines optional or suppressible?

The alternative, running cat file.txt | query bla.bin | head -n -4, is not great when you don't remember that you have to do it :)
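One hedged alternative to head -n -4 is to filter by content rather than by position: drop only the lines that begin with the four summary labels quoted in the output above. The prefix set is copied from that output and is an assumption; other query builds or verbosity settings may print different summary lines. The function name strip_summary is hypothetical.

```python
# Labels of the summary lines quoted in this issue; assumed (not confirmed)
# to be the complete set that query appends after the per-line output.
SUMMARY_PREFIXES = (
    "Perplexity including OOVs:",
    "Perplexity excluding OOVs:",
    "OOVs:",
    "Tokens:",
)


def strip_summary(lines):
    """Return only the lines that do not start with a summary label.

    str.startswith accepts a tuple, so one call checks every prefix.
    """
    return [line for line in lines if not line.startswith(SUMMARY_PREFIXES)]
```

Wrapped in a small script that calls sys.stdout.writelines(strip_summary(sys.stdin)), this can sit at the end of the pipeline in place of head -n -4. One caveat: a per-line result that happens to begin with one of these labels would also be dropped, which a purely positional filter avoids.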

kpu commented 4 years ago

Pull request?

emjotde commented 4 years ago

Taking a screenshot of this :P

kpu commented 4 years ago

Checklist:

kpu commented 4 years ago

I've added more controls. The default is still to print everything, but -v sentence now prints only the sentence lines.

emjotde commented 4 years ago

-v words is technically a line-by-line processor, too, isn't it?

kpu commented 4 years ago

Yes.

emjotde commented 4 years ago

That is actually where I ran into this; I forgot to mention it explicitly.

rnajim commented 2 years ago

Hi, could you please tell me how to calculate the perplexity including OOVs? Or is there a script for that?
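A minimal sketch of the arithmetic, assuming per-token log10 probabilities (the convention KenLM uses for its scores) and a token count that includes the end-of-sentence token. Perplexity including OOVs is 10 raised to the negative average log10 probability over all tokens; the excluding variant removes OOV tokens from both the sum and the count. Whether query computes its summary in exactly this way is an assumption, not something confirmed in this thread, and the function name perplexities is hypothetical.

```python
import math


def perplexities(log10_probs, is_oov):
    """Return (ppl_including_oovs, ppl_excluding_oovs).

    log10_probs: per-token log10 probabilities (assumed to include </s>).
    is_oov:      parallel booleans marking which tokens were OOV.
    """
    if len(log10_probs) != len(is_oov):
        raise ValueError("log10_probs and is_oov must have the same length")
    if not log10_probs:
        raise ValueError("need at least one token")

    # Including OOVs: average log10 probability over every token.
    ppl_incl = 10.0 ** (-sum(log10_probs) / len(log10_probs))

    # Excluding OOVs: drop OOV tokens from both the sum and the count.
    kept = [p for p, oov in zip(log10_probs, is_oov) if not oov]
    # If every token was OOV there is nothing to average; report NaN.
    ppl_excl = 10.0 ** (-sum(kept) / len(kept)) if kept else math.nan
    return ppl_incl, ppl_excl
```

For example, perplexities([-1.0, -3.0], [False, True]) gives 100.0 including OOVs and 10.0 excluding them. If the kenlm Python module is installed, its Model.full_scores method yields a (log10 probability, n-gram length, OOV flag) tuple per token, which could fill the two input lists; verify this against the installed version.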