query outputs final stats to STDOUT

kpu / kenlm

KenLM: Faster and Smaller Language Model Queries

http://kheafield.com/code/kenlm/


Closed emjotde closed 4 years ago

emjotde commented 4 years ago

Hi, the following is appended to STDOUT when running query, even when doing line-by-line processing:

Perplexity including OOVs:      7179.727316592978
Perplexity excluding OOVs:      7179.727316592978
OOVs:   0
Tokens: 2

It cost me a lot of time to debug, or even to realize, that my lines do not align when using query with GNU parallel. Maybe make these lines optional or suppressible?

Alternatively, having to do cat file.txt | query bla.bin | head -n -4 is not great when you don't remember that you have to do it :)
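A filter that matches the summary labels is a little more forgiving than head -n -4, because it leaves the output alone when the summary is absent. Here is a minimal sketch in Python. It assumes the summary is at most four trailing lines that start with exactly the labels shown in the output above; strip_summary is a hypothetical helper name, not part of KenLM.

```python
from collections import deque

# Labels copied from the summary block quoted in this issue (an assumption,
# not taken from the KenLM source).
SUMMARY_PREFIXES = (
    "Perplexity including OOVs:",
    "Perplexity excluding OOVs:",
    "OOVs:",
    "Tokens:",
)


def strip_summary(lines, tail=4):
    """Yield lines unchanged, dropping summary lines found in the last `tail` lines."""
    held = deque()
    for line in lines:
        held.append(line)
        # Anything pushed out of the window cannot be part of the trailing summary.
        if len(held) > tail:
            yield held.popleft()
    # Only the final `tail` lines are checked against the summary labels.
    for line in held:
        if not line.startswith(SUMMARY_PREFIXES):
            yield line
```

To use it in a pipe, a small script could call sys.stdout.writelines(strip_summary(sys.stdin)) and sit where head -n -4 was. Only the last four lines are ever inspected, so an ordinary output line earlier in the stream is never dropped.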

kpu commented 4 years ago

Pull request?

emjotde commented 4 years ago

Taking a screenshot of this :P

kpu commented 4 years ago

Checklist:

[ ] Ran the tests
[ ] Fixed unrelated warnings that already existed in the code but Frank doesn't want to fix
[ ] Ping maintainer a week later to remind them
[ ] Maintainer updated master while you waited; make sure to do the merge
[ ] Update changelog
[ ] Are you allowed to use this code as a submodule? It hasn't been reviewed by Microsoft employees.

kpu commented 4 years ago

I've added more controls. The default is still to print everything, but -v sentence now prints only the sentence lines.

emjotde commented 4 years ago

-v words is technically a line-by-line processor, too, isn't it?

kpu commented 4 years ago

Yes.

emjotde commented 4 years ago

Which is where I actually ran into this. Forgot to mention explicitly.

rnajim commented 2 years ago

Hi, can you please tell me how to calculate the perplexity including OOVs? Or is there a script for that?
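For reference, both figures can in principle be recomputed from per-token log10 probabilities. The sketch below assumes, without confirmation from this thread, that perplexity is 10 raised to the negative mean log10 probability, and that the "excluding OOVs" figure drops OOV tokens from both the sum and the token count. perplexities is a hypothetical helper, and its input is a plain list of (log10 probability, is_oov) pairs rather than any KenLM data structure.

```python
def perplexities(scores):
    """Return (including_oovs, excluding_oovs) for (log10_prob, is_oov) pairs."""
    if not scores:
        raise ValueError("need at least one token")
    # Including OOVs: average over every token.
    total = sum(lp for lp, _ in scores)
    including = 10.0 ** (-total / len(scores))
    # Excluding OOVs: drop OOV tokens from both the sum and the count (assumed).
    known = [lp for lp, oov in scores if not oov]
    # Undefined when every token is an OOV; report NaN in that case.
    excluding = 10.0 ** (-sum(known) / len(known)) if known else float("nan")
    return including, excluding
```

With no OOVs the two values coincide, which matches the identical figures in the output quoted at the top of this issue (OOVs: 0).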