kpu / kenlm

KenLM: Faster and Smaller Language Model Queries
http://kheafield.com/code/kenlm/

RuntimeError util::OpenReadOrThrow #266

Closed fzy0728 closed 4 years ago

fzy0728 commented 4 years ago

Question: RuntimeError: util/file.cc:76 in int util::OpenReadOrThrow(const char*) threw ErrnoException because `-1 == (ret = open(name, 00))'. No such file or directory while opening /yarn/nm/usercache/fuziyu/appcache/application_1570609431331_3867/container_1570609431331_3867_01_000001/pos_lm.bin

Description: I use the KenLM build feature and run it on Spark on YARN. I can confirm that the binary model file exists in this directory.

fzy0728 commented 4 years ago

Hope to receive your reply

kpu commented 4 years ago

A good bug report takes pains to be a minimal example. There are too many moving parts here (yarn, containers, etc.) to diagnose. Come back with consecutive lines of bash showing head -n 1 $FILE and it failing to find $FILE.

zhangyuanscall commented 4 years ago

Hi. Are you using the KenLM API in Spark? Does it work well in a Spark application?

kpu commented 4 years ago

It's a library. You tell it to open a file and it expects the file to exist. That library can happen to be run from spark if you want.
Whenever there are containers and processes with different root filesystems, it's easy for users to get confused about paths, which is what I suspect happened here.
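
One way to check how a path looks from the process that actually opens the model is sketched below. The helper name describe_path is hypothetical and not part of KenLM; it uses only the Python standard library, and it is only informative if it is called from the same process (for example the Spark executor, not the driver) that later hands the path to KenLM.

```python
import os
import socket


def describe_path(path):
    """Report how `path` looks from the process that calls this function."""
    return {
        "host": socket.gethostname(),          # machine the check ran on
        "cwd": os.getcwd(),                    # matters for relative paths
        "path": path,
        "exists": os.path.exists(path),        # False matches "No such file or directory"
        "readable": os.access(path, os.R_OK),  # permission check for the current user
    }
```

If "exists" comes back False on the executor while the file is visible from a shell on the driver, the path is probably being resolved against a different filesystem or container than the one that was inspected.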

fzy0728 commented 4 years ago

I was calling KenLM from a DataFrame UDF, but I found that the KenLM model object cannot be serialized with pickle, so I think that may have been the problem. I then switched to mapPartitions, and that works.
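
A minimal sketch of the mapPartitions approach described above, assuming the Python kenlm module's kenlm.Model(path) constructor and its score method. The names make_partition_scorer and load_kenlm are hypothetical, and shipping the model with SparkContext.addFile and resolving it with SparkFiles.get is an assumption about the setup, not the code used in this thread. The point is that only a file name is captured by the function sent to the executors, and the model is loaded once per partition on the executor itself.

```python
def load_kenlm(name):
    # Hypothetical loader. Imports happen lazily, inside the executor process.
    import kenlm
    from pyspark import SparkFiles
    # Assumes the file was distributed beforehand with SparkContext.addFile.
    return kenlm.Model(SparkFiles.get(name))


def make_partition_scorer(name, load_model=load_kenlm):
    """Build a function that can be passed to rdd.mapPartitions."""
    def score_partition(sentences):
        # Load once per partition; the model object itself is never pickled.
        model = load_model(name)
        for sentence in sentences:
            yield sentence, model.score(sentence)
    return score_partition
```

With an RDD of sentences, this would be used roughly as rdd.mapPartitions(make_partition_scorer("pos_lm.bin")). Passing a different load_model makes it possible to try the function locally without Spark or KenLM installed.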

fzy0728 commented 4 years ago

https://datascience.stackexchange.com/questions/55458/picklingerror-could-not-serialize-object-typeerror-cant-pickle-fasttext-pybi

fzy0728 commented 4 years ago

I am very glad to receive your reply. Thank you for your help.