Closed: ghost closed this issue 3 years ago
I'm not sure what you want here. The tokenizer uses the mmap system call, so it needs file descriptors; you can avoid the "too many open files" error by increasing the system's maximum open file descriptor limit - https://www.cyberciti.biz/faq/linux-increase-the-maximum-number-of-open-files/
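The linked article covers the shell-level `ulimit` approach; as a sketch, the same limit can also be inspected and raised from within the Python process itself via the standard-library `resource` module (Unix only). This is an illustration of the general advice, not part of Janome's API:

```python
# Check and raise this process's open-file limit (the Python equivalent
# of the shell `ulimit -n` advice in the linked article). Unix only.
import resource

soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"soft limit: {soft}, hard limit: {hard}")

# Raise the soft limit up to the hard limit; only a privileged process
# may raise the hard limit itself.
resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))
```

Note that this only lifts the per-process soft limit; the hard limit and any system-wide caps still apply.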
But why do you need to create so many Tokenizer instances?
Actually, it's not a matter of specification but of system resource limits, which vary from individual system to system (like the CPU, memory, or disk space you can use).
About documentation - it's documented that Tokenizer utilizes the memory-mapped file feature; I think that's enough for users who have basic OS knowledge. We could document how to configure the ulimit value if creating many tokenizer objects at once were a common use case, but I don't think we need to do so.
Please check the documentation before asking questions. (We are not a support desk. ;)
https://mocobeta.github.io/janome/#memory-mapped-file-v0-3-3
https://mocobeta.github.io/janome/en/#memory-mapped-file-support-v0-3-3
I have Janome version 0.4.1. I created many Janome tokenizers and received the following error.
The detailed code is the following.