manticoresoftware / manticoresearch

Easy to use open source fast database for search | Good alternative to Elasticsearch now | Drop-in replacement for E in the ELK soon
https://manticoresearch.com
GNU General Public License v3.0

What is the actual default of lemmatizer_base? #600

Open bcat-eu opened 3 years ago

bcat-eu commented 3 years ago

Describe the bug

It's either wrong documentation or lemmatizer_base does not work unless set explicitly.

Background: I am setting up a CI pipeline and would like to copy de.pak to a location where Manticore discovers it, without having to automate a config change.

To Reproduce

The documentation states that lemmatizer_base's default is /usr/local/share (https://manual.manticoresearch.com/Server_settings/Common#lemmatizer_base). I tried that location, but it doesn't seem to work: the pak file there is ignored.

I then looked through the sources and found another (conflicting) piece of documentation here https://github.com/manticoresoftware/manticoresearch/blob/48eabd5d45b0b3c261b3cc5fcd7c6e2abd55e167/manual/Server_settings/Common.md#lemmatizer_base which states it's /usr/share/manticore, so I tried that, but it still seems to miss the file.

Here is what is being done:

  sudo dpkg -i path/to/manticore.deb
  sudo cp path/to/de.pak /usr/share/manticore
  sudo systemctl restart manticore
  [command that uses manticore]

And the output:

Selecting previously unselected package manticore.
(Reading database ... 233736 files and directories currently installed.)
Preparing to unpack .../Search/Distr/manticore.deb ...
Unpacking manticore (3.6.0-210504-96d61d8bf) ...
Replacing files in old package sphinxsearch (2.2.11-2ubuntu2) ...
Setting up manticore (3.6.0-210504-96d61d8bf) ...
Adding group `manticore' (GID 127) ...
Done.
Adding system user `manticore' (UID 118) ...
Adding new user `manticore' (UID 118) with group `manticore' ...
Not creating home directory `/home/manticore'.
Manticore Search (https://manticoresearch.com)

Getting Started with Manticore Search:
  https://manual.manticoresearch.com/Quick_start_guide

Learn Manticore with interactive courses:
  https://play.manticoresearch.com/

To start Manticore Search service:
  > systemctl start manticore

Configuration file:
  /etc/manticoresearch/manticore.conf

Created symlink /etc/systemd/system/manticore.service → /lib/systemd/system/manticore.service.
Created symlink /etc/systemd/system/searchd.service → /lib/systemd/system/manticore.service.
Created symlink /etc/systemd/system/multi-user.target.wants/manticore.service → /lib/systemd/system/manticore.service.
Processing triggers for systemd (245.4-4ubuntu3.7) ...
Processing triggers for man-db (2.9.1-1) ...
Starting search index rotation...
Instantiating the search engine client and service...

In Http.php line 125:

  "error adding index 'articles': failed to open \/de.pak: No such file or directory"

Expected behavior

I'd expect that copying de.pak to /usr/share/manticore makes it "visible" to Manticore after I restart the service. Both the deb and pak files are the same ones I use locally, so they are known to work.

Also, it would help to have the full path in the error message: instead of "failed to open \/de.pak: No such file or directory", something like "failed to open /usr/share/manticore/de.pak: No such file or directory".

Describe the environment:

tomatolog commented 3 years ago

Could you provide your config so we can check which lemmatizer_base value you use?

From the ticket description and the daemon error message, it is not clear which path is wrong.

bcat-eu commented 3 years ago

@tomatolog the ticket is about lemmatizer_base default, what the system assumes if it's not set explicitly.

From the ticket description and the daemon error message, it is not clear which path is wrong.

That's basically the problem: if I don't set an explicit path in the config, the system uses some path that is not known to me and does not seem to match anything mentioned in the documentation.

For the cases where I do have a config with that path set, this ticket would not apply.

tomatolog commented 3 years ago

It seems there is no default value for lemmatizer_base; this option should be set explicitly in the config before use, as it is OS and installation dependent.

It is also not clear why you think the indexer and daemon use some default value for this option.
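
For a CI setup like the one described in this ticket, setting the option explicitly could be scripted roughly as below. This is only a sketch under stated assumptions: `ensure_lemmatizer_base` is a hypothetical helper (not part of Manticore), the option is assumed to belong in the `common` section as the linked Server_settings/Common manual page suggests, and the config file is assumed not to contain a `common` section yet.

```shell
#!/bin/sh
# Sketch only: make sure a Manticore config sets lemmatizer_base explicitly.
# ensure_lemmatizer_base is a hypothetical helper, not part of Manticore.
# Usage: ensure_lemmatizer_base CONFIG_FILE DICTIONARY_DIR
ensure_lemmatizer_base() {
    conf="$1"
    base="$2"

    # Nothing to do if the option is already set somewhere in the file.
    if grep -Eq '^[[:space:]]*lemmatizer_base[[:space:]]*=' "$conf"; then
        return 0
    fi

    # Assumption: appending a second "common" section is not safe, so stop
    # here and leave an existing section for manual editing.
    if grep -Eq '^[[:space:]]*common[[:space:]]*(\{|$)' "$conf"; then
        echo "common section exists in $conf; add lemmatizer_base there" >&2
        return 1
    fi

    # Append a minimal common section pointing at the directory
    # that holds the .pak dictionaries.
    cat >> "$conf" <<EOF

common {
    lemmatizer_base = $base
}
EOF
}
```

In the reproduction steps above, this would be called with /etc/manticoresearch/manticore.conf (the path printed by the package installer) and /usr/share/manticore, after copying de.pak and before restarting the service.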

bcat-eu commented 3 years ago

It is also not clear why you think the indexer and daemon use some default value for this option.

That's my assumption based on the documentation I linked above: it gives two different defaults, and neither worked for me.