manticoresoftware / manticoresearch

Easy to use open source fast database for search | Good alternative to Elasticsearch now | Drop-in replacement for E in the ELK soon
https://manticoresearch.com
GNU General Public License v3.0

Create table process stopword path #1404

Open pavelnemirovsky opened 1 year ago

pavelnemirovsky commented 1 year ago

Guys,

When I create a table, I do it the following way:

CREATE TABLE example (
    id bigint,
    article_body_hash text stored,
    content text indexed,
    publish_date timestamp,
    internal_id string attribute,
    tags_id json,
    tags_name json,
    entities_id json
) min_prefix_len='3' index_exact_words='1' html_strip='1' engine='columnar' blend_chars='+,&' morphology='lemmatize_en_all, libstemmer_en' stopwords_unstemmed='1' stopwords='en' rt_mem_limit='2147483648' 

After it has been created, it appears that it is pointing to a stopwords dictionary in a non-existent path (stopwords='/var/lib/manticore/example/en'):

CREATE TABLE example (
id bigint,
article_body_hash text stored,
content text indexed,
publish_date timestamp,
internal_id string attribute,
tags_id json,
tags_name json,
entities_id json
) min_prefix_len='3' index_exact_words='1' html_strip='1' engine='columnar' blend_chars='+,&' morphology='lemmatize_en_all, libstemmer_en' stopwords_unstemmed='1' stopwords='/var/lib/manticore/example/en' rt_mem_limit='2147483648'

Please advise how to correct it. Is there a way to configure a base folder where the stopword dictionaries live, such as /usr/share/manticore/stopwords?

mohdmsl commented 1 year ago

Same for me. I ran the CREATE TABLE query below:

CREATE TABLE fgi_dev3 (
id bigint,
article_body_hash text stored,
content text indexed,
publish_date timestamp,
internal_id string attribute,
tags_id json,
tags_name json,
entities_id json
) min_prefix_len='3' index_exact_words='1' html_strip='1' engine='columnar' blend_chars='+,&' morphology='lemmatize_en_all, libstemmer_en' stopwords_unstemmed='1' stopwords='/usr/share/manticore/stopwords/en' rt_mem_limit='2147483648'

After running SHOW CREATE TABLE fgi_dev3, the response is, surprisingly:

CREATE TABLE fgi_dev3 (
id bigint,
publish_date timestamp,
internal_id string attribute,
tags_id json,
tags_name json,
entities_id json,
article_body_hash text stored,
content text indexed
) min_prefix_len='3' index_exact_words='1' html_strip='1' engine='columnar' blend_chars='+,&' morphology='lemmatize_en_all, libstemmer_en' stopwords_unstemmed='1' stopwords='/var/lib/manticore/fgi_dev3/en' rt_mem_limit='2147483648'

expected stopwords = /usr/share/manticore/stopwords/en actual stopwords = /var/lib/manticore/fgi_dev3/en

Please rectify this issue.

sanikolaev commented 1 year ago

it appears that it is pointing to stopwords dictionary in a non-existent path

I can't reproduce it: for me the path exists after running the create table command:

snikolaev@dev2:~$ mysql -P9306 -h0 -v
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 714
Server version: 6.2.13 08eef737c@230823 dev (columnar 2.2.1 c6dbbcb@230820) (secondary 2.2.1 c6dbbcb@230820) git branch master...origin/master

Copyright (c) 2000, 2023, Oracle and/or its affiliates.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Reading history-file /home/snikolaev/.mysql_history
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> drop table if exists example;
--------------
drop table if exists example
--------------

Query OK, 0 rows affected (0.03 sec)

mysql>
mysql> CREATE TABLE example (
    -> id bigint,
    -> article_body_hash text stored,
    -> content text indexed,
    -> publish_date timestamp,
    -> internal_id string attribute,
    -> tags_id json,
    -> tags_name json,
    -> entities_id json
    -> ) min_prefix_len='3' index_exact_words='1' html_strip='1' engine='columnar' blend_chars='+,&' morphology='lemmatize_en_all, libstemmer_en' stopwords_unstemmed='1' stopwords='/var/lib/manticore/example/en' rt_mem_limit='2147483648';
--------------
CREATE TABLE example (
id bigint,
article_body_hash text stored,
content text indexed,
publish_date timestamp,
internal_id string attribute,
tags_id json,
tags_name json,
entities_id json
) min_prefix_len='3' index_exact_words='1' html_strip='1' engine='columnar' blend_chars='+,&' morphology='lemmatize_en_all, libstemmer_en' stopwords_unstemmed='1' stopwords='/var/lib/manticore/example/en' rt_mem_limit='2147483648'
--------------

Query OK, 0 rows affected, 3 warnings (0.00 sec)

snikolaev@dev2:~$ sudo ls -la /var/lib/manticore/example/en
-rw------- 1 manticore manticore 954 Aug 29 11:16 /var/lib/manticore/example/en

expected stopwords = /usr/share/manticore/stopwords/en actual stopwords = /var/lib/manticore/fgi_dev3/en

When you specify a file in create table ... stopwords/exceptions/wordforms=something, the file gets copied to the table dir. If you specify one of the built-in stopwords (e.g. en), it's copied there from /usr/share/manticore/stopwords. Either way, the stopwords file ends up in the table dir, and that is expected.
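
To make the copy behavior described above concrete, here is a small Python model of it. This is only an illustrative sketch, not Manticore's actual code: the function name `resolve_stopwords`, the rule that a bare name such as `en` means a built-in list, and the temporary directories standing in for /usr/share/manticore/stopwords and /var/lib/manticore are all assumptions made for the demonstration.

```python
import shutil
import tempfile
from pathlib import Path


def resolve_stopwords(value: str, table: str, data_dir: Path, builtin_dir: Path) -> Path:
    """Hypothetical model: copy the stopwords file into the table dir, return the new path."""
    # Assumption: a bare name such as "en" refers to a built-in list,
    # anything else is treated as a path to a user-supplied file.
    is_bare_name = Path(value).name == value
    source = builtin_dir / value if is_bare_name else Path(value)

    table_dir = data_dir / table
    table_dir.mkdir(parents=True, exist_ok=True)
    target = table_dir / source.name

    # Skip the copy when the file already lives in the table dir.
    if source.resolve() != target.resolve():
        shutil.copyfile(source, target)
    return target


# Temporary directories stand in for the real system paths.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    builtin_dir = root / "usr/share/manticore/stopwords"
    data_dir = root / "var/lib/manticore"
    builtin_dir.mkdir(parents=True)
    (builtin_dir / "en").write_text("a the of\n")

    # stopwords='en' (built-in name) and an explicit path both end up in the table dir.
    by_name = resolve_stopwords("en", "example", data_dir, builtin_dir)
    by_path = resolve_stopwords(str(builtin_dir / "en"), "fgi_dev3", data_dir, builtin_dir)
    contents = by_name.read_text()

    print(by_name.relative_to(root).as_posix())
    print(by_path.relative_to(root).as_posix())
```

In this model both calls return a path under the table's own directory, which mirrors why SHOW CREATE TABLE reports /var/lib/manticore/<table>/en regardless of what was passed to CREATE TABLE.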