manticoresoftware / docker

Official docker for Manticore Search

Manticore has lost its index #91

Open simonszu opened 2 weeks ago

simonszu commented 2 weeks ago


Your question:

I am using Manticore as a search backend for mailpiler.org. Today I ran into an issue where Manticore lost its indexes. I first opened an issue against piler (https://github.com/jsuto/piler/issues/179), and the developer suggested that I mount Manticore's data as a persistent volume/folder, which I had already done: https://github.com/jsuto/piler/issues/179#issuecomment-2328243984

Fortunately, piler has a command that recreates its indexes. However, this seems to be more of a Manticore issue than a piler issue, so I am asking for help: am I mounting the right folder? I followed the README of the Manticore Docker container and persisted the whole /var/lib/manticore folder. I am using this manticore.conf: https://github.com/jsuto/piler/blob/master/docker/manticore.conf

I could use some hints on the right way to persist Manticore's data so that I don't lose the index again. What am I doing wrong?

djklim87 commented 2 weeks ago

Manticore indexes should be mounted to /var/lib/manticore

If you provide your docker-compose file or docker run command, it will be easier to understand what you are doing wrong.

Docker logs could also be useful.

alsoGAMER commented 2 weeks ago

It seems I'm not the only one having this issue. @simonszu

Manticore indexes should be mounted to /var/lib/manticore

They indeed are!

If you provide your docker-compose file or docker run command, it will be easier to understand what you are doing wrong.

compose.txt

Docker logs could also be useful.

logs.txt

@djklim87 There you have it. As @simonszu also pointed out, reindexing solves the issue.

simonszu commented 2 weeks ago

@djklim87 I am deploying my containers via Ansible. I'm sure this playbook task is readable enough, even for anyone unfamiliar with Ansible, to show how the containers are started:

- name: Start container for manticore
  docker_container:
    name: manticore
    image: manticoresearch/manticore:6.3.2
    restart_policy: always
    volumes:
      - "{{ docker_datadir }}/piler/manticore/data:/var/lib/manticore"
      - "{{ docker_datadir }}/piler/manticore/config/manticore.conf:/etc/manticoresearch/manticore.conf"
    labels:
      com.centurylinklabs.watchtower.scope: "regular"
    networks:
      - name: backend
  become: yes

So I am mounting a host directory to /var/lib/manticore. Inside this directory there are several files that apparently correspond to the different indices.

These are the Manticore logs. Please note that the visible restart of the container happened after I discovered the empty indices, so it is not the root cause of the indices being empty. I do not have the errors/warnings that @alsoGAMER mentioned, though.

[Mon Sep  2 14:33:32.929 2024] [1] using config file '/etc/manticoresearch/manticore.conf.sh' (2499 chars)...
starting daemon version '6.3.2 c296dc7c8@24062606' ...
listening on all interfaces for sphinx and http(s), port=9312
listening on all interfaces for mysql, port=9306
listening on all interfaces for RO mysql, port=9307
Manticore 6.3.2 c296dc7c8@24062606
Copyright (c) 2001-2016, Andrew Aksyonoff
Copyright (c) 2008-2016, Sphinx Technologies Inc (http://sphinxsearch.com)
Copyright (c) 2017-2024, Manticore Software LTD (https://manticoresearch.com)

precaching table 'piler1'
precaching table 'tag1'
precaching table 'note1'
precaching table 'audit1'
prereading 4 tables
preread 4 tables in 0.000 sec
accepting connections
caught SIGTERM, shutting down
rt: table piler1: ramchunk saved in 0.019 sec
rt: table tag1: ramchunk saved in 0.004 sec
rt: table note1: ramchunk saved in 0.002 sec
shutdown daemon version '6.3.2 c296dc7c8@24062606' ...
shutdown complete
precached 4 tables in 0.003 sec
[Wed Sep  4 07:27:52.707 2024] [1] using config file '/etc/manticoresearch/manticore.conf.sh' (2499 chars)...
starting daemon version '6.3.2 c296dc7c8@24062606' ...
listening on all interfaces for sphinx and http(s), port=9312
listening on all interfaces for mysql, port=9306
listening on all interfaces for RO mysql, port=9307
Manticore 6.3.2 c296dc7c8@24062606
Copyright (c) 2001-2016, Andrew Aksyonoff
Copyright (c) 2008-2016, Sphinx Technologies Inc (http://sphinxsearch.com)
Copyright (c) 2017-2024, Manticore Software LTD (https://manticoresearch.com)

precaching table 'piler1'
precaching table 'tag1'
precaching table 'note1'
precaching table 'audit1'
prereading 4 tables
preread 4 tables in 0.000 sec
accepting connections
alsoGAMER commented 2 weeks ago

These are the Manticore logs. Please note that the visible restart of the container happened after I discovered the empty indices, so it is not the root cause of the indices being empty. I do not have the errors/warnings that @alsoGAMER mentioned, though.

Restarting the container is fine; the issue crops up when you remove it (docker compose down).

simonszu commented 2 weeks ago

I neither restarted nor recreated the container. The index was lost during regular operation, that is, while piler was accessing the Manticore instance.

tomatolog commented 2 weeks ago

You need to provide the daemon logs covering the period from when you still had your indexes until they vanished.

djklim87 commented 2 weeks ago

You map /var/lib/manticore to a named Docker volume instead of a directory on the local filesystem, so docker-compose down (e.g. with -v) can probably remove that volume together with your data:

  manticore:
    image: manticoresearch/manticore:6.3.6
.....
    volumes:
      - ./data/manticore.conf:/etc/manticoresearch/manticore.conf
      - piler_manticore:/var/lib/manticore

See the details here: https://stackoverflow.com/questions/65799945/why-docker-compose-down-deletes-my-volume-how-to-avoid-this-action-done-by-dow
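The difference djklim87 points out can be checked mechanically. In the Compose short volume syntax (SOURCE:TARGET[:MODE]), a source that is a host path creates a bind mount, while a bare name refers to a named volume; docker compose down -v removes named volumes, but it never deletes a bind-mounted host directory. The helper below is a hypothetical, simplified Python sketch of that rule, not part of Docker or Compose.

```python
# Hypothetical helper: classify docker-compose short-syntax volume entries
# ("SOURCE:TARGET[:MODE]") as bind mounts, named volumes or anonymous volumes.
# Simplified rule (assumption): a SOURCE starting with "/", "." or "~" is a
# host path; any other SOURCE is a named volume. Windows drive-letter paths
# such as "C:\data" are not handled.

def classify_volume(spec: str) -> tuple[str, str, str]:
    """Return (kind, source, target) for one short-syntax volume entry."""
    parts = spec.split(":")
    if len(parts) == 1:
        # Only a container path: Docker creates an anonymous volume.
        return ("anonymous", "", parts[0])
    source, target = parts[0], parts[1]
    if source.startswith(("/", ".", "~")):
        return ("bind", source, target)
    return ("named", source, target)


if __name__ == "__main__":
    # The two entries from the compose snippet above, plus a bind-mount
    # alternative using a hypothetical host path.
    for spec in (
        "./data/manticore.conf:/etc/manticoresearch/manticore.conf",
        "piler_manticore:/var/lib/manticore",
        "/srv/piler/manticore/data:/var/lib/manticore",
    ):
        kind, source, target = classify_volume(spec)
        print(f"{target} <- {source} ({kind})")
```

With this rule, piler_manticore:/var/lib/manticore is reported as a named volume, which is exactly the kind of mount that docker compose down -v deletes.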

simonszu commented 1 week ago

@tomatolog Where do I find these logs? I have already provided the docker logs from the container. Are there any other logs that Manticore does not send to STDOUT?

djklim87 commented 1 week ago

@tomatolog Where do I find these logs? I have already provided the docker logs from the container. Are there any other logs that Manticore does not send to STDOUT?

cat /var/lib/manticore/manticore.log
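To narrow manticore.log down to the period tomatolog asked about, a small Python sketch can extract the daemon start/stop events so that restarts can be lined up against the moment the indexes went empty. The regular expression assumes the "[timestamp] [thread] message" line format seen in this thread, and parsing the weekday/month names assumes an English (C) locale; the function name and the marker strings are hypothetical choices, not part of Manticore.

```python
import re
from datetime import datetime

# Matches manticore.log lines such as
# "[Mon Sep  2 14:33:23.719 2024] [1] caught SIGTERM, shutting down".
LINE_RE = re.compile(r"^\[(?P<ts>[^\]]+)\] \[(?P<tid>\d+)\] (?P<msg>.*)$")

# Messages treated as lifecycle events. Assumption: picked from the log lines
# visible in this thread; this is not an exhaustive list of daemon messages.
EVENT_MARKERS = (
    "starting daemon",
    "prereading",
    "caught SIGTERM",
    "ramchunk saved",
    "shutdown complete",
)

# A space in a strptime format matches one or more whitespace characters,
# so the padded day in "Sep  2" parses with "%b %d".
TS_FORMAT = "%a %b %d %H:%M:%S.%f %Y"


def lifecycle_events(lines):
    """Yield (timestamp, message) for daemon start/stop related log lines."""
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # skip lines without the bracketed timestamp prefix
        msg = match.group("msg")
        if any(marker in msg for marker in EVENT_MARKERS):
            yield datetime.strptime(match.group("ts"), TS_FORMAT), msg


if __name__ == "__main__":
    sample = [
        "[Wed Sep  4 07:27:51.278 2024] [1] caught SIGTERM, shutting down",
        "[Wed Sep  4 07:27:51.382 2024] [63] rt: table piler1: ramchunk saved in 0.019 sec",
        "[Wed Sep  4 07:27:52.718 2024] [1] Using local time zone '/etc/localtime'",
        "[Wed Sep  4 07:27:52.784 2024] [61] prereading 4 tables",
    ]
    for ts, msg in lifecycle_events(sample):
        print(ts.isoformat(sep=" "), msg)
```

In practice you would pass an open file handle for manticore.log (in the mounted data directory on the host) instead of the sample list.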

simonszu commented 1 week ago

OK, this is the log; there is nothing new compared with the STDOUT log:

[Mon Sep  2 14:33:23.719 2024] [1] starting daemon version '6.3.2 c296dc7c8@24062606' ...
[Mon Sep  2 14:33:23.720 2024] [1] listening on all interfaces for sphinx and http(s), port=9312
[Mon Sep  2 14:33:23.720 2024] [1] listening on all interfaces for mysql, port=9306
[Mon Sep  2 14:33:23.720 2024] [1] listening on all interfaces for RO mysql, port=9307
[Mon Sep  2 14:33:23.742 2024] [64] prereading 4 tables
[Mon Sep  2 14:33:23.742 2024] [64] preread 4 tables in 0.000 sec
[Mon Sep  2 14:33:23.749 2024] [1] accepting connections
[Mon Sep  2 14:33:31.937 2024] [1] caught SIGTERM, shutting down
[Mon Sep  2 14:33:31.941 2024] [1] shutdown daemon version '6.3.2 c296dc7c8@24062606' ...
[Mon Sep  2 14:33:31.941 2024] [1] shutdown complete
[Mon Sep  2 14:33:32.930 2024] [1] Using local time zone '/etc/localtime'
[Mon Sep  2 14:33:32.931 2024] [1] starting daemon version '6.3.2 c296dc7c8@24062606' ...
[Mon Sep  2 14:33:32.931 2024] [1] listening on all interfaces for sphinx and http(s), port=9312
[Mon Sep  2 14:33:32.931 2024] [1] listening on all interfaces for mysql, port=9306
[Mon Sep  2 14:33:32.931 2024] [1] listening on all interfaces for RO mysql, port=9307
[Mon Sep  2 14:33:32.956 2024] [63] prereading 4 tables
[Mon Sep  2 14:33:32.956 2024] [63] preread 4 tables in 0.000 sec
[Mon Sep  2 14:33:32.961 2024] [1] accepting connections
[Wed Sep  4 07:27:51.278 2024] [1] caught SIGTERM, shutting down
[Wed Sep  4 07:27:51.382 2024] [63] rt: table piler1: ramchunk saved in 0.019 sec
[Wed Sep  4 07:27:51.389 2024] [63] rt: table tag1: ramchunk saved in 0.004 sec
[Wed Sep  4 07:27:51.394 2024] [64] rt: table note1: ramchunk saved in 0.002 sec
[Wed Sep  4 07:27:51.398 2024] [1] shutdown daemon version '6.3.2 c296dc7c8@24062606' ...
[Wed Sep  4 07:27:51.398 2024] [1] shutdown complete
[Wed Sep  4 07:27:52.718 2024] [1] Using local time zone '/etc/localtime'
[Wed Sep  4 07:27:52.719 2024] [1] starting daemon version '6.3.2 c296dc7c8@24062606' ...
[Wed Sep  4 07:27:52.719 2024] [1] listening on all interfaces for sphinx and http(s), port=9312
[Wed Sep  4 07:27:52.720 2024] [1] listening on all interfaces for mysql, port=9306
[Wed Sep  4 07:27:52.720 2024] [1] listening on all interfaces for RO mysql, port=9307
[Wed Sep  4 07:27:52.784 2024] [61] prereading 4 tables
[Wed Sep  4 07:27:52.784 2024] [61] preread 4 tables in 0.000 sec
[Wed Sep  4 07:27:52.821 2024] [1] accepting connections

For reference, this is the manticore.conf that ships with piler and that I am using: https://github.com/jsuto/piler/blob/master/docker/manticore.conf

alsoGAMER commented 1 week ago

It's the same for me too

tomatolog commented 1 week ago

From the log you provided, I see that the daemon always loads 4 indexes:

prereading 4 tables

If you have other tables, provide the list of tables you see in the daemon.

simonszu commented 1 week ago

I think this is correct. This is the content of the mounted manticore data directory:

audit1.lock  audit1.meta  audit1.ram  binlog.001  binlog.lock  binlog.meta  manticore.log  manticore.pid  note1.lock  note1.meta  note1.ram  piler1.lock  piler1.meta  piler1.ram  query.log  tag1.lock  tag1.meta  tag1.ram

and the 4 indexes are defined in the piler-supplied manticore config:

For each index there is a .lock file, a .meta file and a .ram file. In my case the search index has to hold the data for 78,000 emails, and piler1 has the biggest index files. All files inside the mounted directory have a last-modified timestamp of Sept. 4, 09:27, which corresponds to the container restart visible in the provided log file. Please note that my time zone is currently UTC+2 and the container runs in UTC. I started recreating the indexes shortly after restarting the container, once I realized that the restart had not fixed the issue.
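To compare the data directory against the listing above and against the container log, a Python sketch along these lines groups the files by table and reports their sizes and modification times in UTC (the container logs in UTC while the host here is on UTC+2, which makes local timestamps easy to misread). It assumes, as a simplification, that the table name is the part of the file name before the first dot; the function name and the set of non-table prefixes are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timezone
from pathlib import Path

# Non-table files seen in the data directory listing in this thread
# (assumption: adjust for your own setup).
NON_TABLE_NAMES = {"binlog", "manticore", "query"}


def inventory(data_dir):
    """Group table files by table name.

    Returns {table: [(suffix, size_in_bytes, mtime_in_utc), ...]}, where the
    table name is everything before the first "." in the file name.
    """
    tables = defaultdict(list)
    for path in sorted(Path(data_dir).iterdir()):
        if not path.is_file():
            continue
        table, _, suffix = path.name.partition(".")
        if table in NON_TABLE_NAMES:
            continue
        info = path.stat()
        mtime = datetime.fromtimestamp(info.st_mtime, tz=timezone.utc)
        tables[table].append((suffix, info.st_size, mtime))
    return dict(tables)


if __name__ == "__main__":
    import sys

    # Usage: python inventory.py /path/to/manticore/data  (hypothetical path)
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    for table, files in inventory(target).items():
        for suffix, size, mtime in files:
            print(f"{table:10} .{suffix:8} {size:>12} {mtime:%Y-%m-%d %H:%M:%S} UTC")
```

Reporting UTC directly avoids converting between the host's 09:27 and the log's 07:27 by hand.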

sanikolaev commented 1 week ago

I more or less reproduced the data loss using your docker-compose YAML file and your config, but only by removing your named volumes:

snikolaev@dev2:~/issue_91$ docker-compose up
WARNING: Some networks were defined but are not used by any service: traefik
Creating network "issue_91_default" with the default driver
Creating volume "issue_91_db_data" with default driver
Creating volume "issue_91_piler_etc" with default driver
Creating volume "issue_91_piler_manticore" with default driver
Creating volume "issue_91_piler_store" with default driver
Pulling manticore (manticoresearch/manticore:6.3.6)...
6.3.6: Pulling from manticoresearch/manticore
Digest: sha256:767c48291e48fdff47f08772e1cc0340f389fb008663a60142825b314710121f
Status: Downloaded newer image for manticoresearch/manticore:6.3.6
Creating piler_manticore ... done
Attaching to piler_manticore
piler_manticore | [Wed Sep 11 04:31:07.248 2024] [1] using config file '/etc/manticoresearch/manticore.conf.sh' (2499 chars)...
piler_manticore | starting daemon version '6.3.6 593045790@24080214' ...
piler_manticore | listening on all interfaces for sphinx and http(s), port=9312
piler_manticore | listening on all interfaces for mysql, port=9306
piler_manticore | listening on all interfaces for RO mysql, port=9307
piler_manticore | Manticore 6.3.6 593045790@24080214
piler_manticore | Copyright (c) 2001-2016, Andrew Aksyonoff
piler_manticore | Copyright (c) 2008-2016, Sphinx Technologies Inc (http://sphinxsearch.com)
piler_manticore | Copyright (c) 2017-2024, Manticore Software LTD (https://manticoresearch.com)
piler_manticore |
piler_manticore | precaching table 'piler1'
piler_manticore | precaching table 'tag1'
piler_manticore | precaching table 'note1'
piler_manticore | precaching table 'audit1'
piler_manticore | prereading 4 tables
piler_manticore | preread 4 tables in 0.000 sec
piler_manticore | accepting connections
snikolaev@dev2:~/issue_91$ docker-compose exec manticore mysql -P9306 -h0 -e "insert into audit1(id) values(1)"
WARNING: Some networks were defined but are not used by any service: traefik

snikolaev@dev2:~/issue_91$ docker-compose exec manticore mysql -P9306 -h0 -e "select * from audit1"
WARNING: Some networks were defined but are not used by any service: traefik
+------+-------+--------+-------------+------+---------+--------+
| id   | email | ipaddr | description | ts   | meta_id | action |
+------+-------+--------+-------------+------+---------+--------+
|    1 |       |        |             |    0 |       0 |      0 |
+------+-------+--------+-------------+------+---------+--------+
snikolaev@dev2:~/issue_91$ docker-compose down -v --rmi all
WARNING: Some networks were defined but are not used by any service: traefik
Removing piler_manticore ... done
Removing network issue_91_default
Removing volume issue_91_db_data
Removing volume issue_91_piler_etc
Removing volume issue_91_piler_manticore
Removing volume issue_91_piler_store
Removing image manticoresearch/manticore:6.3.6

snikolaev@dev2:~/issue_91$ docker-compose up

snikolaev@dev2:~/issue_91$ docker-compose exec manticore mysql -P9306 -h0 -e "select * from audit1"
WARNING: Some networks were defined but are not used by any service: traefik
snikolaev@dev2:~/issue_91$

The table is there, but the data is lost.

All files inside the mounted directory have a last-modified timestamp of Sept. 4, 09:27, which corresponds to the container restart visible in the provided log file

I can't see that in the provided logs.txt, but if so, the likely reason the tables are empty is that Manticore recreated them from scratch on start, which can only happen if the table files had been physically removed.
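The reasoning above hinges on whether the table files were (re)created when the daemon started. One way to test that against file timestamps is sketched below in Python, assuming you already have the daemon start time from manticore.log and the file modification times, both in UTC. The function name and the 10-second window are hypothetical. A file modified just after the start is only consistent with recreation, not proof of it, since ordinary writes can land in the same window; a file modified just before the start may instead come from the preceding shutdown (for example a ramchunk save).

```python
from datetime import datetime, timedelta


def split_around_start(file_mtimes, start, window=timedelta(seconds=10)):
    """Split files by when they were last modified relative to a daemon start.

    file_mtimes: {file_name: mtime}, in the same time zone as start.
    window: hypothetical tolerance around the start time.

    Returns (before, after):
      before: modified in [start - window, start), e.g. possibly written
              during the preceding shutdown;
      after:  modified in [start, start + window], which is consistent
              with (but does not prove) the files being created on start.
    """
    before, after = [], []
    for name, mtime in sorted(file_mtimes.items()):
        if start - window <= mtime < start:
            before.append(name)
        elif start <= mtime <= start + window:
            after.append(name)
    return before, after


if __name__ == "__main__":
    # Start time taken from the daemon log in this thread; the file names
    # and mtimes below are made-up illustrative values.
    start = datetime(2024, 9, 4, 7, 27, 52, 719000)
    mtimes = {
        "tableA.ram": datetime(2024, 9, 4, 7, 27, 51, 382000),
        "tableB.meta": datetime(2024, 9, 4, 7, 27, 53),
    }
    print(split_around_start(mtimes, start))
```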

Please note that the visible restart of the container happened after I discovered the empty indices, so it is not the root cause of the indices being empty

It's very unlikely that the data was removed on its own without a DELETE/TRUNCATE command or removal of the table files. There's no other known issue about it, and IIRC it has never been an issue.

I suggest 3 things: