manticoresoftware / manticoresearch

Easy to use open source fast database for search | Good alternative to Elasticsearch now | Drop-in replacement for E in the ELK soon
https://manticoresearch.com
GNU General Public License v3.0

agent has 32-bit docids; no longer supported #575

Closed daikoz closed 1 year ago

daikoz commented 3 years ago

Hi,

I randomly get this error:

index MYINDEX: agent SERVERX:9312: agent has 32-bit docids; no longer supported

My configuration index:

index MYINDEX  
{  
     type             = distributed  
     agent_persistent = SERVER1:9312|SERVER2:9312|SERVER3:9312|SERVER4:9312:MYINDEX  
     ha_strategy      = nodeads  
}  

How can I debug this to provide more information about this random error?

I tried:

searchd --console --logdebugv

Here is the debug log from when the error "agent has 32-bit docids; no longer supported" occurs:

[Mon Jun  7 18:39:02.535 2021] [27706] DEBUG: -~-~-~-~-~-~-~-~ MT sched created -~-~-~-~-~-~-~-~  
[Mon Jun  7 18:39:02.535 2021] [27706] DEBUG: 0x55ab3e033410 accepted sphinx and http(s), sock=107, tick=2825  
[Mon Jun  7 18:39:02.535 2021] [27706] DEBUG: 0x55ab3e033410 accepted 1 connections all, tick=2825  
[Mon Jun  7 18:39:02.535 2021] [27709] DEBUG: state CONNECTING > HEALTHY, sock 102, order 0, 0x7fa1fc003fa0  
[Mon Jun  7 18:39:02.535 2021] [27697] DEBUG: =========== AsyncRecvNBChunk 412 when read 4096 bytes  
[Mon Jun  7 18:39:02.535 2021] [27709] DEBUG: - 0 Change task (task 0x7fa08402adc0), fd=102 (102) 1623091143535389Us -> 1623091145535709Us  
[Mon Jun  7 18:39:02.535 2021] [27697] DEBUG: AsyncRecvNBChunk 412 bytes (0 requested)  
[Mon Jun  7 18:39:02.535 2021] [27709] DEBUG: - 0 DisableWrite enqueueing (task 0x7fa08402adc0) (1->2), innet=1  
[Mon Jun  7 18:39:02.535 2021] [27697] DEBUG:       Probing revealed 412 bytes: SphinxAPI, usual byte order  
[Mon Jun  7 18:39:02.535 2021] [27697] DEBUG: conn 127.0.0.1:49382(461): got handshake, major v.1  
[Mon Jun  7 18:39:02.535 2021] [27697] DEBUG: conn 127.0.0.1:49382(461): loop start with timeout 5  
[Mon Jun  7 18:39:02.535 2021] [27697] DEBUG: read command 4, version 0, reply size 4  
[Mon Jun  7 18:39:02.535 2021] [27697] DEBUG: conn 127.0.0.1:49382(461): got command 4, handling  
[Mon Jun  7 18:39:02.535 2021] [27697] DEBUG: conn 127.0.0.1:49382(461): pconn is now on  
[Mon Jun  7 18:39:02.535 2021] [27697] DEBUG: AsyncSend 4 bytes  
[Mon Jun  7 18:39:02.535 2021] [27697] DEBUG: conn 127.0.0.1:49382(461): loop start with timeout 1  
[Mon Jun  7 18:39:02.535 2021] [27697] DEBUG: read command 0, version 289, reply size 388  
[Mon Jun  7 18:39:02.535 2021] [27697] DEBUG: conn 127.0.0.1:49382(461): got command 0, handling  
[Mon Jun  7 18:39:02.536 2021] [27697] DEBUG: Tick coro search  
[Mon Jun  7 18:39:02.538 2021] [27695] DEBUG: last message repeated 4 times  
[Mon Jun  7 18:39:02.538 2021] [27695] DEBUG: AsyncSend 237 bytes  
[Mon Jun  7 18:39:02.538 2021] [27695] DEBUG: DiscardProcessed(412) iPos=412->0, iLen=412->0  
[Mon Jun  7 18:39:02.538 2021] [27695] DEBUG: conn 127.0.0.1:49382(461): loop start with timeout 1  
[Mon Jun  7 18:39:02.538 2021] [27695] DEBUG: =========== AsyncRecvNBChunk -1 when read 4096 bytes  
[Mon Jun  7 18:39:02.538 2021] [27695] DEBUG: AsyncRecvNBChunk -1 bytes (8 requested)  
[Mon Jun  7 18:39:02.538 2021] [27695] DEBUG: Still need to receive 8 bytes...  
[Mon Jun  7 18:39:02.538 2021] [27695] DEBUG: CoYieldWith (m_iEvent=17), timeout 999989  
[Mon Jun  7 18:39:02.538 2021] [27709] DEBUG: - 0 Delete task (task 0x7fa08402adc0), fd=102 (102) 1623091145535709Us  
[Mon Jun  7 18:39:02.538 2021] [27706] DEBUG: got events=1, tick=2826, interrupted=0  
[Mon Jun  7 18:39:02.538 2021] [27706] DEBUG: 0x7fa084024f10 epoll 94 setup, ev=0x3221225473, op=1, sock=107  
[Mon Jun  7 18:39:02.538 2021] [27696] DEBUG: agent 0, state HEALTHY, order 0, sock -1  
[Mon Jun  7 18:39:02.538 2021] [27696] DEBUG: AgentConn 0x7fa1fc003fa0 destroyed  
[Mon Jun  7 18:39:02.538 2021] [27696] DEBUG: AsyncSend 361 bytes  
[Mon Jun  7 18:39:02.538 2021] [27696] DEBUG: DiscardProcessed(315) iPos=315->0, iLen=315->0  
[Mon Jun  7 18:39:02.538 2021] [27696] DEBUG: Receiving command... 0 bytes in buf  
[Mon Jun  7 18:39:02.538 2021] [27696] DEBUG: =========== AsyncRecvNBChunk -1 when read 4096 bytes  
[Mon Jun  7 18:39:02.538 2021] [27696] DEBUG: AsyncRecvNBChunk -1 bytes (4 requested)  
[Mon Jun  7 18:39:02.538 2021] [27696] DEBUG: Still need to receive 4 bytes...  
[Mon Jun  7 18:39:02.538 2021] [27696] DEBUG: CoYieldWith (m_iEvent=17), timeout 899999989  
[Mon Jun  7 18:39:02.538 2021] [27706] DEBUG: got events=1, tick=2827, interrupted=0  
[Mon Jun  7 18:39:02.538 2021] [27706] DEBUG: 0x7fa08402a320 epoll 94 setup, ev=0x3221225473, op=3, sock=113  
[Mon Jun  7 18:39:02.539 2021] [27706] DEBUG: got events=1, tick=2828, interrupted=0  
[Mon Jun  7 18:39:02.539 2021] [27699] DEBUG: EngageWaiterAndYield awake (m_iSock=113, events=1)  
[Mon Jun  7 18:39:02.539 2021] [27699] DEBUG: =========== AsyncRecvNBChunk 270 when read 4096 bytes  
[Mon Jun  7 18:39:02.539 2021] [27699] DEBUG: SyncSockRead: AsyncRecvNBChunk returned 270  
[Mon Jun  7 18:39:02.539 2021] [27699] DEBUG: DiscardProcessed(4) iPos=4->0, iLen=270->266  
[Mon Jun  7 18:39:02.539 2021] [27699] DEBUG: AsyncReadMySQLPacketHeader returned 266 len...  
[Mon Jun  7 18:39:02.539 2021] [27699] DEBUG: LoopClientMySQL command 3, '  
SELECT keywords, category  
FROM SearchKeywordsLB  
WHERE country = 826 AND inserteddatets > 1607366342 AND MATCH('"caravane c"/0.3')  AND category = 4   
ORDER BY inserteddatets DESC  
LIMIT 0, 12  
OPTION max_matches = 12, comment = 'XXX', retry_delay = 0;'  
[Mon Jun  7 18:39:02.539 2021] [27699] DEBUG: S ==========> ScheduleDistrJobs() for 1 remotes  
[Mon Jun  7 18:39:02.539 2021] [27699] DEBUG: state HEALTHY > HEALTHY, sock -1, order 0, 0x7fa1fc004120  
[Mon Jun  7 18:39:02.539 2021] [27699] DEBUG: client=SERVER4:9312, HA selected 3 node by weighted random, with best EaR (0), last answered in 0.133 milliseconds, among 4 candidates  
[Mon Jun  7 18:39:02.539 2021] [27699] DEBUG: state HEALTHY > RETRY, sock 104, order 0, 0x7fa1fc004120  
[Mon Jun  7 18:39:02.539 2021] [27699] DEBUG: S ScheduleDistrJobs() done. Total 1  
[Mon Jun  7 18:39:02.539 2021] [27699] DEBUG: AgentConn 0x7fa1fc004120 destroyed  
[Mon Jun  7 18:39:02.539 2021] [27699] DEBUG: AsyncSend 112 bytes  
[Mon Jun  7 18:39:02.539 2021] [27699] DEBUG: DiscardProcessed(266) iPos=266->0, iLen=266->0  
[Mon Jun  7 18:39:02.539 2021] [27699] DEBUG: Receiving command... 0 bytes in buf  
[Mon Jun  7 18:39:02.540 2021] [27699] DEBUG: =========== AsyncRecvNBChunk -1 when read 4096 bytes  
[Mon Jun  7 18:39:02.540 2021] [27699] DEBUG: AsyncRecvNBChunk -1 bytes (4 requested)  
[Mon Jun  7 18:39:02.540 2021] [27699] DEBUG: Still need to receive 4 bytes...  
[Mon Jun  7 18:39:02.540 2021] [27699] DEBUG: CoYieldWith (m_iEvent=17), timeout 899999989  
[Mon Jun  7 18:39:02.540 2021] [27706] DEBUG: got events=1, tick=2829, interrupted=0  
[Mon Jun  7 18:39:02.540 2021] [27706] DEBUG: 0x7fa08402a320 epoll 94 setup, ev=0x3221225473, op=3, sock=113  
[Mon Jun  7 18:39:02.571 2021] [27706] DEBUG: got events=0, tick=2830, interrupted=0  
[Mon Jun  7 18:39:02.571 2021] [27706] DEBUG: 0x7fa084001440 bailing on timeout no signal, sock=124  
[Mon Jun  7 18:39:02.571 2021] [27706] DEBUG: RemoveEvent()  
[Mon Jun  7 18:39:02.571 2021] [27706] DEBUG: 0x7fa0840261b0 polling remove, ev=17, sock=124  
[Mon Jun  7 18:39:02.571 2021] [27698] DEBUG: EngageWaiterAndYield awake (m_iSock=124, events=16)  
[Mon Jun  7 18:39:02.571 2021] [27698] DEBUG: return TIMEOUT...  
[Mon Jun  7 18:39:02.571 2021] [27698] DEBUG: AsyncNetInputBuffer_c::AppendData: error 110 (Connection timed out) return -1  
[Mon Jun  7 18:39:02.571 2021] [27698] DEBUG: conn 127.0.0.1:49178(323): persist   timeout condition  
[Mon Jun  7 18:39:02.571 2021] [27698] DEBUG: conn 127.0.0.1:49178(323): timeout, not reached, continue  
[Mon Jun  7 18:39:02.571 2021] [27698] DEBUG: DiscardProcessed(0) iPos=0->0, iLen=0->0  
[Mon Jun  7 18:39:02.571 2021] [27698] DEBUG: conn 127.0.0.1:49178(323): loop start with timeout 1  
[Mon Jun  7 18:39:02.571 2021] [27698] DEBUG: =========== AsyncRecvNBChunk -1 when read 4096 bytes  
[Mon Jun  7 18:39:02.571 2021] [27698] DEBUG: AsyncRecvNBChunk -1 bytes (8 requested)  
[Mon Jun  7 18:39:02.571 2021] [27698] DEBUG: Still need to receive 8 bytes...  
[Mon Jun  7 18:39:02.571 2021] [27698] DEBUG: CoYieldWith (m_iEvent=17), timeout 999987  
[Mon Jun  7 18:39:02.571 2021] [27706] DEBUG: got events=1, tick=2831, interrupted=0  
[Mon Jun  7 18:39:02.571 2021] [27706] DEBUG: 0x7fa084033cb0 epoll 94 setup, ev=0x3221225473, op=1, sock=124  
[Mon Jun  7 18:39:02.670 2021] [27706] DEBUG: got events=0, tick=2832, interrupted=0  
[Mon Jun  7 18:39:02.671 2021] [27706] DEBUG: 0x7fa084046430 bailing on timeout no signal, sock=120  
[Mon Jun  7 18:39:02.671 2021] [27706] DEBUG: RemoveEvent()  
[Mon Jun  7 18:39:02.671 2021] [27706] DEBUG: 0x7fa084046650 polling remove, ev=17, sock=120  
[Mon Jun  7 18:39:02.671 2021] [27702] DEBUG: EngageWaiterAndYield awake (m_iSock=120, events=16)  
[Mon Jun  7 18:39:02.671 2021] [27702] DEBUG: return TIMEOUT...  
[Mon Jun  7 18:39:02.671 2021] [27702] DEBUG: AsyncNetInputBuffer_c::AppendData: error 110 (Connection timed out) return -1  
[Mon Jun  7 18:39:02.671 2021] [27702] DEBUG: conn 127.0.0.1:48886(131): persist   timeout condition  
[Mon Jun  7 18:39:02.671 2021] [27702] DEBUG: conn 127.0.0.1:48886(131): timeout, not reached, continue  
[Mon Jun  7 18:39:02.671 2021] [27702] DEBUG: DiscardProcessed(0) iPos=0->0, iLen=0->0  
[Mon Jun  7 18:39:02.671 2021] [27702] DEBUG: conn 127.0.0.1:48886(131): loop start with timeout 1  
[Mon Jun  7 18:39:02.671 2021] [27702] DEBUG: =========== AsyncRecvNBChunk -1 when read 4096 bytes  
[Mon Jun  7 18:39:02.671 2021] [27702] DEBUG: AsyncRecvNBChunk -1 bytes (8 requested)  
[Mon Jun  7 18:39:02.671 2021] [27702] DEBUG: Still need to receive 8 bytes...  
[Mon Jun  7 18:39:02.671 2021] [27702] DEBUG: CoYieldWith (m_iEvent=17), timeout 999989  
[Mon Jun  7 18:39:02.671 2021] [27706] DEBUG: got events=1, tick=2833, interrupted=0  
[Mon Jun  7 18:39:02.671 2021] [27706] DEBUG: 0x7fa0840261b0 epoll 94 setup, ev=0x3221225473, op=1, sock=120  
[Mon Jun  7 18:39:02.880 2021] [27706] DEBUG: got events=0, tick=2834, interrupted=0  
[Mon Jun  7 18:39:02.880 2021] [27706] DEBUG: 0x7fa084033f90 bailing on timeout no signal, sock=122  
[Mon Jun  7 18:39:02.880 2021] [27706] DEBUG: RemoveEvent()  
[Mon Jun  7 18:39:02.880 2021] [27706] DEBUG: 0x7fa084026d10 polling remove, ev=17, sock=122  
[Mon Jun  7 18:39:02.880 2021] [27700] DEBUG: EngageWaiterAndYield awake (m_iSock=122, events=16)  
[Mon Jun  7 18:39:02.880 2021] [27700] DEBUG: return TIMEOUT...  
[Mon Jun  7 18:39:02.880 2021] [27700] DEBUG: AsyncNetInputBuffer_c::AppendData: error 110 (Connection timed out) return -1  
[Mon Jun  7 18:39:02.880 2021] [27700] DEBUG: conn 127.0.0.1:49270(383): persist   timeout condition  
[Mon Jun  7 18:39:02.880 2021] [27700] DEBUG: conn 127.0.0.1:49270(383): timeout, not reached, continue  
[Mon Jun  7 18:39:02.880 2021] [27700] DEBUG: DiscardProcessed(0) iPos=0->0, iLen=0->0  
[Mon Jun  7 18:39:02.880 2021] [27700] DEBUG: conn 127.0.0.1:49270(383): loop start with timeout 1  
[Mon Jun  7 18:39:02.881 2021] [27700] DEBUG: =========== AsyncRecvNBChunk -1 when read 4096 bytes  
[Mon Jun  7 18:39:02.881 2021] [27700] DEBUG: AsyncRecvNBChunk -1 bytes (8 requested)  
[Mon Jun  7 18:39:02.881 2021] [27700] DEBUG: Still need to receive 8 bytes...  
[Mon Jun  7 18:39:02.881 2021] [27700] DEBUG: CoYieldWith (m_iEvent=17), timeout 999989  
[Mon Jun  7 18:39:02.881 2021] [27706] DEBUG: got events=1, tick=2835, interrupted=0  
[Mon Jun  7 18:39:02.881 2021] [27706] DEBUG: 0x7fa084046650 epoll 94 setup, ev=0x3221225473, op=1, sock=122  
[Mon Jun  7 18:39:02.900 2021] [27701] DEBUG: S ==========> ScheduleDistrJob()  
[Mon Jun  7 18:39:02.901 2021] [27701] DEBUG: state HEALTHY > HEALTHY, sock -1, order -1, 0x7fa1e80130b0  
[Mon Jun  7 18:39:02.901 2021] [27701] DEBUG: state HEALTHY > CONNECTING, sock 101, order -1, 0x7fa1e80130b0  
[Mon Jun  7 18:39:02.901 2021] [27701] DEBUG: - CreateNewTask for (0x7fa1e80130b0)->0x7fa0840019a0, ref=3  
[Mon Jun  7 18:39:02.901 2021] [27701] DEBUG: - -1 EnqueueNewTask 0x7fa0840019a0 (0x7fa1e80130b0) 1623091143901154 Us, IO(0->1), sock 101  
[Mon Jun  7 18:39:02.901 2021] [27701] DEBUG: S ScheduleDistrJob() done. Result is 1  
[Mon Jun  7 18:39:02.901 2021] [27706] DEBUG: got events=1, tick=2836, interrupted=0  
[Mon Jun  7 18:39:02.901 2021] [27706] DEBUG: -~-~-~-~-~-~-~-~ MT sched created -~-~-~-~-~-~-~-~  
[Mon Jun  7 18:39:02.901 2021] [27706] DEBUG: 0x55ab3e033410 accepted sphinx and http(s), sock=125, tick=2836  
[Mon Jun  7 18:39:02.901 2021] [27706] DEBUG: 0x55ab3e033410 accepted 1 connections all, tick=2836  
[Mon Jun  7 18:39:02.901 2021] [27709] DEBUG: state CONNECTING > HEALTHY, sock 101, order -1, 0x7fa1e80130b0  
[Mon Jun  7 18:39:02.901 2021] [27695] DEBUG: =========== AsyncRecvNBChunk 28 when read 4096 bytes  
[Mon Jun  7 18:39:02.901 2021] [27709] DEBUG: - -1 Change task (task 0x7fa0840019a0), fd=101 (101) 1623091143901154Us -> 1623091143901578Us  
[Mon Jun  7 18:39:02.901 2021] [27695] DEBUG: AsyncRecvNBChunk 28 bytes (0 requested)  
[Mon Jun  7 18:39:02.901 2021] [27709] DEBUG: - -1 DisableWrite enqueueing (task 0x7fa0840019a0) (1->2), innet=1  
[Mon Jun  7 18:39:02.901 2021] [27695] DEBUG:       Probing revealed 28 bytes: SphinxAPI, usual byte order  
[Mon Jun  7 18:39:02.901 2021] [27695] DEBUG: conn 127.0.0.1:49386(462): got handshake, major v.1  
[Mon Jun  7 18:39:02.901 2021] [27695] DEBUG: conn 127.0.0.1:49386(462): loop start with timeout 5  
[Mon Jun  7 18:39:02.901 2021] [27695] DEBUG: read command 4, version 0, reply size 4  
[Mon Jun  7 18:39:02.901 2021] [27695] DEBUG: conn 127.0.0.1:49386(462): got command 4, handling  
[Mon Jun  7 18:39:02.901 2021] [27695] DEBUG: conn 127.0.0.1:49386(462): pconn is now on  
[Mon Jun  7 18:39:02.901 2021] [27695] DEBUG: AsyncSend 4 bytes  
[Mon Jun  7 18:39:02.901 2021] [27695] DEBUG: conn 127.0.0.1:49386(462): loop start with timeout 1  
[Mon Jun  7 18:39:02.901 2021] [27695] DEBUG: read command 9, version 256, reply size 4  
[Mon Jun  7 18:39:02.901 2021] [27695] DEBUG: AsyncSend 12 bytes  
[Mon Jun  7 18:39:02.901 2021] [27695] DEBUG: conn 127.0.0.1:49386(462): exiting  
[Mon Jun  7 18:39:02.901 2021] [27695] DEBUG: Destroying and closing sock=125  
[Mon Jun  7 18:39:02.901 2021] [27709] DEBUG: - -1 Delete task (task 0x7fa0840019a0), fd=101 (101) 1623091143901578Us  
[Mon Jun  7 18:39:02.901 2021] [27695] DEBUG: SockWrapper_c::Impl_c::~Impl_c (); sent 16, received 28  
[Mon Jun  7 18:39:02.901 2021] [27709] DEBUG: AgentConn 0x7fa1e80130b0 destroyed  
[Mon Jun  7 18:39:02.925 2021] [27706] DEBUG: got events=1, tick=2837, interrupted=0  
[Mon Jun  7 18:39:02.925 2021] [27699] DEBUG: EngageWaiterAndYield awake (m_iSock=115, events=1)  
[Mon Jun  7 18:39:02.925 2021] [27699] DEBUG: =========== AsyncRecvNBChunk 5 when read 4096 bytes  
[Mon Jun  7 18:39:02.925 2021] [27699] DEBUG: SyncSockRead: AsyncRecvNBChunk returned 5  
[Mon Jun  7 18:39:02.925 2021] [27699] DEBUG: DiscardProcessed(4) iPos=4->0, iLen=5->1  
[Mon Jun  7 18:39:02.925 2021] [27699] DEBUG: AsyncReadMySQLPacketHeader returned 1 len...  
[Mon Jun  7 18:39:02.925 2021] [27699] DEBUG: LoopClientMySQL command 14  
[Mon Jun  7 18:39:02.925 2021] [27699] DEBUG: AsyncSend 11 bytes  
[Mon Jun  7 18:39:02.925 2021] [27699] DEBUG: DiscardProcessed(1) iPos=1->0, iLen=1->0  
[Mon Jun  7 18:39:02.925 2021] [27699] DEBUG: Receiving command... 0 bytes in buf  
[Mon Jun  7 18:39:02.925 2021] [27699] DEBUG: =========== AsyncRecvNBChunk -1 when read 4096 bytes  
[Mon Jun  7 18:39:02.925 2021] [27699] DEBUG: AsyncRecvNBChunk -1 bytes (4 requested)  
[Mon Jun  7 18:39:02.925 2021] [27699] DEBUG: Still need to receive 4 bytes...  
[Mon Jun  7 18:39:02.925 2021] [27699] DEBUG: CoYieldWith (m_iEvent=17), timeout 899999989  
[Mon Jun  7 18:39:02.925 2021] [27706] DEBUG: got events=1, tick=2838, interrupted=0  
[Mon Jun  7 18:39:02.925 2021] [27706] DEBUG: 0x7fa08402a820 epoll 94 setup, ev=0x3221225473, op=3, sock=115  
[Mon Jun  7 18:39:02.928 2021] [27706] DEBUG: got events=1, tick=2839, interrupted=0  
[Mon Jun  7 18:39:02.928 2021] [27698] DEBUG: EngageWaiterAndYield awake (m_iSock=115, events=1)  
[Mon Jun  7 18:39:02.928 2021] [27698] DEBUG: =========== AsyncRecvNBChunk 327 when read 4096 bytes  
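
A log like the one above interleaves many worker threads, which makes a single request hard to follow. The following is a minimal sketch, not part of Manticore, that groups `--logdebugv` lines by thread id and lists agent state changes such as the `HEALTHY > RETRY` one seen right after the HA selection. The line format is inferred only from the excerpt above, and the names `group_by_thread` and `agent_transitions` are hypothetical helpers introduced here.

```python
import re
from collections import defaultdict

# Line shape assumed from the excerpt: "[timestamp] [thread id] LEVEL: message"
LINE_RE = re.compile(r"^\[(?P<ts>[^\]]+)\] \[(?P<tid>\d+)\] (?P<level>[A-Z]+): ?(?P<msg>.*)$")
# Agent state change, e.g. "state HEALTHY > RETRY, sock 104, ..."
STATE_RE = re.compile(r"state (\w+) > (\w+), sock (-?\d+)")


def group_by_thread(lines):
    """Return {thread_id: [(timestamp, message), ...]} in log order."""
    threads = defaultdict(list)
    last_tid = None
    for raw in lines:
        line = raw.rstrip()
        m = LINE_RE.match(line)
        if m:
            last_tid = m.group("tid")
            threads[last_tid].append((m.group("ts"), m.group("msg").strip()))
        elif last_tid is not None and line:
            # Continuation line (e.g. a multi-line SQL query): attach it
            # to the previous entry of the same thread.
            ts, msg = threads[last_tid][-1]
            threads[last_tid][-1] = (ts, msg + "\n" + line)
    return dict(threads)


def agent_transitions(lines):
    """Return (thread_id, old_state, new_state, sock) tuples in log order."""
    out = []
    for raw in lines:
        m = LINE_RE.match(raw.rstrip())
        if not m:
            continue
        s = STATE_RE.search(m.group("msg"))
        if s:
            out.append((m.group("tid"), s.group(1), s.group(2), int(s.group(3))))
    return out


# A few lines copied from the log above, used as sample input.
SAMPLE = """\
[Mon Jun  7 18:39:02.539 2021] [27699] DEBUG: LoopClientMySQL command 3, '
SELECT keywords, category
FROM SearchKeywordsLB
[Mon Jun  7 18:39:02.539 2021] [27699] DEBUG: state HEALTHY > HEALTHY, sock -1, order 0, 0x7fa1fc004120
[Mon Jun  7 18:39:02.539 2021] [27699] DEBUG: state HEALTHY > RETRY, sock 104, order 0, 0x7fa1fc004120
[Mon Jun  7 18:39:02.901 2021] [27709] DEBUG: state CONNECTING > HEALTHY, sock 101, order -1, 0x7fa1e80130b0
""".splitlines()

if __name__ == "__main__":
    for tid, old, new, sock in agent_transitions(SAMPLE):
        print(f"thread {tid}: {old} -> {new} (sock {sock})")
```

Filtering the full log this way only narrows down where to look; it does not by itself explain why the agent reply is rejected.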

Operating system & version

All servers have the same configuration:
Debian GNU/Linux 10 (buster)

searchd -v  
Manticore 3.6.0 96d61d8bf@210504 release  
Copyright (c) 2001-2016, Andrew Aksyonoff  
Copyright (c) 2008-2016, Sphinx Technologies Inc (http://sphinxsearch.com)  
Copyright (c) 2017-2021, Manticore Software LTD (https://manticoresearch.com)  

Installed via the package manager, following:

https://manticoresearch.com/downloads/

apt-key adv --fetch-keys 'http://repo.manticoresearch.com/GPG-KEY-manticore'  
wget https://repo.manticoresearch.com/manticore-repo.noarch.deb  
dpkg -i manticore-repo.noarch.deb  
apt update  
apt install manticore

sanikolaev commented 3 years ago

What are the versions of the instances running on SERVER1:9312, SERVER2:9312, SERVER3:9312, SERVER4:9312 ? Can you make sure there's no old 32-bit version of searchd?

daikoz commented 3 years ago

I checked all servers (searchd -v). The searchd version is the same on all of them.

The issue occurs randomly. One of the servers is localhost (127.0.0.1), and I sometimes get the same error from it (in that case it is certain that searchd is the same version).

We run the same configuration in production with Sphinx Search 2.2.11-2+b1 without any issue.
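
To check whether the error is tied to one particular agent or spread across all of them, the error texts can be tallied per agent. This is a minimal sketch that assumes the messages have already been collected into a list of strings (how they are collected is left open) and that they follow the `index NAME: agent HOST:PORT: message` shape of the error quoted in the first post. `tally_by_agent` is a hypothetical helper, and the "connect timed out" sample line is made up for contrast.

```python
import re
from collections import Counter

# Shape assumed from the error quoted in this issue:
#   index MYINDEX: agent SERVERX:9312: agent has 32-bit docids; no longer supported
ERR_RE = re.compile(
    r"index (?P<index>\S+): agent (?P<host>[^:\s]+):(?P<port>\d+): (?P<msg>.*)"
)
TARGET = "agent has 32-bit docids; no longer supported"


def tally_by_agent(messages, needle=TARGET):
    """Count how often each agent (host:port) reported the given error text."""
    counts = Counter()
    for text in messages:
        m = ERR_RE.search(text)
        if m and needle in m.group("msg"):
            counts[f"{m.group('host')}:{m.group('port')}"] += 1
    return counts


# Sample input; the last line is an invented, unrelated message.
MESSAGES = [
    "index MYINDEX: agent SERVER1:9312: agent has 32-bit docids; no longer supported",
    "index MYINDEX: agent 127.0.0.1:9312: agent has 32-bit docids; no longer supported",
    "index MYINDEX: agent SERVER1:9312: agent has 32-bit docids; no longer supported",
    "index MYINDEX: agent SERVER2:9312: connect timed out",
]

if __name__ == "__main__":
    for agent, n in tally_by_agent(MESSAGES).most_common():
        print(agent, n)
```

If the counts are spread evenly over all agents, including the local one, a single outdated instance becomes an unlikely explanation.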

I updated the first post with the operating system, searchd version, and installation steps.

tomatolog commented 3 years ago

Could you provide the agent's config for this index, along with the index schema and one sample document for that index?

It could be a master-agent issue related to the index schema. Could you also dump the header of the index on the agent (dump-header), just to confirm which index version the agent has?

daikoz commented 3 years ago

dump-header of the index from the agent

Is this what you want?

mysql -h0 -P9306
Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MySQL connection id is 682
Server version: 3.6.0 96d61d8bf@210504 release git branch HEAD (no branch)

The version is the same on all servers.

Database:

CREATE TABLE `SearchKeywords` (
  `InsertedDate` timestamp NOT NULL DEFAULT current_timestamp(),
  `Country` smallint(5) unsigned NOT NULL,
  `Keywords` varchar(200) NOT NULL,
  `Category` int(10) unsigned NOT NULL DEFAULT 0,
  `Sort` tinyint(3) unsigned NOT NULL DEFAULT 0,
  PRIMARY KEY (`Country`,`Keywords`,`Category`,`Sort`),
  KEY `IDX_SearchKeywords` (`Country`,`InsertedDate`),
  KEY `FK_SearchKeywords_Category_idx` (`Category`),
  KEY `SearchKeywords_InsertedDate_IDX` (`InsertedDate`) USING BTREE,
  CONSTRAINT `FK_SearchKeywords_Country` FOREIGN KEY (`Country`) REFERENCES `GeocalisationCountry` (`Id`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

INSERT INTO SearchKeywords (InsertedDate, Country, Keywords, Category, Sort) VALUES(current_timestamp(), 826, 'caravane c', 4, 0);
INSERT INTO SearchKeywords (InsertedDate, Country, Keywords, Category, Sort) VALUES(current_timestamp(), 826, 'caravane d', 4, 0);
INSERT INTO SearchKeywords (InsertedDate, Country, Keywords, Category, Sort) VALUES(current_timestamp(), 826, 'caravane e', 4, 0);
INSERT INTO SearchKeywords (InsertedDate, Country, Keywords, Category, Sort) VALUES(current_timestamp(), 826, 'caravane f', 4, 0);

Here is the manticore.conf:

searchd
{
    listen = 9312
    listen = 9306:mysql

    pid_file = /var/run/manticore/searchd.pid
    binlog_path = # disable logging
    persistent_connections_limit = 30
    #mysql_version_string = 5.0.37

    log = /var/log/manticore/searchd.log
    #query_log = /var/log/manticore/query.log
    #query_log_format = sphinxql
}

indexer
{
    mem_limit = 2047M
}

source default
{
    type                = mysql

    sql_query_pre       = SET SESSION TRANSACTION ISOLATION LEVEL READ UNCOMMITTED
    sql_query_pre       = SET SESSION query_cache_type=OFF
    sql_query_pre       = SET CHARACTER_SET_RESULTS=utf8
    sql_query_pre       = SET NAMES utf8
}

source dtv : default
{
    sql_host            = SQLHOST
    sql_port            = 3306
    sql_user            = USERNAME
    sql_pass            = PWD
    sql_db              = DBNAME
}

source SearchKeywords : dtv
{
    sql_query           = SELECT ROW_NUMBER() OVER(ORDER BY InsertedDate), UNIX_TIMESTAMP(InsertedDate) AS InsertedDateTS, Keywords, Country, Category \
                          FROM SearchKeywords \
                          WHERE InsertedDate > DATE_SUB(CURRENT_TIMESTAMP, INTERVAL 6 MONTH) AND Sort = 0

    sql_attr_timestamp  = InsertedDateTS
    sql_field_string    = Keywords
    sql_attr_uint       = Country
    sql_attr_uint       = Category
}

index default
{
    charset_table   = non_cjk
    morphology      = libstemmer_fr
    stopwords       = /usr/share/manticore/stopwords/fr
}

index SearchKeywords : default
{
    source  = SearchKeywords
    path    = /var/lib/manticore/data/SearchKeywords
}

#####################
# Load Balancing
#####################

index SearchKeywordsLB
{
     type             = distributed

     agent_persistent = SERVER1:9312|SERVER2:9312|SERVER3:9312|SERVER4:9312:SearchKeywords
     ha_strategy      = nodeads
}

DESC output (the same on all servers):

MySQL [(none)]> DESC  SearchKeywords;
+----------------+-----------+------------+
| Field          | Type      | Properties |
+----------------+-----------+------------+
| id             | bigint    |            |
| keywords       | text      | indexed    |
| inserteddatets | timestamp |            |
| keywords       | string    |            |
| country        | uint      |            |
| category       | uint      |            |
+----------------+-----------+------------+
6 rows in set (0.000 sec)

daikoz commented 3 years ago

If I comment out this line in my manticore.conf:

persistent_connections_limit = 30

persistent connections are disabled:

WARNING: index 'XXX': agent_persistent used, but no persistent_connections_limit defined. Fall back to non-persistent agent

and now, I cannot reproduce this issue.

I also tried increasing persistent_connections_limit to 600: no change, the issue still occurs.

So the issue is tied to enabling persistent connections (persistent_connections_limit together with agent_persistent).

Moreover, if I modify /etc/hosts so that SERVER1, SERVER2, SERVER3 and SERVER4 all resolve to 127.0.0.1, I still get the issue with persistent_connections_limit = 30.

tomatolog commented 3 years ago

Will try to reproduce the issue with the persistent connection option.

sanikolaev commented 3 years ago

reproduced

githubmanticore commented 1 year ago

➤ Sergey Nikolaev commented:

We can't reproduce it in the latest dev version. I'm closing this issue. Feel free to reopen in case you can reproduce it again.