Closed: jsalts closed this issue 2 months ago.
Hmm, we did run into this scenario in earlier RC builds of 0.25.0 and we addressed several issues related to OpenAI availability... Maybe we missed a spot.
Could you share all the logs from the time when you restarted the Typesense process?
This is all I saw. It gets stuck in the 'Running GC for aborted requests' loop for a few hours, then restarts overnight by itself. It finally worked when OpenAI came back online. I suppose there's also no direct evidence OpenAI is involved, other than the fact that the collection that wasn't loading is the one based on OpenAI embeddings.
```
2023-10-20T01:18:07.408771791Z I20231020 01:18:07.408672 1 typesense_server_utils.cpp:331] Starting Typesense 0.25.2.rc6
2023-10-20T01:18:07.408793921Z I20231020 01:18:07.408702 1 typesense_server_utils.cpp:334] Typesense is using jemalloc.
2023-10-20T01:18:07.409318756Z I20231020 01:18:07.409235 1 typesense_server_utils.cpp:384] Thread pool size: 192
2023-10-20T01:18:07.433301422Z I20231020 01:18:07.433161 1 store.h:64] Initializing DB by opening state dir: /data/db
2023-10-20T01:18:07.602221315Z I20231020 01:18:07.602072 1 store.h:64] Initializing DB by opening state dir: /data/meta
2023-10-20T01:18:07.671296627Z I20231020 01:18:07.671178 1 ratelimit_manager.cpp:546] Loaded 0 rate limit rules.
2023-10-20T01:18:07.671338206Z I20231020 01:18:07.671205 1 ratelimit_manager.cpp:547] Loaded 0 rate limit bans.
2023-10-20T01:18:07.672011956Z I20231020 01:18:07.671921 1 typesense_server_utils.cpp:495] Starting API service...
2023-10-20T01:18:07.672160668Z I20231020 01:18:07.672037 648 batched_indexer.cpp:124] Starting batch indexer with 192 threads.
2023-10-20T01:18:07.672190968Z I20231020 01:18:07.672041 647 typesense_server_utils.cpp:232] Since no --nodes argument is provided, starting a single node Typesense cluster.
2023-10-20T01:18:07.672194488Z I20231020 01:18:07.672101 1 http_server.cpp:178] Typesense has started listening on port 8108
2023-10-20T01:18:07.681730219Z I20231020 01:18:07.681609 647 server.cpp:1107] Server[braft::RaftStatImpl+braft::FileServiceImpl+braft::RaftServiceImpl+braft::CliServiceImpl] is serving on port=8107.
2023-10-20T01:18:07.681925971Z I20231020 01:18:07.681646 647 server.cpp:1110] Check out http://4cbdfcdb1cae:8107 in web browser.
2023-10-20T01:18:07.681968810Z I20231020 01:18:07.681914 647 raft_server.cpp:68] Nodes configuration: 172.18.0.2:8107:8108
2023-10-20T01:18:07.684540480Z I20231020 01:18:07.684453 648 batched_indexer.cpp:129] BatchedIndexer skip_index: -9999
2023-10-20T01:18:07.686767047Z I20231020 01:18:07.686689 647 log.cpp:690] Use murmurhash32 as the checksum type of appending entries
2023-10-20T01:18:07.688518889Z I20231020 01:18:07.688452 647 log.cpp:1172] log load_meta /data/state/log/log_meta first_log_index: 190577 time: 1736
2023-10-20T01:18:07.689813276Z I20231020 01:18:07.689745 647 log.cpp:1112] load open segment, path: /data/state/log first_index: 190425
2023-10-20T01:18:07.708946319Z I20231020 01:18:07.708814 666 raft_server.cpp:529] on_snapshot_load
2023-10-20T01:18:07.867927608Z I20231020 01:18:07.867751 666 store.h:299] rm /data/db success
2023-10-20T01:18:08.108788722Z I20231020 01:18:08.108649 666 store.h:309] copy snapshot /data/state/snapshot/snapshot_00000000000000190577/db_snapshot to /data/db success
2023-10-20T01:18:08.109511794Z I20231020 01:18:08.109411 666 store.h:64] Initializing DB by opening state dir: /data/db
2023-10-20T01:18:08.233064712Z I20231020 01:18:08.232915 666 store.h:323] DB open success!
2023-10-20T01:18:08.233112962Z I20231020 01:18:08.232950 666 raft_server.cpp:508] Loading collections from disk...
2023-10-20T01:18:08.233116722Z I20231020 01:18:08.232960 666 collection_manager.cpp:187] CollectionManager::load()
2023-10-20T01:18:08.235454608Z I20231020 01:18:08.235312 666 auth_manager.cpp:34] Indexing 0 API key(s) found on disk.
2023-10-20T01:18:08.235510218Z I20231020 01:18:08.235356 666 collection_manager.cpp:207] Loading upto 96 collections in parallel, 1000 documents at a time.
2023-10-20T01:18:08.235515368Z I20231020 01:18:08.235385 666 collection_manager.cpp:216] Found 3 collection(s) on disk.
2023-10-20T01:18:08.240631736Z I20231020 01:18:08.240458 884 collection_manager.cpp:137] Found collection strings with 4 memory shards.
2023-10-20T01:18:08.240677986Z I20231020 01:18:08.240481 883 collection_manager.cpp:137] Found collection games_metadata with 4 memory shards.
2023-10-20T01:18:08.240692826Z I20231020 01:18:08.240471 885 text_embedder_manager.cpp:13] Validating and initializing remote model: openai/text-embedding-ada-002
2023-10-20T01:18:08.240695176Z E20231020 01:18:08.240545 885 raft_server.cpp:973] Could not get leader url as node is not initialized!
2023-10-20T01:18:08.241996510Z I20231020 01:18:08.241889 884 collection_manager.cpp:1341] Loading collection strings
2023-10-20T01:18:08.244446857Z I20231020 01:18:08.244328 883 collection_manager.cpp:1341] Loading collection games_metadata
2023-10-20T01:18:08.660588844Z E20231020 01:18:08.660450 885 raft_server.cpp:973] Could not get leader url as node is not initialized!
2023-10-20T01:18:09.223153134Z E20231020 01:18:09.222995 885 http_proxy.cpp:75] Proxy call failed, status_code: 502, timeout_ms: 60000, try: 1, num_try: 2
2023-10-20T01:18:09.520472171Z E20231020 01:18:09.520327 885 http_proxy.cpp:75] Proxy call failed, status_code: 502, timeout_ms: 60000, try: 2, num_try: 2
2023-10-20T01:19:00.775669743Z I20231020 01:19:00.775506 884 collection_manager.cpp:1448] Loaded 32768 documents from strings so far.
2023-10-20T01:19:02.481384856Z I20231020 01:19:02.481215 883 collection_manager.cpp:1459] Indexed 8871/8871 documents into collection games_metadata
2023-10-20T01:19:02.481434895Z I20231020 01:19:02.481279 883 collection_manager.cpp:255] Loaded 1 collection(s) so far
2023-10-20T01:19:08.693822979Z I20231020 01:19:08.693536 648 batched_indexer.cpp:285] Running GC for aborted requests, req map size: 0
2023-10-20T01:19:52.651272947Z I20231020 01:19:52.651120 884 collection_manager.cpp:1448] Loaded 65536 documents from strings so far.
2023-10-20T01:20:08.353378793Z I20231020 01:20:08.353202 884 collection_manager.cpp:1459] Indexed 74387/74387 documents into collection strings
2023-10-20T01:20:08.353427313Z I20231020 01:20:08.353276 884 collection_manager.cpp:255] Loaded 2 collection(s) so far
2023-10-20T01:20:09.699325325Z I20231020 01:20:09.699157 648 batched_indexer.cpp:285] Running GC for aborted requests, req map size: 0
2023-10-20T01:21:10.705222422Z I20231020 01:21:10.705026 648 batched_indexer.cpp:285] Running GC for aborted requests, req map size: 0
2023-10-20T01:22:11.710983556Z I20231020 01:22:11.710799 648 batched_indexer.cpp:285] Running GC for aborted requests, req map size: 0
2023-10-20T01:23:12.717189999Z I20231020 01:23:12.717017 648 batched_indexer.cpp:285] Running GC for aborted requests, req map size: 0
2023-10-20T01:24:13.723272485Z I20231020 01:24:13.723096 648 batched_indexer.cpp:285] Running GC for aborted requests, req map size: 0
2023-10-20T01:25:14.729501129Z I20231020 01:25:14.729338 648 batched_indexer.cpp:285] Running GC for aborted requests, req map size: 0
2023-10-20T01:26:15.736289880Z I20231020 01:26:15.736106 648 batched_indexer.cpp:285] Running GC for aborted requests, req map size: 0
```
Ideally, queries against these collections should still work, but inserts/updates should fail.
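To illustrate the distinction, here is a rough sketch using the `games_metadata` collection from the logs with a hypothetical `title` field (field names and document contents are placeholders): the first call needs no remote embedding and should keep working, while the second needs a fresh embedding from the remote model and is the one expected to fail while the provider is unreachable.

```bash
# Keyword-only search: no remote embedding call is involved, so this should
# still succeed while the embedding provider is down.
curl -s "http://localhost:8108/collections/games_metadata/documents/search?q=zelda&query_by=title" \
  -H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}"

# Insert/upsert: Typesense must fetch an embedding for the new document from
# the remote model, so this is the call that would be expected to fail.
curl -s "http://localhost:8108/collections/games_metadata/documents?action=upsert" -X POST \
  -H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" \
  -d '{"id": "1", "title": "Example Game"}'
```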
💯
Not sure if this is the same issue, but you can reproduce a search-related problem on 26.0 by setting a custom `url` in `model_config`, indexing some data through an OpenAI-compatible API/wrapper, and then killing the API and running a search that uses `query_by: <your embedding column>` (a rough sketch of this setup is included at the end of this comment).

Logs from docker compose-based experiments (with `remote_embedding_timeout_ms` set to `10`):
```
typesense-1 | E20240707 09:49:03.589577 134 http_client.cpp:194] CURL timeout. Time taken: 0.031921, method: POST, url: http://host.docker.internal:8082/v1/embeddings
typesense-1 | E20240707 09:49:03.589903 134 http_proxy.cpp:85] Proxy call failed, status_code: 408, timeout_ms: 10, try: 1, num_try: 2
typesense-1 | E20240707 09:49:03.597003 134 http_client.cpp:197] CURL failed. Code: 7, strerror: Couldn't connect to server, method: POST, url: http://host.docker.internal:8082/v1/embeddings
typesense-1 | E20240707 09:49:03.597097 134 http_proxy.cpp:85] Proxy call failed, status_code: 500, timeout_ms: 10, try: 2, num_try: 2
```
A curl request to the Typesense multi_search endpoint in another terminal tab hangs forever. I would expect it to fail, or ideally to fall back to text search when doing a hybrid search.
Looking at the logs alone, I suspect there is some problem with distinguishing a connection timeout from some other, higher-level networking problem? 🤔
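For completeness, here is a rough sketch of the reproduction described above. It assumes a hypothetical `docs` collection with a `title` field; the custom `url` points at the OpenAI-compatible wrapper (the `host.docker.internal:8082` endpoint seen in the logs), and the exact way the wrapper URL, API key, and timeout parameters are passed may differ from the original setup.

```bash
# 1. Create a collection whose embedding field uses a custom OpenAI-compatible
#    endpoint via model_config.url.
curl "http://localhost:8108/collections" -X POST \
  -H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "docs",
    "fields": [
      {"name": "title", "type": "string"},
      {
        "name": "embedding",
        "type": "float[]",
        "embed": {
          "from": ["title"],
          "model_config": {
            "model_name": "openai/text-embedding-ada-002",
            "api_key": "not-a-real-key",
            "url": "http://host.docker.internal:8082"
          }
        }
      }
    ]
  }'

# 2. Index a document while the wrapper is still up, then kill the wrapper.
curl "http://localhost:8108/collections/docs/documents" -X POST \
  -H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" \
  -d '{"title": "hello world"}'

# 3. Hybrid search against the embedding field with a very low remote
#    embedding timeout; this is the request that hangs instead of failing
#    or falling back to text search.
curl "http://localhost:8108/multi_search" -X POST \
  -H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" \
  -d '{
    "searches": [
      {
        "collection": "docs",
        "q": "hello",
        "query_by": "title,embedding",
        "remote_embedding_timeout_ms": 10,
        "remote_embedding_num_tries": 2
      }
    ]
  }'
```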
This is fixed in 27.0.rc26
Description
OpenAI downtime can render a database inoperable. I haven't found any logs confirming this, but the OpenAI API is currently down and the very small collection I have that uses an OpenAI API key is the only one not loading. I have no idea how I could even delete that collection to recover the rest of the collections, since the database is in a non-ready state and API calls are being refused.
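For reference, this is the kind of call that would normally drop the affected collection (the collection name is a placeholder), but it is refused while the node is stuck in the non-ready state:

```bash
# Normally this would drop the collection, but while the node is not ready
# the API rejects the request.
curl "http://localhost:8108/collections/<openai-embedding-collection>" -X DELETE \
  -H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}"
```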
Steps to reproduce
Expected Behavior
Actual Behavior
Metadata
Typesense Version: 0.25.2.rc6
OS: Windows / WSL / Ubuntu