[bug]: "_pickle.UnpicklingError: pickle data was truncated"

Describe the bug Context chat queries are failing today with a 500 internal server error in the context_chat_backend docker container.

Could this be because the Docker container was restarted while an embedding task was running? I also had the issue with the latest groupfolders update yesterday that caused the fulltextsearch process to fail and trigger 500 errors for Nextcloud as a whole (see files_fulltextsearch issue #265). Could embedding corruption have occurred during this process?

To Reproduce Steps to reproduce the behavior:

Run context_chat query via Nextcloud Assistant
Trigger background process via cron
Context chat fails

Expected behavior This depends on the cause. If it is due to a docker container stop, then I would expect a graceful shutdown. If the embedding process fails due to another error (e.g. the fulltextsearch bug), then the embedding process should gracefully stop. If there is a corrupt pickle file, then I would expect an error recovery procedure.

Server logs (if applicable) NA

Context Chat Backend logs (if applicable, from the docker container)

See below

Screenshots See logs

Setup Details (please complete the following information):

Nextcloud Version: [Nextcloud Hub 8]() (29.0.4)
AppAPI Version: 3.1.0
Context Chat PHP Version 8.2.22
Context Chat Backend Version 3.0.1
Nextcloud deployment method: Docker latest stable
Context Chat Backend deployment method: manual, remote
chroma.sqlite3 is 175Gb (nextcloud file repository is TB+)

Additional context

docker logs context_chat_backend:

llama_model_loader: loaded meta data with 20 key-value pairs and 291 tensors from persistent_storage/model_files/dolphin-2.2.1-mistral-7b.Q5_K_M.gguf (version GGUF V3 (latest)) llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output. llama_model_loader: - kv 0: general.architecture str = llama llama_model_loader: - kv 1: general.name str = ehartford_dolphin-2.2.1-mistral-7b llama_model_loader: - kv 2: llama.context_length u32 = 32768 llama_model_loader: - kv 3: llama.embedding_length u32 = 4096 llama_model_loader: - kv 4: llama.block_count u32 = 32 llama_model_loader: - kv 5: llama.feed_forward_length u32 = 14336 llama_model_loader: - kv 6: llama.rope.dimension_count u32 = 128 llama_model_loader: - kv 7: llama.attention.head_count u32 = 32 llama_model_loader: - kv 8: llama.attention.head_count_kv u32 = 8 llama_model_loader: - kv 9: llama.attention.layer_norm_rms_epsilon f32 = 0.000010 llama_model_loader: - kv 10: llama.rope.freq_base f32 = 10000.000000 llama_model_loader: - kv 11: general.file_type u32 = 17 llama_model_loader: - kv 12: tokenizer.ggml.model str = llama llama_model_loader: - kv 13: tokenizer.ggml.tokens arr[str,32002] = ["", "~~", "~~", "<0x00>", "<... llama_model_loader: - kv 14: tokenizer.ggml.scores arr[f32,32002] = [0.000000, 0.000000, 0.000000, 0.0000... llama_model_loader: - kv 15: tokenizer.ggml.token_type arr[i32,32002] = [2, 3, 3, 6, 6, 6, 6, 6, 6, 6, 6, 6, ... llama_model_loader: - kv 16: tokenizer.ggml.bos_token_id u32 = 1 llama_model_loader: - kv 17: tokenizer.ggml.eos_token_id u32 = 32000 llama_model_loader: - kv 18: tokenizer.ggml.padding_token_id u32 = 0 llama_model_loader: - kv 19: general.quantization_version u32 = 2 llama_model_loader: - type f32: 65 tensors llama_model_loader: - type q5_K: 193 tensors llama_model_loader: - type q6_K: 33 tensors llm_load_vocab: special tokens cache size = 5 llm_load_vocab: token to piece cache size = 0.1637 MB llm_load_print_meta: format = GGUF V3 (latest) llm_load_print_meta: arch = llama llm_load_print_meta: vocab type = SPM llm_load_print_meta: n_vocab = 32002 llm_load_print_meta: n_merges = 0 llm_load_print_meta: vocab_only = 0 llm_load_print_meta: n_ctx_train = 32768 llm_load_print_meta: n_embd = 4096 llm_load_print_meta: n_layer = 32 llm_load_print_meta: n_head = 32 llm_load_print_meta: n_head_kv = 8 llm_load_print_meta: n_rot = 128 llm_load_print_meta: n_swa = 0 llm_load_print_meta: n_embd_head_k = 128 llm_load_print_meta: n_embd_head_v = 128 llm_load_print_meta: n_gqa = 4 llm_load_print_meta: n_embd_k_gqa = 1024 llm_load_print_meta: n_embd_v_gqa = 1024 llm_load_print_meta: f_norm_eps = 0.0e+00 llm_load_print_meta: f_norm_rms_eps = 1.0e-05 llm_load_print_meta: f_clamp_kqv = 0.0e+00 llm_load_print_meta: f_max_alibi_bias = 0.0e+00 llm_load_print_meta: f_logit_scale = 0.0e+00 llm_load_print_meta: n_ff = 14336 llm_load_print_meta: n_expert = 0 llm_load_print_meta: n_expert_used = 0 llm_load_print_meta: causal attn = 1 llm_load_print_meta: pooling type = 0 llm_load_print_meta: rope type = 0 llm_load_print_meta: rope scaling = linear llm_load_print_meta: freq_base_train = 10000.0 llm_load_print_meta: freq_scale_train = 1 llm_load_print_meta: n_ctx_orig_yarn = 32768 llm_load_print_meta: rope_finetuned = unknown llm_load_print_meta: ssm_d_conv = 0 llm_load_print_meta: ssm_d_inner = 0 llm_load_print_meta: ssm_d_state = 0 llm_load_print_meta: ssm_dt_rank = 0 llm_load_print_meta: model type = 7B llm_load_print_meta: model ftype = Q5_K - Medium llm_load_print_meta: model params = 7.24 B llm_load_print_meta: model size = 4.78 GiB (5.67 BPW) llm_load_print_meta: general.name = ehartford_dolphin-2.2.1-mistral-7b llm_load_print_meta: BOS token = 1 '' llm_load_print_meta: EOS token = 32000 '<|im_end|>' llm_load_print_meta: UNK token = 0 '' llm_load_print_meta: PAD token = 0 '' llm_load_print_meta: LF token = 13 '<0x0A>' llm_load_print_meta: EOT token = 32000 '<|im_end|>' llm_load_print_meta: max token length = 48 ggml_cuda_init: GGML_CUDA_FORCE_MMQ: yes ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no ggml_cuda_init: found 2 CUDA devices: Device 0: Tesla P40, compute capability 6.1, VMM: yes Device 1: Tesla P40, compute capability 6.1, VMM: yes llm_load_tensors: ggml ctx size = 0.41 MiB llm_load_tensors: offloading 32 repeating layers to GPU llm_load_tensors: offloading non-repeating layers to GPU llm_load_tensors: offloaded 33/33 layers to GPU llm_load_tensors: CPU buffer size = 85.94 MiB llm_load_tensors: CUDA0 buffer size = 2495.28 MiB llm_load_tensors: CUDA1 buffer size = 2311.78 MiB ................................................................................................... llama_new_context_with_model: n_ctx = 8192 llama_new_context_with_model: n_batch = 512 llama_new_context_with_model: n_ubatch = 512 llama_new_context_with_model: flash_attn = 0 llama_new_context_with_model: freq_base = 10000.0 llama_new_context_with_model: freq_scale = 1 llama_kv_cache_init: CUDA0 KV buffer size = 544.00 MiB llama_kv_cache_init: CUDA1 KV buffer size = 480.00 MiB llama_new_context_with_model: KV self size = 1024.00 MiB, K (f16): 512.00 MiB, V (f16): 512.00 MiB llama_new_context_with_model: CUDA_Host output buffer size = 0.12 MiB llama_new_context_with_model: pipeline parallelism enabled (n_copies=4) llama_new_context_with_model: CUDA0 compute buffer size = 640.01 MiB llama_new_context_with_model: CUDA1 compute buffer size = 640.02 MiB llama_new_context_with_model: CUDA_Host compute buffer size = 72.02 MiB llama_new_context_with_model: graph nodes = 1030 llama_new_context_with_model: graph splits = 3 AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | AVX512_BF16 = 0 | FMA = 1 | NEON = 0 | SVE = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 | Model metadata: {'tokenizer.ggml.padding_token_id': '0', 'tokenizer.ggml.eos_token_id': '32000', 'general.architecture': 'llama', 'llama.rope.freq_base': '10000.000000', 'llama.context_length': '32768', 'general.name': 'ehartford_dolphin-2.2.1-mistral-7b', 'llama.embedding_length': '4096', 'llama.feed_forward_length': '14336', 'llama.attention.layer_norm_rms_epsilon': '0.000010', 'llama.rope.dimension_count': '128', 'tokenizer.ggml.bos_token_id': '1', 'llama.attention.head_count': '32', 'llama.block_count': '32', 'llama.attention.head_count_kv': '8', 'general.quantization_version': '2', 'tokenizer.ggml.model': 'llama', 'general.file_type': '17'} Using fallback chat format: llama-2 TRACE: 192.168.0.73:52868 - ASGI [23] Send {'type': 'http.response.start', 'status': 500, 'headers': '<...>'} INFO: 192.168.0.73:52868 - "POST /query HTTP/1.1" 500 Internal Server Error TRACE: 192.168.0.73:52868 - ASGI [23] Send {'type': 'http.response.body', 'body': '<6386 bytes>'} TRACE: 192.168.0.73:52868 - ASGI [23] Raised exception ERROR: Exception in ASGI application Traceback (most recent call last): File "/usr/local/lib/python3.11/dist-packages/uvicorn/protocols/http/h11_impl.py", line 396, in run_asgi result = await app( # type: ignore[func-returns-value] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.11/dist-packages/uvicorn/middleware/proxy_headers.py", line 70, in call return await self.app(scope, receive, send) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.11/dist-packages/uvicorn/middleware/message_logger.py", line 84, in call raise exc from None File "/usr/local/lib/python3.11/dist-packages/uvicorn/middleware/message_logger.py", line 80, in call await self.app(scope, inner_receive, inner_send) File "/usr/local/lib/python3.11/dist-packages/fastapi/applications.py", line 1054, in call await super().call(scope, receive, send) File "/usr/local/lib/python3.11/dist-packages/starlette/applications.py", line 123, in call await self.middleware_stack(scope, receive, send) File "/usr/local/lib/python3.11/dist-packages/starlette/middleware/errors.py", line 186, in call raise exc File "/usr/local/lib/python3.11/dist-packages/starlette/middleware/errors.py", line 164, in call await self.app(scope, receive, _send) File "/app/context_chat_backend/ocs_utils.py", line 75, in call await self.app(scope, receive, send) File "/usr/local/lib/python3.11/dist-packages/starlette/middleware/exceptions.py", line 65, in call await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send) File "/usr/local/lib/python3.11/dist-packages/starlette/_exception_handler.py", line 64, in wrapped_app raise exc File "/usr/local/lib/python3.11/dist-packages/starlette/_exception_handler.py", line 53, in wrapped_app await app(scope, receive, sender) File "/usr/local/lib/python3.11/dist-packages/starlette/routing.py", line 756, in call await self.middleware_stack(scope, receive, send) File "/usr/local/lib/python3.11/dist-packages/starlette/routing.py", line 776, in app await route.handle(scope, receive, send) File "/usr/local/lib/python3.11/dist-packages/starlette/routing.py", line 297, in handle await self.app(scope, receive, send) File "/usr/local/lib/python3.11/dist-packages/starlette/routing.py", line 77, in app await wrap_app_handling_exceptions(app, request)(scope, receive, send) File "/usr/local/lib/python3.11/dist-packages/starlette/_exception_handler.py", line 64, in wrapped_app raise exc File "/usr/local/lib/python3.11/dist-packages/starlette/_exception_handler.py", line 53, in wrapped_app await app(scope, receive, sender) File "/usr/local/lib/python3.11/dist-packages/starlette/routing.py", line 72, in app response = await func(request) ^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.11/dist-packages/fastapi/routing.py", line 278, in app raw_response = await run_endpoint_function( ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.11/dist-packages/fastapi/routing.py", line 193, in run_endpoint_function return await run_in_threadpool(dependant.call, values) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.11/dist-packages/starlette/concurrency.py", line 42, in run_in_threadpool return await anyio.to_thread.run_sync(func, args) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.11/dist-packages/anyio/to_thread.py", line 56, in run_sync return await get_async_backend().run_sync_in_worker_thread( ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.11/dist-packages/anyio/_backends/_asyncio.py", line 2177, in run_sync_in_worker_thread return await future ^^^^^^^^^^^^ File "/usr/local/lib/python3.11/dist-packages/anyio/_backends/_asyncio.py", line 859, in run result = context.run(func, args) ^^^^^^^^^^^^^^^^^^^^^^^^ File "/app/context_chat_backend/utils.py", line 74, in wrapper return func(*args, *kwargs) ^^^^^^^^^^^^^^^^^^^^^ File "/app/context_chatbackend/controller.py", line 325, in return process_context_query( ^^^^^^^^^^^^^^^^^^^^^^ File "/app/context_chat_backend/chain/one_shot.py", line 65, in process_context_query context_docs = get_context_docs(user_id, query, vectordb, ctx_limit, scope_type, scope_list) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/app/context_chat_backend/chain/context.py", line 37, in get_context_docs context_docs = user_client.similarity_search(query, k=ctx_limit, filter=ctx_filter) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.11/dist-packages/langchain_community/vectorstores/chroma.py", line 349, in similarity_search docs_and_scores = self.similarity_search_with_score( ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.11/dist-packages/langchain_community/vectorstores/chroma.py", line 439, in similarity_search_with_score results = self.__query_collection( ^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.11/dist-packages/langchain_core/utils/utils.py", line 36, in wrapper return func(args, kwargs) ^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.11/dist-packages/langchain_community/vectorstores/chroma.py", line 156, in query_collection return self._collection.query( ^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.11/dist-packages/chromadb/api/models/Collection.py", line 195, in query query_results = self._client._query( ^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.11/dist-packages/chromadb/telemetry/opentelemetry/init.py", line 146, in wrapper return f(*args, **kwargs) ^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.11/dist-packages/chromadb/rate_limiting/init__.py", line 47, in wrapper return f(self, *args, *kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.11/dist-packages/chromadb/api/segment.py", line 737, in _query vector_reader = self._manager.get_segment(collection_id, VectorReader) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.11/dist-packages/chromadb/telemetry/opentelemetry/init.py", line 146, in wrapper return f(args, **kwargs) ^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.11/dist-packages/chromadb/segment/impl/manager/local.py", line 217, in get_segment instance = self._instance(segment) ^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.11/dist-packages/chromadb/segment/impl/manager/local.py", line 246, in _instance instance = cls(self._system, segment) ^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.11/dist-packages/chromadb/segment/impl/vector/local_persistent_hnsw.py", line 107, in init self._persist_data = PersistentData.load_from_file( ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.11/dist-packages/chromadb/segment/impl/vector/local_persistent_hnsw.py", line 70, in load_from_file ret = cast(PersistentData, pickle.load(f)) ^^^^^^^^^^^^^^ _pickle.UnpicklingError: pickle data was truncated TRACE: 192.168.0.73:52868 - HTTP connection lost

nextcloud / context_chat_backend

[bug]: "_pickle.UnpicklingError: pickle data was truncated" #65