nextcloud / context_chat_backend

GNU Affero General Public License v3.0

[bug]: "_pickle.UnpicklingError: pickle data was truncated" #65

Open ga-it opened 3 weeks ago

ga-it commented 3 weeks ago

Describe the bug
Context Chat queries are failing today with a 500 Internal Server Error in the context_chat_backend Docker container.

Could this be because the Docker container was restarted while an embedding task was running? I also had the issue with the latest groupfolders update yesterday that caused the fulltextsearch process to fail and trigger 500 errors for Nextcloud as a whole (see files_fulltextsearch issue #265). Could embedding corruption have occurred during this process?

To Reproduce
Steps to reproduce the behavior:

  1. Run context_chat query via Nextcloud Assistant
  2. Trigger background process via cron
  3. Context chat fails

Expected behavior
This depends on the cause. If it is due to a Docker container stop, I would expect a graceful shutdown. If the embedding process fails due to another error (e.g. the fulltextsearch bug), it should stop gracefully. If there is a corrupt pickle file, I would expect an error recovery procedure.

Server logs (if applicable)
N/A

Context Chat Backend logs (if applicable, from the docker container)

See below

Screenshots
See logs

Setup Details (please complete the following information):

Additional context

docker logs context_chat_backend:

llama_model_loader: loaded meta data with 20 key-value pairs and 291 tensors from persistent_storage/model_files/dolphin-2.2.1-mistral-7b.Q5_K_M.gguf (version GGUF V3 (latest)) llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output. llama_model_loader: - kv 0: general.architecture str = llama llama_model_loader: - kv 1: general.name str = ehartford_dolphin-2.2.1-mistral-7b llama_model_loader: - kv 2: llama.context_length u32 = 32768 llama_model_loader: - kv 3: llama.embedding_length u32 = 4096 llama_model_loader: - kv 4: llama.block_count u32 = 32 llama_model_loader: - kv 5: llama.feed_forward_length u32 = 14336 llama_model_loader: - kv 6: llama.rope.dimension_count u32 = 128 llama_model_loader: - kv 7: llama.attention.head_count u32 = 32 llama_model_loader: - kv 8: llama.attention.head_count_kv u32 = 8 llama_model_loader: - kv 9: llama.attention.layer_norm_rms_epsilon f32 = 0.000010 llama_model_loader: - kv 10: llama.rope.freq_base f32 = 10000.000000 llama_model_loader: - kv 11: general.file_type u32 = 17 llama_model_loader: - kv 12: tokenizer.ggml.model str = llama llama_model_loader: - kv 13: tokenizer.ggml.tokens arr[str,32002] = ["", "", "", "<0x00>", "<... llama_model_loader: - kv 14: tokenizer.ggml.scores arr[f32,32002] = [0.000000, 0.000000, 0.000000, 0.0000... llama_model_loader: - kv 15: tokenizer.ggml.token_type arr[i32,32002] = [2, 3, 3, 6, 6, 6, 6, 6, 6, 6, 6, 6, ... llama_model_loader: - kv 16: tokenizer.ggml.bos_token_id u32 = 1 llama_model_loader: - kv 17: tokenizer.ggml.eos_token_id u32 = 32000 llama_model_loader: - kv 18: tokenizer.ggml.padding_token_id u32 = 0 llama_model_loader: - kv 19: general.quantization_version u32 = 2 llama_model_loader: - type f32: 65 tensors llama_model_loader: - type q5_K: 193 tensors llama_model_loader: - type q6_K: 33 tensors llm_load_vocab: special tokens cache size = 5 llm_load_vocab: token to piece cache size = 0.1637 MB llm_load_print_meta: format = GGUF V3 (latest) llm_load_print_meta: arch = llama llm_load_print_meta: vocab type = SPM llm_load_print_meta: n_vocab = 32002 llm_load_print_meta: n_merges = 0 llm_load_print_meta: vocab_only = 0 llm_load_print_meta: n_ctx_train = 32768 llm_load_print_meta: n_embd = 4096 llm_load_print_meta: n_layer = 32 llm_load_print_meta: n_head = 32 llm_load_print_meta: n_head_kv = 8 llm_load_print_meta: n_rot = 128 llm_load_print_meta: n_swa = 0 llm_load_print_meta: n_embd_head_k = 128 llm_load_print_meta: n_embd_head_v = 128 llm_load_print_meta: n_gqa = 4 llm_load_print_meta: n_embd_k_gqa = 1024 llm_load_print_meta: n_embd_v_gqa = 1024 llm_load_print_meta: f_norm_eps = 0.0e+00 llm_load_print_meta: f_norm_rms_eps = 1.0e-05 llm_load_print_meta: f_clamp_kqv = 0.0e+00 llm_load_print_meta: f_max_alibi_bias = 0.0e+00 llm_load_print_meta: f_logit_scale = 0.0e+00 llm_load_print_meta: n_ff = 14336 llm_load_print_meta: n_expert = 0 llm_load_print_meta: n_expert_used = 0 llm_load_print_meta: causal attn = 1 llm_load_print_meta: pooling type = 0 llm_load_print_meta: rope type = 0 llm_load_print_meta: rope scaling = linear llm_load_print_meta: freq_base_train = 10000.0 llm_load_print_meta: freq_scale_train = 1 llm_load_print_meta: n_ctx_orig_yarn = 32768 llm_load_print_meta: rope_finetuned = unknown llm_load_print_meta: ssm_d_conv = 0 llm_load_print_meta: ssm_d_inner = 0 llm_load_print_meta: ssm_d_state = 0 llm_load_print_meta: ssm_dt_rank = 0 llm_load_print_meta: model type = 7B llm_load_print_meta: model ftype = Q5_K - Medium llm_load_print_meta: model params = 7.24 B 
llm_load_print_meta: model size = 4.78 GiB (5.67 BPW) llm_load_print_meta: general.name = ehartford_dolphin-2.2.1-mistral-7b llm_load_print_meta: BOS token = 1 '' llm_load_print_meta: EOS token = 32000 '<|im_end|>' llm_load_print_meta: UNK token = 0 '' llm_load_print_meta: PAD token = 0 '' llm_load_print_meta: LF token = 13 '<0x0A>' llm_load_print_meta: EOT token = 32000 '<|im_end|>' llm_load_print_meta: max token length = 48 ggml_cuda_init: GGML_CUDA_FORCE_MMQ: yes ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no ggml_cuda_init: found 2 CUDA devices: Device 0: Tesla P40, compute capability 6.1, VMM: yes Device 1: Tesla P40, compute capability 6.1, VMM: yes llm_load_tensors: ggml ctx size = 0.41 MiB llm_load_tensors: offloading 32 repeating layers to GPU llm_load_tensors: offloading non-repeating layers to GPU llm_load_tensors: offloaded 33/33 layers to GPU llm_load_tensors: CPU buffer size = 85.94 MiB llm_load_tensors: CUDA0 buffer size = 2495.28 MiB llm_load_tensors: CUDA1 buffer size = 2311.78 MiB ................................................................................................... llama_new_context_with_model: n_ctx = 8192 llama_new_context_with_model: n_batch = 512 llama_new_context_with_model: n_ubatch = 512 llama_new_context_with_model: flash_attn = 0 llama_new_context_with_model: freq_base = 10000.0 llama_new_context_with_model: freq_scale = 1 llama_kv_cache_init: CUDA0 KV buffer size = 544.00 MiB llama_kv_cache_init: CUDA1 KV buffer size = 480.00 MiB llama_new_context_with_model: KV self size = 1024.00 MiB, K (f16): 512.00 MiB, V (f16): 512.00 MiB llama_new_context_with_model: CUDA_Host output buffer size = 0.12 MiB llama_new_context_with_model: pipeline parallelism enabled (n_copies=4) llama_new_context_with_model: CUDA0 compute buffer size = 640.01 MiB llama_new_context_with_model: CUDA1 compute buffer size = 640.02 MiB llama_new_context_with_model: CUDA_Host compute buffer size = 72.02 MiB llama_new_context_with_model: graph nodes = 1030 llama_new_context_with_model: graph splits = 3 AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | AVX512_BF16 = 0 | FMA = 1 | NEON = 0 | SVE = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 | Model metadata: {'tokenizer.ggml.padding_token_id': '0', 'tokenizer.ggml.eos_token_id': '32000', 'general.architecture': 'llama', 'llama.rope.freq_base': '10000.000000', 'llama.context_length': '32768', 'general.name': 'ehartford_dolphin-2.2.1-mistral-7b', 'llama.embedding_length': '4096', 'llama.feed_forward_length': '14336', 'llama.attention.layer_norm_rms_epsilon': '0.000010', 'llama.rope.dimension_count': '128', 'tokenizer.ggml.bos_token_id': '1', 'llama.attention.head_count': '32', 'llama.block_count': '32', 'llama.attention.head_count_kv': '8', 'general.quantization_version': '2', 'tokenizer.ggml.model': 'llama', 'general.file_type': '17'} Using fallback chat format: llama-2 TRACE: 192.168.0.73:52868 - ASGI [23] Send {'type': 'http.response.start', 'status': 500, 'headers': '<...>'} INFO: 192.168.0.73:52868 - "POST /query HTTP/1.1" 500 Internal Server Error TRACE: 192.168.0.73:52868 - ASGI [23] Send {'type': 'http.response.body', 'body': '<6386 bytes>'} TRACE: 192.168.0.73:52868 - ASGI [23] Raised exception ERROR: Exception in ASGI application Traceback (most recent call last): File "/usr/local/lib/python3.11/dist-packages/uvicorn/protocols/http/h11_impl.py", line 396, in run_asgi result = await app( # type: 
ignore[func-returns-value] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.11/dist-packages/uvicorn/middleware/proxy_headers.py", line 70, in call return await self.app(scope, receive, send) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.11/dist-packages/uvicorn/middleware/message_logger.py", line 84, in call raise exc from None File "/usr/local/lib/python3.11/dist-packages/uvicorn/middleware/message_logger.py", line 80, in call await self.app(scope, inner_receive, inner_send) File "/usr/local/lib/python3.11/dist-packages/fastapi/applications.py", line 1054, in call await super().call(scope, receive, send) File "/usr/local/lib/python3.11/dist-packages/starlette/applications.py", line 123, in call await self.middleware_stack(scope, receive, send) File "/usr/local/lib/python3.11/dist-packages/starlette/middleware/errors.py", line 186, in call raise exc File "/usr/local/lib/python3.11/dist-packages/starlette/middleware/errors.py", line 164, in call await self.app(scope, receive, _send) File "/app/context_chat_backend/ocs_utils.py", line 75, in call await self.app(scope, receive, send) File "/usr/local/lib/python3.11/dist-packages/starlette/middleware/exceptions.py", line 65, in call await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send) File "/usr/local/lib/python3.11/dist-packages/starlette/_exception_handler.py", line 64, in wrapped_app raise exc File "/usr/local/lib/python3.11/dist-packages/starlette/_exception_handler.py", line 53, in wrapped_app await app(scope, receive, sender) File "/usr/local/lib/python3.11/dist-packages/starlette/routing.py", line 756, in call await self.middleware_stack(scope, receive, send) File "/usr/local/lib/python3.11/dist-packages/starlette/routing.py", line 776, in app await route.handle(scope, receive, send) File "/usr/local/lib/python3.11/dist-packages/starlette/routing.py", line 297, in handle await self.app(scope, receive, send) File "/usr/local/lib/python3.11/dist-packages/starlette/routing.py", line 77, in app await wrap_app_handling_exceptions(app, request)(scope, receive, send) File "/usr/local/lib/python3.11/dist-packages/starlette/_exception_handler.py", line 64, in wrapped_app raise exc File "/usr/local/lib/python3.11/dist-packages/starlette/_exception_handler.py", line 53, in wrapped_app await app(scope, receive, sender) File "/usr/local/lib/python3.11/dist-packages/starlette/routing.py", line 72, in app response = await func(request) ^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.11/dist-packages/fastapi/routing.py", line 278, in app raw_response = await run_endpoint_function( ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.11/dist-packages/fastapi/routing.py", line 193, in run_endpoint_function return await run_in_threadpool(dependant.call, values) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.11/dist-packages/starlette/concurrency.py", line 42, in run_in_threadpool return await anyio.to_thread.run_sync(func, args) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.11/dist-packages/anyio/to_thread.py", line 56, in run_sync return await get_async_backend().run_sync_in_worker_thread( ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.11/dist-packages/anyio/_backends/_asyncio.py", line 2177, in run_sync_in_worker_thread return await future ^^^^^^^^^^^^ File "/usr/local/lib/python3.11/dist-packages/anyio/_backends/_asyncio.py", line 859, in run result = context.run(func, args) 
^^^^^^^^^^^^^^^^^^^^^^^^ File "/app/context_chat_backend/utils.py", line 74, in wrapper return func(*args, *kwargs) ^^^^^^^^^^^^^^^^^^^^^ File "/app/context_chatbackend/controller.py", line 325, in return process_context_query( ^^^^^^^^^^^^^^^^^^^^^^ File "/app/context_chat_backend/chain/one_shot.py", line 65, in process_context_query context_docs = get_context_docs(user_id, query, vectordb, ctx_limit, scope_type, scope_list) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/app/context_chat_backend/chain/context.py", line 37, in get_context_docs context_docs = user_client.similarity_search(query, k=ctx_limit, filter=ctx_filter) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.11/dist-packages/langchain_community/vectorstores/chroma.py", line 349, in similarity_search docs_and_scores = self.similarity_search_with_score( ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.11/dist-packages/langchain_community/vectorstores/chroma.py", line 439, in similarity_search_with_score results = self.__query_collection( ^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.11/dist-packages/langchain_core/utils/utils.py", line 36, in wrapper return func(args, kwargs) ^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.11/dist-packages/langchain_community/vectorstores/chroma.py", line 156, in query_collection return self._collection.query( ^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.11/dist-packages/chromadb/api/models/Collection.py", line 195, in query query_results = self._client._query( ^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.11/dist-packages/chromadb/telemetry/opentelemetry/init.py", line 146, in wrapper return f(*args, **kwargs) ^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.11/dist-packages/chromadb/rate_limiting/init__.py", line 47, in wrapper return f(self, *args, *kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.11/dist-packages/chromadb/api/segment.py", line 737, in _query vector_reader = self._manager.get_segment(collection_id, VectorReader) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.11/dist-packages/chromadb/telemetry/opentelemetry/init.py", line 146, in wrapper return f(args, **kwargs) ^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.11/dist-packages/chromadb/segment/impl/manager/local.py", line 217, in get_segment instance = self._instance(segment) ^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.11/dist-packages/chromadb/segment/impl/manager/local.py", line 246, in _instance instance = cls(self._system, segment) ^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.11/dist-packages/chromadb/segment/impl/vector/local_persistent_hnsw.py", line 107, in init self._persist_data = PersistentData.load_from_file( ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.11/dist-packages/chromadb/segment/impl/vector/local_persistent_hnsw.py", line 70, in load_from_file ret = cast(PersistentData, pickle.load(f)) ^^^^^^^^^^^^^^ _pickle.UnpicklingError: pickle data was truncated TRACE: 192.168.0.73:52868 - HTTP connection lost

ga-it commented 3 weeks ago

Errors in the ingestion process can lead to unsynchronised or corrupt data.

Causes of errors could be:

  1. CUDA tensor issues
  2. File decoding errors
  3. Text splitting problems
  4. Network connectivity issues during database interaction
  5. Database-specific errors during document insertion
  6. Memory limitations, especially when handling large files
  7. Timeouts
  8. Process / docker container halts / crashes
  9. Nextcloud server errors during embedding process

Results could be:

  1. Partially added documents in the database
  2. Inconsistent or missing metadata
  3. Corrupted embedding data
  4. Unsynchronized data between the internal dictionary/lists and the vector database

The fix would seem to require a catch and rollback in injest.py's _process_sources function, along the lines sketched below.
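
For illustration, a minimal sketch of the kind of catch-and-rollback I mean, assuming a simplified `_process_sources` that writes straight to a Chroma collection (the names and structure here are illustrative, not the actual backend code):

```python
# Rough illustration only -- not the actual injest.py code.
# Assumes `collection` is a chromadb Collection and each doc carries
# an id, its text and its metadata.
def _process_sources(collection, docs):
    added_ids: list[str] = []
    try:
        for doc in docs:
            collection.add(
                ids=[doc["id"]],
                documents=[doc["text"]],
                metadatas=[doc["metadata"]],
            )
            added_ids.append(doc["id"])
    except Exception:
        # Roll back whatever was written before the failure so the
        # store is not left with a partial ingestion.
        if added_ids:
            collection.delete(ids=added_ids)
        raise
```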

Additionally, a repair script seems to be required to remove past corrupt ingestions, since these otherwise crash the query function (a rough sketch follows the list). It should identify:

  1. Missing Embeddings: Check if any documents in the database have missing embeddings. You can query the database for documents where the embedding field is null or empty.
  2. Metadata Mismatches: Compare the metadata (filename, timestamp, etc.) in the database with the actual files in your storage. Look for inconsistencies, missing metadata, or outdated timestamps.
  3. Orphaned Embeddings: Identify embeddings in the database that no longer correspond to any existing file in your storage. This might indicate that a file was deleted without properly removing its associated data from the database.
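
Purely as a rough illustration of such a check (the persistent path and the `source` metadata key are assumptions on my side, and chromadb API details vary between versions):

```python
import os

import chromadb

# Illustrative consistency check -- adapt the path and metadata keys.
client = chromadb.PersistentClient(path="persistent_storage/vector_db_data")

for coll in client.list_collections():
    data = coll.get(include=["embeddings", "metadatas"])
    for doc_id, emb, meta in zip(data["ids"], data["embeddings"], data["metadatas"]):
        # 1. Missing embeddings
        if emb is None or len(emb) == 0:
            print(f"{coll.name}: {doc_id} has no embedding")
        # 2./3. Metadata mismatches / orphaned entries, assuming a
        # 'source' key in the metadata that points at a file path
        source = (meta or {}).get("source")
        if source and not os.path.exists(source):
            print(f"{coll.name}: {doc_id} references a missing file: {source}")
```
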
kyteinsky commented 2 weeks ago

Hi, all the data is first received through the network and processed before being added to the vector db. It seems the plug was pulled right when it was doing so. A graceful shutdown at this point would be a feature request. Please open a new issue for that.

From #66, it seems you mount a folder for the persistent directory, so let's use that local folder.

  1. shut down context_chat_backend
  2. cd /data/context_chat_backend/persistent_storage/vector_db_data
  3. sqlite3 chroma.sqlite3
  4. check if the db looks good using the commands `.tables`, `select * from collections;`, `select * from embeddings;` and `select * from embedding_metadata;`
  5. make a backup of the entire "vector_db_data" folder if all is good
  6. `sqlite3 /path/to/db/chroma.sqlite3 "select s.id, c.name from segments s join collections c on s.collection=c.id where s.scope='VECTOR';"`. This should give you all the ids, which should be the folder names in the "vector_db_data" folder. Delete those folders and take a note of the collection names `Vector`something. Instructions are from here: https://cookbook.chromadb.dev/strategies/rebuilding/#rebuilding-a-collection
  7. start context_chat_backend to build the index

I have not tested this personally but the docs should not be lying. Having a backup will be vital here.
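
If it helps, the inspection in steps 4 and 6 can also be scripted; an untested sketch along those lines, assuming the mounted path from the steps above:

```python
import sqlite3
from pathlib import Path

# Adjust to the mounted persistent storage path.
db_dir = Path("/data/context_chat_backend/persistent_storage/vector_db_data")

con = sqlite3.connect(db_dir / "chroma.sqlite3")
rows = con.execute(
    "select s.id, c.name from segments s "
    "join collections c on s.collection = c.id "
    "where s.scope = 'VECTOR'"
).fetchall()
con.close()

# Each segment id should correspond to a folder next to chroma.sqlite3;
# those are the folders to delete (after taking a backup).
for seg_id, coll_name in rows:
    folder = db_dir / seg_id
    print(f"{coll_name}: {folder} (exists: {folder.exists()})")
```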

ga-it commented 1 week ago

I appreciate the above and the need for regular overnight backups.

I will open a feature request. I would think there needs to be some sort of journal, with a rollback of incomplete jobs on restart if required; something like the sketch below.
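
For reference, the journal I have in mind could be as simple as recording which document ids an ingestion run is about to write and cleaning them up on the next startup if the run never completed. A very rough, hypothetical sketch (none of these names exist in the backend):

```python
import json
from pathlib import Path

# Hypothetical journal file inside the persistent storage mount.
JOURNAL = Path("persistent_storage/ingest_journal.json")

def begin_job(doc_ids: list[str]) -> None:
    # Record what is about to be inserted before touching the vector db.
    JOURNAL.write_text(json.dumps(doc_ids))

def commit_job() -> None:
    # The ingestion finished cleanly; drop the journal entry.
    JOURNAL.unlink(missing_ok=True)

def rollback_incomplete(collection) -> None:
    # Run at startup: a leftover journal means the previous run died
    # mid-ingestion, so delete whatever it may have half-written.
    if JOURNAL.exists():
        doc_ids = json.loads(JOURNAL.read_text())
        if doc_ids:
            collection.delete(ids=doc_ids)
        JOURNAL.unlink()
```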