guinmoon / LLMFarm

Llama and other large language models on iOS and macOS, offline, using the GGML library.
https://llmfarm.site
MIT License

The version downloaded via Git crashes on a physical iPhone 14 Plus #54

Closed MortalLien closed 2 months ago

MortalLien commented 3 months ago

I've run into a situation where the app built from the Git source runs smoothly on the iPhone 15 Pro Max simulator but crashes on a physical iPhone 14 Plus (iOS 17.4.1). The version downloaded from the App Store, however, doesn't crash. I'm attaching the error displayed in Xcode.

(screenshot: crash_info)

The following is the log output shown by Xcode:

```
Logging Error: Failed to initialize logging system. Log messages may be missing. If this issue persists, try setting IDEPreferLogStreaming=YES in the active scheme actions environment variables.
LLMFarm(848,0x1f5e9bf00) malloc: Unable to set up reclaim buffer (46) - disabling large cache
Optional(file:///var/mobile/Containers/Data/Application/85998C75-26FA-4CE7-A986-9732E898084F/Documents/)
false
Optional(["time": "10:30 AM", "chat": "phi-2.Q4_K_M_1711344950.json", "model": "phi-2.Q4_K_M.gguf", "icon": "ava0", "title": "phi-2.Q4_K_M", "message": "llama ctx:1024", "mmodal": "0"])
```

```
reload
Error Domain=NSCocoaErrorDomain Code=260 "The file “phi-2.Q4_K_M_1711344950.json.json” couldn’t be opened because there is no such file." UserInfo={NSFilePath=/var/mobile/Containers/Data/Application/85998C75-26FA-4CE7-A986-9732E898084F/Documents/history/phi-2.Q4_K_M_1711344950.json.json, NSUnderlyingError=0x301721140 {Error Domain=NSPOSIXErrorDomain Code=2 "No such file or directory"}}
```
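Side note: the history path in that error ends in `.json.json`. The chat id in the config dictionary above, `phi-2.Q4_K_M_1711344950.json`, already carries the `.json` suffix, so it looks like another `.json` gets appended when the history file is resolved. I don't know whether this is related to the crash, and the helper below is only a hypothetical sketch (not LLMFarm's actual code), but guarding against the doubled extension could look like this:

```swift
import Foundation

/// Hypothetical helper (not LLMFarm's actual code): build the history-file URL
/// without doubling the ".json" suffix. The chat id from the log,
/// "phi-2.Q4_K_M_1711344950.json", already ends in ".json", so appending the
/// extension again yields ".json.json" and the file is never found.
func historyURL(for chatID: String, in documentsDir: URL) -> URL {
    let fileName = chatID.hasSuffix(".json") ? chatID : chatID + ".json"
    return documentsDir
        .appendingPathComponent("history")
        .appendingPathComponent(fileName)
}

// Usage sketch:
// let docs = FileManager.default.urls(for: .documentDirectory, in: .userDomainMask)[0]
// historyURL(for: "phi-2.Q4_K_M_1711344950.json", in: docs)
// -> .../Documents/history/phi-2.Q4_K_M_1711344950.json
```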

```
<0x105607cb0> Gesture: System gesture gate timed out.
AI init
llama_model_loader: loaded meta data with 20 key-value pairs and 325 tensors from /var/mobile/Containers/Data/Application/85998C75-26FA-4CE7-A986-9732E898084F/Documents/models/phi-2.Q4_K_M.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = phi2
llama_model_loader: - kv 1: general.name str = Phi2
llama_model_loader: - kv 2: phi2.context_length u32 = 2048
llama_model_loader: - kv 3: phi2.embedding_length u32 = 2560
llama_model_loader: - kv 4: phi2.feed_forward_length u32 = 10240
llama_model_loader: - kv 5: phi2.block_count u32 = 32
llama_model_loader: - kv 6: phi2.attention.head_count u32 = 32
llama_model_loader: - kv 7: phi2.attention.head_count_kv u32 = 32
llama_model_loader: - kv 8: phi2.attention.layer_norm_epsilon f32 = 0.000010
llama_model_loader: - kv 9: phi2.rope.dimension_count u32 = 32
llama_model_loader: - kv 10: general.file_type u32 = 15
llama_model_loader: - kv 11: tokenizer.ggml.add_bos_token bool = false
llama_model_loader: - kv 12: tokenizer.ggml.model str = gpt2
llama_model_loader: - kv 13: tokenizer.ggml.tokens arr[str,51200] = ["!", "\"", "#", "$", "%", "&", "'", ...
llama_model_loader: - kv 14: tokenizer.ggml.token_type arr[i32,51200] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv 15: tokenizer.ggml.merges arr[str,50000] = ["Ġ t", "Ġ a", "h e", "i n", "r e",...
llama_model_loader: - kv 16: tokenizer.ggml.bos_token_id u32 = 50256
llama_model_loader: - kv 17: tokenizer.ggml.eos_token_id u32 = 50256
llama_model_loader: - kv 18: tokenizer.ggml.unknown_token_id u32 = 50256
llama_model_loader: - kv 19: general.quantization_version u32 = 2
llama_model_loader: - type f32: 195 tensors
llama_model_loader: - type q4_K: 81 tensors
llama_model_loader: - type q5_K: 32 tensors
llama_model_loader: - type q6_K: 17 tensors
llm_load_vocab: mismatch in special tokens definition ( 910/51200 vs 944/51200 ).
llm_load_print_meta: format = GGUF V3 (latest)
llm_load_print_meta: arch = phi2
llm_load_print_meta: vocab type = BPE
llm_load_print_meta: n_vocab = 51200
llm_load_print_meta: n_merges = 50000
llm_load_print_meta: n_ctx_train = 2048
llm_load_print_meta: n_embd = 2560
llm_load_print_meta: n_head = 32
llm_load_print_meta: n_head_kv = 32
llm_load_print_meta: n_layer = 32
llm_load_print_meta: n_rot = 32
llm_load_print_meta: n_embd_head_k = 80
llm_load_print_meta: n_embd_head_v = 80
llm_load_print_meta: n_gqa = 1
llm_load_print_meta: n_embd_k_gqa = 2560
llm_load_print_meta: n_embd_v_gqa = 2560
llm_load_print_meta: f_norm_eps = 1.0e-05
llm_load_print_meta: f_norm_rms_eps = 0.0e+00
llm_load_print_meta: f_clamp_kqv = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: n_ff = 10240
llm_load_print_meta: n_expert = 0
llm_load_print_meta: n_expert_used = 0
llm_load_print_meta: rope scaling = linear
llm_load_print_meta: freq_base_train = 10000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_yarn_orig_ctx = 2048
llm_load_print_meta: rope_finetuned = unknown
llm_load_print_meta: model type = 3B
llm_load_print_meta: model ftype = Q4_K - Medium
llm_load_print_meta: model params = 2.78 B
llm_load_print_meta: model size = 1.66 GiB (5.14 BPW)
llm_load_print_meta: general.name = Phi2
llm_load_print_meta: BOS token = 50256 '<|endoftext|>'
llm_load_print_meta: EOS token = 50256 '<|endoftext|>'
llm_load_print_meta: UNK token = 50256 '<|endoftext|>'
llm_load_print_meta: LF token = 128 'Ä'
llm_load_tensors: ggml ctx size = 0.12 MiB
llm_load_tensors: offloading 0 repeating layers to GPU
llm_load_tensors: offloaded 0/33 layers to GPU
llm_load_tensors: CPU buffer size = 1704.63 MiB
llama_new_context_with_model: n_ctx = 1024
llama_new_context_with_model: freq_base = 10000.0
llama_new_context_with_model: freq_scale = 1
llama_kv_cache_init: CPU KV buffer size = 320.00 MiB
llama_new_context_with_model: KV self size = 320.00 MiB, K (f16): 160.00 MiB, V (f16): 160.00 MiB
llama_new_context_with_model: CPU input buffer size = 7.01 MiB
llama_new_context_with_model: CPU compute buffer size = 105.00 MiB
llama_new_context_with_model: graph splits (measure): 1
%s: seed = %d 0
AVX = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | SSSE3 = 0 | VSX = 0 |
Logits inited.
ModelSampleParams(n_batch: 512, temp: 0.9, top_k: 40, top_p: 0.95, tfs_z: 1.0, typical_p: 1.0, repeat_penalty: 1.1, repeat_last_n: 64, frequence_penalty: 0.0, presence_penalty: 0.0, mirostat: 0, mirostat_tau: 5.0, mirostat_eta: 5.0, penalize_nl: true)
ModelAndContextParams(model_inference: llmfarm_core.ModelInference.LLama_gguf, context: 1024, parts: -1, seed: 4294967295, n_threads: 6, lora_adapters: [], promptFormat: llmfarm_core.ModelPromptStyle.Custom, custom_prompt_format: "{{prompt}}", system_prompt: "", f16Kv: true, logitsAll: false, vocabOnly: false, useMlock: false, useMMap: true, embedding: false, processorsConunt: 6, use_metal: false, grammar_path: nil, add_bos_token: false, add_eos_token: false, parse_special_tokens: true, warm_prompt: "\n\n\n", reverse_prompt: [], clip_model: nil)
Past token count: 0/1024 (0)
Input tokens: [17250]
```

After I type "HI" and send it, the app crashes.
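For context, the buffers in this log add up to a large resident footprint: 1704.63 MiB for the CPU model buffer, 320 MiB for the KV cache (1024 ctx × 32 layers × 2560 dims × 2 tensors (K and V) × 2 bytes f16 = 320 MiB), plus roughly 112 MiB of input and compute buffers, i.e. over 2 GiB before counting the app itself. The simulator borrows the Mac's RAM, while a physical iPhone enforces a per-process memory limit, which may be part of why only the device crashes. Below is a small sketch for logging the headroom before the model is loaded; it uses only public APIs (`os_proc_available_memory`, iOS 13+, and `ProcessInfo`) and is not LLMFarm's actual code:

```swift
import Foundation
import os

/// Sketch (not LLMFarm code): log how much memory iOS will still let this
/// process allocate next to the device's total RAM, and warn if the expected
/// model footprint will not fit.
func logMemoryHeadroom(expectedModelBytes: UInt64) {
    let availableBytes = UInt64(os_proc_available_memory())    // remaining per-process budget (iOS 13+)
    let physicalBytes = ProcessInfo.processInfo.physicalMemory // total device RAM
    print("RAM: \(physicalBytes / 1_048_576) MiB, " +
          "available to this app: \(availableBytes / 1_048_576) MiB, " +
          "model + KV cache need ~\(expectedModelBytes / 1_048_576) MiB")
    if availableBytes < expectedModelBytes {
        print("warning: not enough headroom, loading will likely be killed by the system")
    }
}

// Usage, with the ~2.1 GiB estimate from the log above:
// logMemoryHeadroom(expectedModelBytes: 2_136 * 1_048_576)
```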
![IMG_3F6A4E868C98-1](https://github.com/guinmoon/LLMFarm/assets/20417280/46aec20f-705d-45c3-853a-1e59a775a660)
![IMG_060838D4A041-1](https://github.com/guinmoon/LLMFarm/assets/20417280/3e9b00b4-e28d-4a77-a619-121d46a27b3c)
![IMG_1916FF3AA68B-1](https://github.com/guinmoon/LLMFarm/assets/20417280/9fe0e626-994c-45c6-b8ae-24bfbb9823f4)
guinmoon commented 3 months ago

Hi. That's very strange. I haven't made any commits since the App Store release, so as of today the version there and the one on Git should be the same. Give me some time to think about it.