guinmoon / LLMFarm

llama and other large language models on iOS and MacOS offline using GGML library.
MIT License
1.05k stars 62 forks

The version downloaded via git crashes on the physical device iPhone 14 Plus #54

Closed (MortalLien closed 2 months ago)

MortalLien commented 3 months ago

The app runs smoothly on the iPhone 15 Pro Max simulator but crashes on a physical iPhone 14 Plus (iOS 17.4.1). The version downloaded from the App Store does not crash. I'm attaching the error shown in Xcode.


The following is the log that Xcode displayed:

Logging Error: Failed to initialize logging system. Log messages may be missing. If this issue persists, try setting IDEPreferLogStreaming=YES in the active scheme actions environment variables.
LLMFarm(848,0x1f5e9bf00) malloc: Unable to set up reclaim buffer (46) - disabling large cache
Optional(file:///var/mobile/Containers/Data/Application/85998C75-26FA-4CE7-A986-9732E898084F/Documents/)
false
Optional(["time": "10:30 AM", "chat": "phi-2.Q4_K_M_1711344950.json", "model": "phi-2.Q4_K_M.gguf", "icon": "ava0", "title": "phi-2.Q4_K_M", "message": "llama ctx:1024", "mmodal": "0"])


Error Domain=NSCocoaErrorDomain Code=260 "The file “phi-2.Q4_K_M_1711344950.json.json” couldn’t be opened because there is no such file." UserInfo={NSFilePath=/var/mobile/Containers/Data/Application/85998C75-26FA-4CE7-A986-9732E898084F/Documents/history/phi-2.Q4_K_M_1711344950.json.json, NSUnderlyingError=0x301721140 {Error Domain=NSPOSIXErrorDomain Code=2 "No such file or directory"}}
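The error above names a file ending in `.json.json`, while the chat entry in the log is already called `phi-2.Q4_K_M_1711344950.json`. That pattern suggests the extension was appended to a name that already carried it. Whether this is related to the crash is not established in this thread. The following Python sketch only illustrates the idea of appending the extension conditionally; `history_path` is a hypothetical helper and is not taken from the LLMFarm source.

```python
from pathlib import PurePosixPath


def history_path(history_dir: str, chat_name: str) -> str:
    """Build the path of a chat history file (hypothetical helper).

    The ".json" extension is appended only when the name does not
    already end with it, so a name is never turned into "x.json.json".
    """
    name = chat_name if chat_name.endswith(".json") else chat_name + ".json"
    return str(PurePosixPath(history_dir) / name)


# Both calls yield the same path, with a single ".json" extension.
print(history_path("Documents/history", "phi-2.Q4_K_M_1711344950.json"))
print(history_path("Documents/history", "phi-2.Q4_K_M_1711344950"))
```

With this guard, a chat name can be stored with or without its extension and still resolve to the same history file.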

<0x105607cb0> Gesture: System gesture gate timed out.
AI init
llama_model_loader: loaded meta data with 20 key-value pairs and 325 tensors from /var/mobile/Containers/Data/Application/85998C75-26FA-4CE7-A986-9732E898084F/Documents/models/phi-2.Q4_K_M.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = phi2
llama_model_loader: - kv 1: str = Phi2
llama_model_loader: - kv 2: phi2.context_length u32 = 2048
llama_model_loader: - kv 3: phi2.embedding_length u32 = 2560
llama_model_loader: - kv 4: phi2.feed_forward_length u32 = 10240
llama_model_loader: - kv 5: phi2.block_count u32 = 32
llama_model_loader: - kv 6: phi2.attention.head_count u32 = 32
llama_model_loader: - kv 7: phi2.attention.head_count_kv u32 = 32
llama_model_loader: - kv 8: phi2.attention.layer_norm_epsilon f32 = 0.000010
llama_model_loader: - kv 9: phi2.rope.dimension_count u32 = 32
llama_model_loader: - kv 10: general.file_type u32 = 15
llama_model_loader: - kv 11: tokenizer.ggml.add_bos_token bool = false
llama_model_loader: - kv 12: tokenizer.ggml.model str = gpt2
llama_model_loader: - kv 13: tokenizer.ggml.tokens arr[str,51200] = ["!", "\"", "#", "$", "%", "&", "'", ...
llama_model_loader: - kv 14: tokenizer.ggml.token_type arr[i32,51200] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv 15: tokenizer.ggml.merges arr[str,50000] = ["Ġ t", "Ġ a", "h e", "i n", "r e",...
llama_model_loader: - kv 16: tokenizer.ggml.bos_token_id u32 = 50256
llama_model_loader: - kv 17: tokenizer.ggml.eos_token_id u32 = 50256
llama_model_loader: - kv 18: tokenizer.ggml.unknown_token_id u32 = 50256
llama_model_loader: - kv 19: general.quantization_version u32 = 2
llama_model_loader: - type f32: 195 tensors
llama_model_loader: - type q4_K: 81 tensors
llama_model_loader: - type q5_K: 32 tensors
llama_model_loader: - type q6_K: 17 tensors
llm_load_vocab: mismatch in special tokens definition ( 910/51200 vs 944/51200 ).
llm_load_print_meta: format = GGUF V3 (latest)
llm_load_print_meta: arch = phi2
llm_load_print_meta: vocab type = BPE
llm_load_print_meta: n_vocab = 51200
llm_load_print_meta: n_merges = 50000
llm_load_print_meta: n_ctx_train = 2048
llm_load_print_meta: n_embd = 2560
llm_load_print_meta: n_head = 32
llm_load_print_meta: n_head_kv = 32
llm_load_print_meta: n_layer = 32
llm_load_print_meta: n_rot = 32
llm_load_print_meta: n_embd_head_k = 80
llm_load_print_meta: n_embd_head_v = 80
llm_load_print_meta: n_gqa = 1
llm_load_print_meta: n_embd_k_gqa = 2560
llm_load_print_meta: n_embd_v_gqa = 2560
llm_load_print_meta: f_norm_eps = 1.0e-05
llm_load_print_meta: f_norm_rms_eps = 0.0e+00
llm_load_print_meta: f_clamp_kqv = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: n_ff = 10240
llm_load_print_meta: n_expert = 0
llm_load_print_meta: n_expert_used = 0
llm_load_print_meta: rope scaling = linear
llm_load_print_meta: freq_base_train = 10000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_yarn_orig_ctx = 2048
llm_load_print_meta: rope_finetuned = unknown
llm_load_print_meta: model type = 3B
llm_load_print_meta: model ftype = Q4_K - Medium
llm_load_print_meta: model params = 2.78 B
llm_load_print_meta: model size = 1.66 GiB (5.14 BPW)
llm_load_print_meta: = Phi2
llm_load_print_meta: BOS token = 50256 '<|endoftext|>'
llm_load_print_meta: EOS token = 50256 '<|endoftext|>'
llm_load_print_meta: UNK token = 50256 '<|endoftext|>'
llm_load_print_meta: LF token = 128 'Ä'
llm_load_tensors: ggml ctx size = 0.12 MiB
llm_load_tensors: offloading 0 repeating layers to GPU
llm_load_tensors: offloaded 0/33 layers to GPU
llm_load_tensors: CPU buffer size = 1704.63 MiB
llama_new_context_with_model: n_ctx = 1024
llama_new_context_with_model: freq_base = 10000.0
llama_new_context_with_model: freq_scale = 1
llama_kv_cache_init: CPU KV buffer size = 320.00 MiB
llama_new_context_with_model: KV self size = 320.00 MiB, K (f16): 160.00 MiB, V (f16): 160.00 MiB
llama_new_context_with_model: CPU input buffer size = 7.01 MiB
llama_new_context_with_model: CPU compute buffer size = 105.00 MiB
llama_new_context_with_model: graph splits (measure): 1
%s: seed = %d 0
AVX = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | SSSE3 = 0 | VSX = 0 |
Logits inited.
ModelSampleParams(n_batch: 512, temp: 0.9, top_k: 40, top_p: 0.95, tfs_z: 1.0, typical_p: 1.0, repeat_penalty: 1.1, repeat_last_n: 64, frequence_penalty: 0.0, presence_penalty: 0.0, mirostat: 0, mirostat_tau: 5.0, mirostat_eta: 5.0, penalize_nl: true)
ModelAndContextParams(model_inference: llmfarm_core.ModelInference.LLama_gguf, context: 1024, parts: -1, seed: 4294967295, n_threads: 6, lora_adapters: [], promptFormat: llmfarm_core.ModelPromptStyle.Custom, custom_prompt_format: "{{prompt}}", system_prompt: "", f16Kv: true, logitsAll: false, vocabOnly: false, useMlock: false, useMMap: true, embedding: false, processorsConunt: 6, use_metal: false, grammar_path: nil, add_bos_token: false, add_eos_token: false, parse_special_tokens: true, warm_prompt: "\n\n\n", reverse_prompt: [], clip_model: nil)
Past token count: 0/1024 (0)
Input tokens: [17250]

After I type 'HI' and send it, the app crashes.

[Screenshots attached: IMG_3F6A4E868C98-1, IMG_060838D4A041-1, IMG_1916FF3AA68B-1]
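
For readers checking the memory figures in the log above, here is a small Python sketch that recomputes the KV cache size from the logged parameters (`n_ctx = 1024`, `n_layer = 32`, `n_embd_k_gqa = 2560`, f16 entries) and adds up the buffer sizes the log reports. The function name `cache_mib` is made up for this illustration. The total is only a sum of the logged buffers: because the log shows `useMMap: true`, it need not equal the app's resident memory, and nothing in this thread confirms that memory is the cause of the crash.

```python
MIB = 1024 * 1024


def cache_mib(n_ctx: int, n_layer: int, n_embd: int, bytes_per_elem: int = 2) -> float:
    """Size in MiB of one cache (K or V): one n_embd-wide vector
    per context position per layer, stored as f16 (2 bytes) by default."""
    return n_ctx * n_layer * n_embd * bytes_per_elem / MIB


# Values copied from the log: n_ctx, n_layer, n_embd_k_gqa / n_embd_v_gqa.
k_mib = cache_mib(n_ctx=1024, n_layer=32, n_embd=2560)
v_mib = cache_mib(n_ctx=1024, n_layer=32, n_embd=2560)
kv_mib = k_mib + v_mib

# Buffer sizes in MiB as printed in the log.
buffers = {
    "CPU buffer (model weights)": 1704.63,
    "KV cache": kv_mib,
    "input buffer": 7.01,
    "compute buffer": 105.00,
}
total_mib = round(sum(buffers.values()), 2)

print(f"K: {k_mib:.2f} MiB, V: {v_mib:.2f} MiB, KV total: {kv_mib:.2f} MiB")
print(f"Sum of logged buffers: {total_mib:.2f} MiB ({total_mib / 1024:.2f} GiB)")
```

The computed K and V sizes match the `K (f16): 160.00 MiB, V (f16): 160.00 MiB` line in the log, and the logged buffers add up to roughly 2.09 GiB.
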
guinmoon commented 3 months ago

Hi. It's very strange. I didn't make any commits after the release on the App Store. The version there and the one on git should be the same as of today. Give me time to think about it.