humanlogio / humanlog

Logs for humans to read.
Apache License 2.0

Ollama logging Gin (Go framework) and multiline #123

Open jwsy opened 4 hours ago

jwsy commented 4 hours ago

Ollama logs look awesome in Humanlog but could use a few improvements.


Logs attached: ollama_serve_output.log

The GIN logs look like they're not getting parsed

[GIN] 2024/10/26 - 00:20:44 | 200 |   14.036458ms |       127.0.0.1 | GET      "/api/tags"
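
For illustration, a minimal Go sketch of a pattern that matches this shape (the field names are mine; this is not humanlog's actual parser, just one way such a line could be dissected):

```go
package main

import (
	"fmt"
	"regexp"
)

// ginLine is a rough pattern for the shape of gin's access-log lines:
// [GIN] date - time | status | latency | client IP | method "path"
var ginLine = regexp.MustCompile(
	`^\[GIN\]\s+(\d{4}/\d{2}/\d{2} - \d{2}:\d{2}:\d{2})\s*\|\s*(\d{3})\s*\|\s*(\S+)\s*\|\s*(\S+)\s*\|\s*([A-Z]+)\s+"([^"]*)"`)

func main() {
	line := `[GIN] 2024/10/26 - 00:20:44 | 200 |   14.036458ms |       127.0.0.1 | GET      "/api/tags"`
	if m := ginLine.FindStringSubmatch(line); m != nil {
		fmt.Printf("ts=%s status=%s latency=%s client=%s method=%s path=%s\n",
			m[1], m[2], m[3], m[4], m[5], m[6])
	}
}
```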

Multiline logs aren't getting parsed either

time=2024-11-08T14:46:33.875+09:00 level=INFO source=server.go:606 msg="llama runner started in 3.52 seconds"
[GIN] 2024/11/08 - 14:46:38 | 200 |   7.94878925s |       127.0.0.1 | POST     "/api/chat"
[GIN] 2024/11/08 - 14:46:39 | 200 |  994.447125ms |       127.0.0.1 | POST     "/v1/chat/completions"
llama_model_loader: loaded meta data with 24 key-value pairs and 291 tensors from /Users/jyee/.ollama/models/blobs/sha256-e8a35b5937a5e6d5c35d1f2a15f161e07eefe5e5bb0a3cdd42998ee79b057730 (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = llama
llama_model_loader: - kv   1:                               general.name str              = mistralai
llama_model_loader: - kv   2:                       llama.context_length u32              = 32768
llama_model_loader: - kv   3:                     llama.embedding_length u32              = 4096
llama_model_loader: - kv   4:                          llama.block_count u32              = 32
llama_model_loader: - kv   5:                  llama.feed_forward_length u32              = 14336
llama_model_loader: - kv   6:                 llama.rope.dimension_count u32              = 128
llama_model_loader: - kv   7:                 llama.attention.head_count u32              = 32
llama_model_loader: - kv   8:              llama.attention.head_count_kv u32              = 8
llama_model_loader: - kv   9:     llama.attention.layer_norm_rms_epsilon f32              = 0.000010
llama_model_loader: - kv  10:                       llama.rope.freq_base f32              = 1000000.000000
llama_model_loader: - kv  11:                          general.file_type u32              = 2
llama_model_loader: - kv  12:                       tokenizer.ggml.model str              = llama
llama_model_loader: - kv  13:                      tokenizer.ggml.tokens arr[str,32000]   = ["<unk>", "<s>", "</s>", "<0x00>", "<...
llama_model_loader: - kv  14:                      tokenizer.ggml.scores arr[f32,32000]   = [0.000000, 0.000000, 0.000000, 0.0000...
llama_model_loader: - kv  15:                  tokenizer.ggml.token_type arr[i32,32000]   = [2, 3, 3, 6, 6, 6, 6, 6, 6, 6, 6, 6, ...
llama_model_loader: - kv  16:                      tokenizer.ggml.merges arr[str,58980]   = ["▁ t", "i n", "e r", "▁ a", "h e...
llama_model_loader: - kv  17:                tokenizer.ggml.bos_token_id u32              = 1
llama_model_loader: - kv  18:                tokenizer.ggml.eos_token_id u32              = 2
llama_model_loader: - kv  19:            tokenizer.ggml.unknown_token_id u32              = 0
llama_model_loader: - kv  20:               tokenizer.ggml.add_bos_token bool             = true
llama_model_loader: - kv  21:               tokenizer.ggml.add_eos_token bool             = false
llama_model_loader: - kv  22:                    tokenizer.chat_template str              = {{ bos_token }}{% for message in mess...
llama_model_loader: - kv  23:               general.quantization_version u32              = 2
llama_model_loader: - type  f32:   65 tensors
llama_model_loader: - type q4_0:  225 tensors
llama_model_loader: - type q6_K:    1 tensors
llm_load_vocab: special_eos_id is not in special_eog_ids - the tokenizer config may be incorrect
llm_load_vocab: special tokens cache size = 3
llm_load_vocab: token to piece cache size = 0.1637 MB
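
One hypothetical way a viewer could cope with a mixed stream like this is to classify each line and fall back to raw passthrough for anything unstructured, rather than dropping or mangling it. A minimal sketch (my own naming and heuristics, not humanlog code):

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

// classify routes a line to a handler: structured formats get parsed,
// everything else (llama_model_loader dumps, multiline output, ...)
// is passed through verbatim.
func classify(line string) string {
	switch {
	case strings.HasPrefix(line, "[GIN]"):
		return "gin"
	case strings.HasPrefix(line, "time=") && strings.Contains(line, "level="):
		return "logfmt"
	default:
		return "raw"
	}
}

func main() {
	sc := bufio.NewScanner(os.Stdin)
	for sc.Scan() {
		line := sc.Text()
		fmt.Printf("%-6s %s\n", classify(line), line)
	}
}
```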

Steps to reproduce

DO NOT :x: run Ollama the usual way; be sure to run ollama with a custom OLLAMA_HOST as shown below so the webui can reach it from within the container:

OLLAMA_HOST=0.0.0.0 ollama serve &> ~/ollama_serve_output.log &

Then start Rancher Desktop and run Open WebUI:

docker run -d -p 3000:8080 --add-host=rd.local:host-gateway -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:main

View the nice output in Humanlog:

humanlog < ~/ollama_serve_output.log

aybabtme commented 3 hours ago

When this is fixed, it would be good to let the folks know in https://www.reddit.com/r/ollama/comments/1drdvoh/how_do_i_hide_these_long_config_logs_and_gin/

aybabtme commented 3 hours ago

The logger that produces the [GIN] stuff is https://github.com/gin-gonic/gin/blob/master/logger.go
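
For anyone writing a matcher: the timestamps in those [GIN] lines appear to follow the Go reference layout `2006/01/02 - 15:04:05`, so they can be parsed directly, e.g.:

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	// Layout inferred from the [GIN] lines above.
	const ginTimeLayout = "2006/01/02 - 15:04:05"
	t, err := time.Parse(ginTimeLayout, "2024/10/26 - 00:20:44")
	if err != nil {
		panic(err)
	}
	fmt.Println(t.Format(time.RFC3339)) // 2024-10-26T00:20:44Z
}
```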