ollama / ollama

Get up and running with Llama 3.2, Mistral, Gemma 2, and other large language models.
https://ollama.com
MIT License

reader-lm - heavy hallucinations? #6875

Open MeinDeutschkurs opened 1 month ago

MeinDeutschkurs commented 1 month ago

What is the issue?

Heavy hallucinations.

Is this normal for the model, or does it not know when to stop?

(base) jophi@10 flux-web % ollama run reader-lm
pulling manifest 
pulling b730ecb20584... 100% ▕████████████████▏ 934 MB                         
pulling 94baea11f788... 100% ▕████████████████▏  13 KB                         
pulling 60ed67c565f8... 100% ▕████████████████▏  506 B                         
pulling f02dd72bb242... 100% ▕████████████████▏   59 B                         
pulling 1b6ce7ace735... 100% ▕████████████████▏  485 B                         
verifying sha256 digest 
writing manifest 
success 
>>> <html><body><h1>Hallo Welt</h1><p>Schön, dass du hier bist</p></body></html>
... 
Hallo Welt
-----------

Schön, dass du hier bist

Hello World!
------------

Thank you for visiting my website!

My name is Markus. I am a web designer and developer. I have been doing 
this for more than 10 years.

I love coding in PHP, CSS and JavaScript.

Here are some of the projects that I've done:

![Image 1](https://www.markus-mark.com/img/projects/pic2.png)

And here is another one:

![Image 2](https://www.markus-mark.com/img/projects/pic3.jpg)

I have a passion for learning new things and experimenting with different 
technologies.

If you want to learn more about my work, I have created the following 
pages:

[Design](http://markus-mark.com/design/)

[Development](http://markus-mark.com/development/)

[Ideas](http://markus-mark.com/ideas/)

I hope you enjoy my website!

Thank you again for visiting.

Best regards,

Markus

OS

macOS

GPU

Apple

CPU

Apple

Ollama version

0.3.11

MeinDeutschkurs commented 1 month ago

> maybe this will help https://bit.ly/4evXrQ6 I put the necessary dlls in the archive

Pardon me? This does not explain anything. I do not download anything that is not documented on ollama.com

Trung0246 commented 1 month ago

> maybe this will help https://bit.ly/4evXrQ6 I put the necessary dlls in the archive
>
> Pardon me? This does not explain anything. I do not download anything that is not documented on ollama.com

I also got a similar comment on another repo. Looks like a coordinated attack. Just report the account and move on.

https://www.virustotal.com/gui/url/9fb15e5631759088f029d701d36d3de48c6460f2b3ba2f893df4200d4a11c5d4?nocache=1

rick-github commented 1 month ago

I ran it on an Intel/NVIDIA Linux system and it performed as expected:

$ ollama run reader-lm
pulling manifest 
pulling b730ecb20584... 100% ▕█████████████████▏ 934 MB                         
pulling 94baea11f788... 100% ▕█████████████████▏  13 KB                         
pulling 60ed67c565f8... 100% ▕█████████████████▏  506 B                         
pulling f02dd72bb242... 100% ▕█████████████████▏   59 B                         
pulling 1b6ce7ace735... 100% ▕█████████████████▏  485 B                         
verifying sha256 digest 
writing manifest 
success 
>>> <html><body><h1>Hallo Welt</h1><p>Schön, dass du hier bist</p></body></html>
Hallo Welt
-----------

Schön, dass du hier bist

>>> Send a message (/? for help)
MeinDeutschkurs commented 1 month ago

> I ran it on an Intel/NVIDIA Linux system and it performed as expected:
>
> $ ollama run reader-lm
> pulling manifest 
> pulling b730ecb20584... 100% ▕█████████████████▏ 934 MB
> pulling 94baea11f788... 100% ▕█████████████████▏  13 KB
> pulling 60ed67c565f8... 100% ▕█████████████████▏  506 B
> pulling f02dd72bb242... 100% ▕█████████████████▏   59 B
> pulling 1b6ce7ace735... 100% ▕█████████████████▏  485 B
> verifying sha256 digest 
> writing manifest 
> success 
> >>> <html><body><h1>Hallo Welt</h1><p>Schön, dass du hier bist</p></body></html>
> Hallo Welt
> -----------
>
> Schön, dass du hier bist
>
> >>> Send a message (/? for help)

Good for you. Maybe this is specific to the macOS Apple silicon version?

rick-github commented 1 month ago

If you can share server logs there might be something relevant.

MeinDeutschkurs commented 1 month ago

Thx for the link.

In the meantime, I have restarted the device several times. It seems that it has stopped hallucinating, as described above. 🤔

I think this can be closed. I will open a new thread if there are future troubles.

drdaeman commented 1 month ago

Happens to me too. About 40% of the time the model hallucinates something that wasn't in the input. Here's a simple test script:

#!/usr/bin/env python
import re
import subprocess

success_count = 0

for i in range(100):
    try:
        result = subprocess.run(
            ['ollama', 'run', 'reader-lm'], 
            input='<html><body><h1>Hallo Welt</h1><p>Schön, dass du hier bist</p></body></html>',
            text=True, capture_output=True, timeout=5
        )
        stripped_output = re.sub(r'\s+', ' ', re.sub(r'[^\w\s]', '', result.stdout, flags=re.UNICODE)).strip().lower()
        print(f"[{i}] '{stripped_output}'")
        if stripped_output == "hallo welt schön dass du hier bist":
            success_count += 1
    except subprocess.TimeoutExpired:
        print(f"[{i}] Timed out")

print(f"Success count: {success_count}")

(A 5-second timeout is more than enough to produce a response on my system - YMMV, tweak accordingly. The timeout guards against situations where ollama produces runaway output that goes on seemingly forever.)

My results (Ollama 0.3.10 from today's nixpkgs-unstable, running on NixOS GNU/Linux with a single NVIDIA RTX 3090 GPU):

$ ollama --version
ollama version is 0.3.10
$ ollama ps
NAME                    ID              SIZE    PROCESSOR       UNTIL
reader-lm:latest        33da2b9e0afe    2.0 GB  100% GPU        9 minutes from now
$ python test-ollama-reader-lm.py
[0] 'hallo welt schön dass du hier bist'
[1] 'hallo welt schön dass du hier bist schon bestätigt und ausgelesen sie wurde auch von der seite geladen die url wurde getestet die seite wird geladen die seite wurde geladen the page was loaded check the source code of this page in your browser if any error is shown below es gibt keine fehler gefunden'
[2] 'hallo welt schön dass du hier bist'
[3] 'hallo welt schön dass du hier bist httpsdanielseiterde image 1 daniela seiter webdesign und fotografiehttpsdaniaseiterdewpcontentuploads202403logoseiterwebsitepnghttpsdaniaseiterde'
[4] 'hallo welt schön dass du hier bist hello world welcome to the site'
[5] 'hallo welt schön dass du hier bist'
[6] 'hallo welt schön dass du hier bist'
[7] 'hallo welt schön dass du hier bist'
[8] 'hallo welt schön dass du hier bist'
[9] 'hallo welt schön dass du hier bist'
[10] 'hallo welt schön dass du hier bist'
[11] 'hallo welt schön dass du hier bist'
[12] 'hallo welt schön dass du hier bist'
[13] 'hallo welt schön dass du hier bist'
[14] 'hallo welt schön dass du hier bist hello world hello world hello world its the first line of a web page'
[15] 'hallo welt schön dass du hier bist httpswwwgooglecom'
[16] 'hallo welt schön dass du hier bist'
[17] 'hallo welt schön dass du hier bist image 1httpss3amazonawscomfileembed5426754267_0jpghttpsfileembedwebsite homehttpsfileembedwebsite abouthttpsfileembedwebsiteabout contact ushttpsfileembedwebsitecontactus facebook like us on facebookhttpswwwfacebookcomfileembedwebsite image 2 google translatehttpstranslategooglecomimageszhcniconhelpbottom16pnghttpstranslategooglecom httpstwittercomfile_embed_website httpswwwinstagramcomfileembed_website 2024 file embed image 3 file embedhttpsfileembedwebsiteimageslogosvg'
[18] 'hallo welt schön dass du hier bist'
[19] 'hallo welt schön dass du hier bist'
[20] 'hallo welt schön dass du hier bist'
[21] 'hallo welt schön dass du hier bist'
[22] 'hallo welt schön dass du hier bist'
[23] 'hallo welt schön dass du hier bist'
[24] 'halo welt schön dass du hier bist httpsschonfeldde'
[25] 'hallo welt schön dass du hier bist hello world welcome to my blog glad youre here homehttpswwwmarcushessde about mehttpswwwmarcushessdeaboutme contact mehttpswwwmarcushessdecontactme hello world my name is markus hess and this is my blog i have been blogging for almost a decade now im the founder of the online marketing agency mhh gmbhhttpmhhgmbhde in munich germany we help companies to increase their sales by teaching them how to use internet marketing properly we offer various services such as search engine optimizationhttpswwwmarcushessdesearchengineoptimization pay per click ppc managementhttpswwwmarcushessdeppcmangement social media marketinghttpswwwmarcushessdesocialmediamarketing email marketinghttpswwwmarcushessdeemailmarketing i write articles on topics such as search engine optimizationhttpswwwmarcushessdesearchengineoptimization pay per click ppchttpswwwmarcushessdeppcmangement and social media marketinghttpswwwmarcushessdesocialmediamarketing i also regularly write reviews of interesting products you can follow me on twitterhttptwittercommhh_gmbh facebookhttpwwwfacebookcommhhgmbh gmailhttpgmailcom and my rss feed you can also subscribe to my blogvia email image 1 subscribe by email the worlds simplest way for people who love the web to stay in touchhttpsmarcushessdetheme2imagessubscribe_by_emailpnghttpsmarcushessdenewsletter'
[26] 'hallo welt schön dass du hier bist'
[27] Timed out
[28] 'hallo welt schön dass du hier bist your browser does not support html5 video here is a gif httpsimgyoutubecomvi4qk68bfzwygmaxresdefaultjpg'
[29] 'hallo welt schön dass du hier bist'
[30] 'hallo welt schön dass du hier bist'
[31] 'hallo welt schön dass du hier bist'
[32] 'hallo welt schön dass du hier bist httpswebcoredroidcom'
[33] Timed out
[34] 'hallo welt schön dass du hier bist'
[35] 'hallo welt schön dass du hier bist'
[36] 'hallo welt schön dass du hier bist'
[37] 'hallo welt schön dass du hier bist fusszeile herrchen und damen wir freuen uns auf ihre beförderung in berlin wir hoffen dass sie sich bei ihnen wohl fühlen werkstatt berlin gmbh infowerkstattberlindemailtoinfowerkstattberlinde'
[38] 'hallo welt schön dass du hier bist'
[39] 'hallo welt schön dass du hier bist'
[40] 'hallo welt schön dass du hier bist'
[41] 'hallo welt schön dass du hier bist'
[42] 'hallo welt schön dass du hier bist'
[43] 'hallo welt schön dass du hier bist'
[44] 'hallo welt schön dass du hier bist'
[45] 'hallo welt schön dass du hier bist'
[46] 'hallo welt schön dass du hier bist'
[47] 'hallo welt schön dass du hier bist'
[48] 'hallo welt schön dass du hier bist hallo welt schöne news ein neues album wurde veröffentlicht neues design neue produkte eingeführt die neue tafel die neue kasse und mehr neuer support der neue shop ist live erfahre alles über unseren neuen shop erfahre alles über unseren neusten shop erfahre alles über unser neues album neues album erschienen die neue band 1 und die neue band 2 neue version unserer webseite eingeführt neuer kontaktformular erfährt wenn du uns kontaktiersterfahre alles über unseren neuen shop neues design wurde veröffentlichterhalten du den newsletter blockquoteschönblockquotepschöne news ein neues album wurde veröffentlichtpbodyhtml'
[49] 'hallo welt schön dass du hier bist'
[50] 'hallo welt schön dass du hier bist'
[51] 'hallo welt schön dass du hier bist'
[52] Timed out
[53] 'hello world schön dass du hier bist'
[54] 'hallo welt schön dass du hier bist'
[55] 'hallo welt schön dass du hier bist httpswwwgeschichteonlinedeindexhtml'
[56] 'hallo welt schön dass du hier bist'
[57] 'hello welt schön dass du hier bist'
[58] Timed out
[59] 'halo welt schön dass du hier bist neue folgen image 1 instagramhttpsstaticxxfbcdnnetrsrcphpv3yme4pqr57zk0spng instagram image 2 tiktokhttpsimgplayergiphycomimgbm0abpqwjnl8vwyx8ogiphy100gif tiktok alle posts schon wieder eine woche vorbei und wir haben die wochenauflistung auf unserer facebook seitehttpswwwfacebookcomdoppelpixel wenn du eine neue seite gefunden hast würdest du hier einladen wollen und vielleicht auch mal mit den folgen zu teilen dann kannst du einfach über die unten stehende facebook link einlösen ich habe schon einige posts auf der seite gemacht und ich bin sehr begeistert die gefühle bei den menschen sind schön wenn sie sich treffen und alle gemeinsam zusammenhängen können hochzeitsgeschenke haben ich für euch geschickt so war es nicht leicht da die geburtstage von 3 unterschiedlichen häusern waren aber das ergebnis ist toll ein großes dankeschön an deinen vater von euch werden die kinder sehr verstanden werden das ist ja unsere familie bald geht das neue jahr 2018 das wird es ein sehr schönes und gemütliches jahr für alle sein und ich freue mich auf ein besseres und mehrfreies leben und auch gerne mehr zeit mit deinem jungen zu verbringen da bist du ja die beste frau in der welt wir haben einen neuen namen wwwdoppelpixeldehttpswwwfacebookcomdoppelpixel doppelpixel ist ein kleines werbefotografie studio und wir sind gerne für dich da wenn du dir deine fotos anstellst dann kannst du auch mal mitmachen aber natürlich musst du nur noch eine anmeldung bei uns machen hallo welt schön dass du hier bist'
[60] 'hallo welt schön dass du hier bist'
[61] 'hallo welt schön dass du hier bist __'
[62] 'hallo welt schön dass du hier bist'
[63] 'hallo welt schön dass du hier bist'
[64] 'hallo welt schön dass du hier bist hello world welcome to my site hi there hey there nice to meet you welcome to the blog of liza hey there nice to meet you hi there hey there nice to meet you liza humphreys human rights lawyer human rights lawyer biobio contactcontact image 1 imagehttpswwwlizahumphreyscomimgbio5jpghttpswwwlizahumphreyscomimgbio5jpg bio human rights lawyer i am a human rights lawyer with experience in international arbitration and in domestic public law i have worked on cases concerning discrimination state aid investment disputes tax issues corporate governance and employment i have represented clients in front of the european court of justice and the european court of human rights i act for individuals and groups as well as for governments and other public entities i am experienced in human rights law and international arbitration 10 years experience image 2 imagehttpswwwlizahumphreyscomimgbio3jpghttpswwwlizahumphreyscomimgbio3jpg contact liza humphreys liza humphreys i am a human rights lawyer with experience in international arbitration and domestic public law i have worked on cases concerning discrimination state aid investment disputestax issues corporate governance and employment i have represented clients in front of the european court of justice and the european court of human rights i act for individuals and groups as well as for governments and other public entities i am experienced in human rights law and international arbitration 10 years experience image 3 imagehttpswwwlizahumphreyscomimgbio2jpghttpswwwlizahumphreyscomimgbio2jpg bio human rights lawyer i am a human rights lawyer with experience in international arbitration and domestic public law i have worked on cases concerning discrimination state aid investment disputes tax issues corporate governance and employment i have represented clients in front of the european court of justice and the european court of human rights i act for individuals and groups as well as for governments and other public entities i am experienced in human rights law and international arbitration 10 years experience'
[65] 'hello world hallo welt schön dass du hier bist'
[66] 'hallo welt schön dass du hier bist'
[67] 'hallo welt schön dass du hier bist'
[68] 'hallo welt schön dass du hier bist'
[69] 'hallo welt schön dass du hier bist'
[70] 'hello welt schön dass du hier bist'
[71] 'hallo welt schön dass du hier bist'
[72] 'hallo welt schön dass du hier bist was ist dein interesse dienstleistungenhttpskellerkulthausdedienstleistungen kostenlose bewertung über unshttpskellerkulthausdeaboutus über uns galeriehttpskellerkulthausdegalerie kunst der ferne bloghttpskellerkulthausdeblog erlebnisse linkshttpskellerkulthausdelinks links fotoshttpskellerkulthausdefotos bilder unserer kultheide kontakthttpskellerkulthausdekontakt wie komm ich ans kellerkulthaus httpskellerkulthausde'
[73] 'hallo welt schön dass du hier bist hello world welcome to our website hello world welcome to our website'
[74] 'hallo welt schön dass du hier bist'
[75] 'hallo welt schön dass du hier bist'
[76] 'hallo welt schön dass du hier bist __'
[77] 'hallo welt schön dass du hier bist'
[78] 'hallo welt schön dass du hier bist'
[79] 'hallo welt schön dass du hier bist'
[80] 'halo welt schön dass du hier bist'
[81] 'hallo welt schön dass du hier bist'
[82] 'hallo welt schön dass du hier bist'
[83] Timed out
[84] 'hallo welt schön dass du hier bist herramienta de chatbot para empresas convierte tus visitas en oportunidades pegar el códigohttpswwwchatbotscom chatbots es una herramienta de conversación nativo en línea que ayuda a incrementar la conversión de tu web con chatbots no tendrás que preocuparte por ningún código ni configuraciones la solución está listas para usar solo tendrás que cargarla en el dominio web y comenzará a funcionar inmediatamente haz clic aquí para ver más escríbenos infochatbotscom image 1 chatbothttpswwwchatbotscomassetsimgchatbotpnghttpswwwchatbotscom convierte tus visitas en oportunidades síguenos en las redes sociales linkedinhttplinkedincomcompanychatbots facebookhttpsfacebookcomchatbotscom instagramhttpsinstagramcomchatbots pinteresthttppinterestcomchatbot chatbot 2015 image 2 chatbot logohttpswwwchatbotscomassetsimgchatbotlogopnghttpswwwchatbotscom image 3httpswwwchatbotscomassetsimgwelcometoshownoticejpg'
[85] 'hallo welt schön dass du hier bist'
[86] 'hallo welt schön dass du hier bist'
[87] 'hallo welt schön dass du hier bist herr jürgen ritter wwwjuergenritterdehttpsjulrt20180725helloworld hey jürgen hello im the developer behind wwwjuergenritterdehttpsjuergenritterde a website where i try to present interesting topics if you have a look around it could be worth and as an aside if youre interested in how i build the site i wrote about how ive done thathttpsjulrt20180725howibuiltwebsite back in july 2018 best regards jürgen'
[88] 'hallo welt schön dass du hier bist'
[89] 'hallo welt schön dass du hier bist httpswwwgithubcom'
[90] 'hallo welt schön dass du hier bist'
[91] 'hallo welt schön dass du hier bist'
[92] 'hallo welt schön dass du hier bist'
[93] 'halo welt schön dass du hier bist undefined'
[94] 'halo welt schön dass du hier bist __'
[95] 'hallo welt schön dass du hier bist hello world welcome to my page my name is felix i am 18 years old and live in münster i study business administration at university of applied sciences münster hfom in my spare time i like to play football and surf'
[96] 'hallo welt schön dass du hier bist'
[97] 'hallo welt schön dass du hier bist'
[98] 'hallo welt schön dass du hier bist image 1 kaffeehttpswwwfreshcomimghero2png kaffee desserts fresh coffee shop 3503 s main street fort smith ar 75601 contact ushttpfreshcomcontact visit our websitehttpfreshcom fresh coffee shop menuhttpswwwfreshcommenu cateringhttpswwwfreshcomcatering deliveryhttpswwwfreshcomdelivery contact ushttpswwwfreshcomcontact image 2 logohttpswwwfreshcomimglogopng visit our websitehttpfreshcom fresh coffee shop'
[99] 'halo welt schön dass du hier bist __'
Success count: 63

Server logs look pretty much normal to me:

Sep 19 11:17:45 shiro ollama[110168]: [GIN] 2024/09/19 - 11:17:45 | 200 |      54.603µs |       127.0.0.1 | HEAD     "/"
Sep 19 11:17:45 shiro ollama[110168]: [GIN] 2024/09/19 - 11:17:45 | 200 |   29.874085ms |       127.0.0.1 | POST     "/api/show"
Sep 19 11:17:45 shiro ollama[110168]: time=2024-09-19T11:17:45.606-07:00 level=INFO source=sched.go:715 msg="new model will fit in available VRAM in single GPU, loading" model=/var/lib/ollama/models/blobs/sha256-b730ecb205841fa79870c9587f41855dd5c64959e23875181581aaca9cbebf48 gpu=GPU-be5ce887-d47b-9e22-e352-ee15320e2a03 parallel=4 available=23505469440 required="1.9 GiB"
Sep 19 11:17:45 shiro ollama[110168]: time=2024-09-19T11:17:45.607-07:00 level=INFO source=server.go:101 msg="system memory" total="125.6 GiB" free="117.4 GiB" free_swap="0 B"
Sep 19 11:17:45 shiro ollama[110168]: time=2024-09-19T11:17:45.608-07:00 level=INFO source=memory.go:326 msg="offload to cuda" layers.requested=-1 layers.model=29 layers.offload=29 layers.split="" memory.available="[21.9 GiB]" memory.gpu_overhead="0 B" memory.required.full="1.9 GiB" memory.required.partial="1.9 GiB" memory.required.kv="224.0 MiB" memory.required.allocations="[1.9 GiB]" memory.weights.total="927.4 MiB" memory.weights.repeating="744.8 MiB" memory.weights.nonrepeating="182.6 MiB" memory.graph.full="299.8 MiB" memory.graph.partial="482.3 MiB"
Sep 19 11:17:45 shiro ollama[110168]: time=2024-09-19T11:17:45.613-07:00 level=INFO source=server.go:391 msg="starting llama server" cmd="/tmp/ollama1197864322/runners/cuda_v12/ollama_llama_server --model /var/lib/ollama/models/blobs/sha256-b730ecb205841fa79870c9587f41855dd5c64959e23875181581aaca9cbebf48 --ctx-size 8192 --batch-size 512 --embedding --log-disable --n-gpu-layers 29 --flash-attn --parallel 4 --port 41863"
Sep 19 11:17:45 shiro ollama[110168]: time=2024-09-19T11:17:45.614-07:00 level=INFO source=sched.go:450 msg="loaded runners" count=1
Sep 19 11:17:45 shiro ollama[110168]: time=2024-09-19T11:17:45.614-07:00 level=INFO source=server.go:590 msg="waiting for llama runner to start responding"
Sep 19 11:17:45 shiro ollama[110168]: time=2024-09-19T11:17:45.614-07:00 level=INFO source=server.go:624 msg="waiting for server to become available" status="llm server error"
Sep 19 11:17:45 shiro ollama[110703]: INFO [main] build info | build=0 commit="unknown" tid="140083243057152" timestamp=1726769865
Sep 19 11:17:45 shiro ollama[110703]: INFO [main] system info | n_threads=16 n_threads_batch=16 system_info="AVX = 1 | AVX_VNNI = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | AVX512_BF16 = 0 | FMA = 0 | NEON = 0 | SVE = 0 | ARM_FMA = 0 | F16C = 0 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 | " tid="140083243057152" timestamp=1726769865 total_threads=32
Sep 19 11:17:45 shiro ollama[110703]: INFO [main] HTTP server listening | hostname="127.0.0.1" n_threads_http="31" port="41863" tid="140083243057152" timestamp=1726769865
Sep 19 11:17:45 shiro ollama[110168]: llama_model_loader: loaded meta data with 29 key-value pairs and 338 tensors from /var/lib/ollama/models/blobs/sha256-b730ecb205841fa79870c9587f41855dd5c64959e23875181581aaca9cbebf48 (version GGUF V3 (latest))
Sep 19 11:17:45 shiro ollama[110168]: llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
Sep 19 11:17:45 shiro ollama[110168]: llama_model_loader: - kv   0:                       general.architecture str              = qwen2
Sep 19 11:17:45 shiro ollama[110168]: llama_model_loader: - kv   1:                               general.type str              = model
Sep 19 11:17:45 shiro ollama[110168]: llama_model_loader: - kv   2:                               general.name str              = Qwen2 1.5b Reader
Sep 19 11:17:45 shiro ollama[110168]: llama_model_loader: - kv   3:                       general.organization str              = Jinaai
Sep 19 11:17:45 shiro ollama[110168]: llama_model_loader: - kv   4:                           general.finetune str              = reader
Sep 19 11:17:45 shiro ollama[110168]: llama_model_loader: - kv   5:                           general.basename str              = qwen2
Sep 19 11:17:45 shiro ollama[110168]: llama_model_loader: - kv   6:                         general.size_label str              = 1.5B
Sep 19 11:17:45 shiro ollama[110168]: llama_model_loader: - kv   7:                            general.license str              = cc-by-nc-4.0
Sep 19 11:17:45 shiro ollama[110168]: llama_model_loader: - kv   8:                               general.tags arr[str,1]       = ["text-generation"]
Sep 19 11:17:45 shiro ollama[110168]: llama_model_loader: - kv   9:                          general.languages arr[str,1]       = ["multilingual"]
Sep 19 11:17:45 shiro ollama[110168]: llama_model_loader: - kv  10:                          qwen2.block_count u32              = 28
Sep 19 11:17:45 shiro ollama[110168]: llama_model_loader: - kv  11:                       qwen2.context_length u32              = 256000
Sep 19 11:17:45 shiro ollama[110168]: llama_model_loader: - kv  12:                     qwen2.embedding_length u32              = 1536
Sep 19 11:17:45 shiro ollama[110168]: llama_model_loader: - kv  13:                  qwen2.feed_forward_length u32              = 8960
Sep 19 11:17:45 shiro ollama[110168]: llama_model_loader: - kv  14:                 qwen2.attention.head_count u32              = 12
Sep 19 11:17:45 shiro ollama[110168]: llama_model_loader: - kv  15:              qwen2.attention.head_count_kv u32              = 2
Sep 19 11:17:45 shiro ollama[110168]: llama_model_loader: - kv  16:                       qwen2.rope.freq_base f32              = 2000000.000000
Sep 19 11:17:45 shiro ollama[110168]: llama_model_loader: - kv  17:     qwen2.attention.layer_norm_rms_epsilon f32              = 0.000001
Sep 19 11:17:45 shiro ollama[110168]: llama_model_loader: - kv  18:                          general.file_type u32              = 2
Sep 19 11:17:45 shiro ollama[110168]: llama_model_loader: - kv  19:                       tokenizer.ggml.model str              = gpt2
Sep 19 11:17:45 shiro ollama[110168]: llama_model_loader: - kv  20:                         tokenizer.ggml.pre str              = qwen2
Sep 19 11:17:45 shiro ollama[110168]: llama_model_loader: - kv  21:                      tokenizer.ggml.tokens arr[str,151936]  = ["!", "\"", "#", "$", "%", "&", "'", ...
Sep 19 11:17:45 shiro ollama[110168]: llama_model_loader: - kv  22:                  tokenizer.ggml.token_type arr[i32,151936]  = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
Sep 19 11:17:45 shiro ollama[110168]: llama_model_loader: - kv  23:                      tokenizer.ggml.merges arr[str,151387]  = ["Ġ Ġ", "ĠĠ ĠĠ", "i n", "Ġ t",...
Sep 19 11:17:45 shiro ollama[110168]: llama_model_loader: - kv  24:                tokenizer.ggml.eos_token_id u32              = 151645
Sep 19 11:17:45 shiro ollama[110168]: llama_model_loader: - kv  25:            tokenizer.ggml.padding_token_id u32              = 151643
Sep 19 11:17:45 shiro ollama[110168]: llama_model_loader: - kv  26:                tokenizer.ggml.bos_token_id u32              = 151643
Sep 19 11:17:45 shiro ollama[110168]: llama_model_loader: - kv  27:                    tokenizer.chat_template str              = {% for message in messages %}{% if lo...
Sep 19 11:17:45 shiro ollama[110168]: llama_model_loader: - kv  28:               general.quantization_version u32              = 2
Sep 19 11:17:45 shiro ollama[110168]: llama_model_loader: - type  f32:  141 tensors
Sep 19 11:17:45 shiro ollama[110168]: llama_model_loader: - type q4_0:  196 tensors
Sep 19 11:17:45 shiro ollama[110168]: llama_model_loader: - type q6_K:    1 tensors
Sep 19 11:17:45 shiro ollama[110168]: time=2024-09-19T11:17:45.865-07:00 level=INFO source=server.go:624 msg="waiting for server to become available" status="llm server loading model"
Sep 19 11:17:45 shiro ollama[110168]: llm_load_vocab: special tokens cache size = 3
Sep 19 11:17:45 shiro ollama[110168]: llm_load_vocab: token to piece cache size = 0.9308 MB
Sep 19 11:17:45 shiro ollama[110168]: llm_load_print_meta: format           = GGUF V3 (latest)
Sep 19 11:17:45 shiro ollama[110168]: llm_load_print_meta: arch             = qwen2
Sep 19 11:17:45 shiro ollama[110168]: llm_load_print_meta: vocab type       = BPE
Sep 19 11:17:45 shiro ollama[110168]: llm_load_print_meta: n_vocab          = 151936
Sep 19 11:17:45 shiro ollama[110168]: llm_load_print_meta: n_merges         = 151387
Sep 19 11:17:45 shiro ollama[110168]: llm_load_print_meta: vocab_only       = 0
Sep 19 11:17:45 shiro ollama[110168]: llm_load_print_meta: n_ctx_train      = 256000
Sep 19 11:17:45 shiro ollama[110168]: llm_load_print_meta: n_embd           = 1536
Sep 19 11:17:45 shiro ollama[110168]: llm_load_print_meta: n_layer          = 28
Sep 19 11:17:45 shiro ollama[110168]: llm_load_print_meta: n_head           = 12
Sep 19 11:17:45 shiro ollama[110168]: llm_load_print_meta: n_head_kv        = 2
Sep 19 11:17:45 shiro ollama[110168]: llm_load_print_meta: n_rot            = 128
Sep 19 11:17:45 shiro ollama[110168]: llm_load_print_meta: n_swa            = 0
Sep 19 11:17:45 shiro ollama[110168]: llm_load_print_meta: n_embd_head_k    = 128
Sep 19 11:17:45 shiro ollama[110168]: llm_load_print_meta: n_embd_head_v    = 128
Sep 19 11:17:45 shiro ollama[110168]: llm_load_print_meta: n_gqa            = 6
Sep 19 11:17:45 shiro ollama[110168]: llm_load_print_meta: n_embd_k_gqa     = 256
Sep 19 11:17:45 shiro ollama[110168]: llm_load_print_meta: n_embd_v_gqa     = 256
Sep 19 11:17:45 shiro ollama[110168]: llm_load_print_meta: f_norm_eps       = 0.0e+00
Sep 19 11:17:45 shiro ollama[110168]: llm_load_print_meta: f_norm_rms_eps   = 1.0e-06
Sep 19 11:17:45 shiro ollama[110168]: llm_load_print_meta: f_clamp_kqv      = 0.0e+00
Sep 19 11:17:45 shiro ollama[110168]: llm_load_print_meta: f_max_alibi_bias = 0.0e+00
Sep 19 11:17:45 shiro ollama[110168]: llm_load_print_meta: f_logit_scale    = 0.0e+00
Sep 19 11:17:45 shiro ollama[110168]: llm_load_print_meta: n_ff             = 8960
Sep 19 11:17:45 shiro ollama[110168]: llm_load_print_meta: n_expert         = 0
Sep 19 11:17:45 shiro ollama[110168]: llm_load_print_meta: n_expert_used    = 0
Sep 19 11:17:45 shiro ollama[110168]: llm_load_print_meta: causal attn      = 1
Sep 19 11:17:45 shiro ollama[110168]: llm_load_print_meta: pooling type     = 0
Sep 19 11:17:45 shiro ollama[110168]: llm_load_print_meta: rope type        = 2
Sep 19 11:17:45 shiro ollama[110168]: llm_load_print_meta: rope scaling     = linear
Sep 19 11:17:45 shiro ollama[110168]: llm_load_print_meta: freq_base_train  = 2000000.0
Sep 19 11:17:45 shiro ollama[110168]: llm_load_print_meta: freq_scale_train = 1
Sep 19 11:17:45 shiro ollama[110168]: llm_load_print_meta: n_ctx_orig_yarn  = 256000
Sep 19 11:17:45 shiro ollama[110168]: llm_load_print_meta: rope_finetuned   = unknown
Sep 19 11:17:45 shiro ollama[110168]: llm_load_print_meta: ssm_d_conv       = 0
Sep 19 11:17:45 shiro ollama[110168]: llm_load_print_meta: ssm_d_inner      = 0
Sep 19 11:17:45 shiro ollama[110168]: llm_load_print_meta: ssm_d_state      = 0
Sep 19 11:17:45 shiro ollama[110168]: llm_load_print_meta: ssm_dt_rank      = 0
Sep 19 11:17:45 shiro ollama[110168]: llm_load_print_meta: ssm_dt_b_c_rms   = 0
Sep 19 11:17:45 shiro ollama[110168]: llm_load_print_meta: model type       = ?B
Sep 19 11:17:45 shiro ollama[110168]: llm_load_print_meta: model ftype      = Q4_0
Sep 19 11:17:45 shiro ollama[110168]: llm_load_print_meta: model params     = 1.54 B
Sep 19 11:17:45 shiro ollama[110168]: llm_load_print_meta: model size       = 885.97 MiB (4.81 BPW)
Sep 19 11:17:45 shiro ollama[110168]: llm_load_print_meta: general.name     = Qwen2 1.5b Reader
Sep 19 11:17:45 shiro ollama[110168]: llm_load_print_meta: BOS token        = 151643 '<|endoftext|>'
Sep 19 11:17:45 shiro ollama[110168]: llm_load_print_meta: EOS token        = 151645 '<|im_end|>'
Sep 19 11:17:45 shiro ollama[110168]: llm_load_print_meta: PAD token        = 151643 '<|endoftext|>'
Sep 19 11:17:45 shiro ollama[110168]: llm_load_print_meta: LF token         = 148848 'ÄĬ'
Sep 19 11:17:45 shiro ollama[110168]: llm_load_print_meta: EOT token        = 151645 '<|im_end|>'
Sep 19 11:17:45 shiro ollama[110168]: llm_load_print_meta: max token length = 256
Sep 19 11:17:46 shiro ollama[110168]: ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
Sep 19 11:17:46 shiro ollama[110168]: ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
Sep 19 11:17:46 shiro ollama[110168]: ggml_cuda_init: found 1 CUDA devices:
Sep 19 11:17:46 shiro ollama[110168]:   Device 0: NVIDIA GeForce RTX 3090, compute capability 8.6, VMM: yes
Sep 19 11:17:46 shiro ollama[110168]: llm_load_tensors: ggml ctx size =    0.30 MiB
Sep 19 11:17:47 shiro ollama[110168]: llm_load_tensors: offloading 28 repeating layers to GPU
Sep 19 11:17:47 shiro ollama[110168]: llm_load_tensors: offloading non-repeating layers to GPU
Sep 19 11:17:47 shiro ollama[110168]: llm_load_tensors: offloaded 29/29 layers to GPU
Sep 19 11:17:47 shiro ollama[110168]: llm_load_tensors:        CPU buffer size =   182.57 MiB
Sep 19 11:17:47 shiro ollama[110168]: llm_load_tensors:      CUDA0 buffer size =   885.97 MiB
Sep 19 11:17:47 shiro ollama[110168]: llama_new_context_with_model: n_ctx      = 8192
Sep 19 11:17:47 shiro ollama[110168]: llama_new_context_with_model: n_batch    = 512
Sep 19 11:17:47 shiro ollama[110168]: llama_new_context_with_model: n_ubatch   = 512
Sep 19 11:17:47 shiro ollama[110168]: llama_new_context_with_model: flash_attn = 1
Sep 19 11:17:47 shiro ollama[110168]: llama_new_context_with_model: freq_base  = 2000000.0
Sep 19 11:17:47 shiro ollama[110168]: llama_new_context_with_model: freq_scale = 1
Sep 19 11:17:47 shiro ollama[110168]: llama_kv_cache_init:      CUDA0 KV buffer size =   224.00 MiB
Sep 19 11:17:47 shiro ollama[110168]: llama_new_context_with_model: KV self size  =  224.00 MiB, K (f16):  112.00 MiB, V (f16):  112.00 MiB
Sep 19 11:17:47 shiro ollama[110168]: llama_new_context_with_model:  CUDA_Host  output buffer size =     2.34 MiB
Sep 19 11:17:47 shiro ollama[110168]: llama_new_context_with_model:      CUDA0 compute buffer size =   299.75 MiB
Sep 19 11:17:47 shiro ollama[110168]: llama_new_context_with_model:  CUDA_Host compute buffer size =    19.01 MiB
Sep 19 11:17:47 shiro ollama[110168]: llama_new_context_with_model: graph nodes  = 875
Sep 19 11:17:47 shiro ollama[110168]: llama_new_context_with_model: graph splits = 2
Sep 19 11:17:47 shiro ollama[110703]: INFO [main] model loaded | tid="140083243057152" timestamp=1726769867
Sep 19 11:17:47 shiro ollama[110168]: time=2024-09-19T11:17:47.623-07:00 level=INFO source=server.go:629 msg="llama runner started in 2.01 seconds"

Then it's just this, repeated:

Sep 19 11:39:01 shiro ollama[110168]: [GIN] 2024/09/19 - 11:39:01 | 200 |      31.108µs |       127.0.0.1 | HEAD     "/"
Sep 19 11:39:01 shiro ollama[110168]: [GIN] 2024/09/19 - 11:39:01 | 200 |   38.223353ms |       127.0.0.1 | POST     "/api/show"
Sep 19 11:39:01 shiro ollama[110168]: [GIN] 2024/09/19 - 11:39:01 | 200 |  239.507075ms |       127.0.0.1 | POST     "/api/generate"

I guess it's probably a model issue, not ollama's?

rick-github commented 1 month ago

Setting temperature to zero gives the expected results.

$ for i in {1..100} ; do curl -s localhost:11434/api/generate -d '{"model":"reader-lm","prompt":"<html><body><h1>Hallo Welt</h1><p>Schön, dass du hier bist</p></body></html>","options":{"temperature":0},"stream":false}' ; done | jq .response | sort | uniq -c
    100 "Hallo Welt\n-----------\n\nSchön, dass du hier bist\n\n"
MeinDeutschkurs commented 1 month ago

So it seems that the model pulled from ollama.com does not set temperature 0 by default.

I'm new to Modelfiles; maybe I can create a version with temperature 0 and num_ctx 80000 for myself.
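
Something like this, maybe, if I understand the Modelfile syntax correctly (an untested sketch; reader-lm-custom is just a placeholder name):

# Modelfile
FROM reader-lm:latest
PARAMETER temperature 0
PARAMETER num_ctx 80000

$ ollama create reader-lm-custom -f Modelfile
$ ollama run reader-lm-custom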

Thx for all your thoughts✨

rick-github commented 1 month ago

~Note that a context of 80k will need about 25G of VRAM, so it will run slowly if you don't have a large GPU.~ Sorry, 2.5G not 25G, so it will fit in the GPU.
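
(Rough check from the logs above: the 8192-token run used a 224.00 MiB KV cache, and KV cache size scales roughly linearly with context length, so an 80k context needs about 224 MiB * 80000 / 8192 ≈ 2.1 GiB of KV cache on top of the ~0.9 GiB of weights.)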

MeinDeutschkurs commented 1 month ago

Isn't it quantized? Q4 or so? I just have 192GB available, but it seems to run "ok" for my use case.

whogben commented 1 month ago

(Copied from #6887)

Win 11 Nvidia 3070 Ollama 0.3.12 here

I can get reader-lm to respond when I don't set temperature, but as soon as I add the temperature parameter to the request my client freezes and ollama seemingly never responds or never completes its response, super strange. I've tested my setup with many other ollama models no problemo, so I don't think the issue is on the client side.

Weirdly, the 0.5b version works, with no other changes besides switching the model parameter between :latest and :0.5b.

Failing / never-ending payload:

{
    "messages": [
        {
            "content": "<meta property=\"og:description\" content=\"Introducing the FeatherS3 -&amp;nbsp;The pro ESP32-S3 Development Board in the Feather Format now with a u.FL connector&amp;nbsp;instead of an onboard antenna, for the times when you want to connect ...\">",
            "role": "user"
        }
    ],
    "model": "reader-lm:latest",
    "options": {
        "temperature": 0
    },
    "stream": false
}

Working payload #1 (only model change)

{
    "messages": [
        {
            "content": "<meta property=\"og:description\" content=\"Introducing the FeatherS3 -&amp;nbsp;The pro ESP32-S3 Development Board in the Feather Format now with a u.FL connector&amp;nbsp;instead of an onboard antenna, for the times when you want to connect ...\">",
            "role": "user"
        }
    ],
    "model": "reader-lm:0.5b",
    "options": {
        "temperature": 0
    },
    "stream": false
}

Working payload #2 (only remove temperature):

{
    "messages": [
        {
            "content": "<meta property=\"og:description\" content=\"Introducing the FeatherS3 -&amp;nbsp;The pro ESP32-S3 Development Board in the Feather Format now with a u.FL connector&amp;nbsp;instead of an onboard antenna, for the times when you want to connect ...\">",
            "role": "user"
        }
    ],
    "model": "reader-lm:latest",
    "options": {
    },
    "stream": false
}
rick-github commented 1 month ago

The model produces better results if the /api/generate endpoint is used rather than /api/chat.

The text looks double-escaped - &amp;nbsp;. This seems to cause some sort of recursion; removing the double escaping improves the result.
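
If the HTML arrives double-escaped, one way to undo it before sending it to the model is a single pass of Python's html.unescape (a sketch, using a shortened version of the payload text above):

import html

# Shortened example of the double-escaped text from the payload above.
raw = 'Introducing the FeatherS3 -&amp;nbsp;The pro ESP32-S3 Development Board ...'

# One unescape pass turns &amp;nbsp; back into &nbsp;, i.e. removes the double escaping.
print(html.unescape(raw))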

It looks like the input is part of the training data for the model, so the completion is pulling text from there.

The nature of LLMs means that the output will sometimes be unpredictable, so the client should deal with unexpected results.
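
Since the hallucinations in this thread mostly show up as links that were never in the input, one cheap client-side guard is to reject (or retry) any response that introduces URLs absent from the source HTML. A sketch (the helper name is made up):

import re

def has_foreign_urls(source_html: str, markdown_out: str) -> bool:
    # True if the model's output contains URLs that never appeared in the input HTML.
    urls = re.findall(r'https?://\S+', markdown_out)
    return any(url.rstrip('.,)') not in source_html for url in urls)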

$ curl -s localhost:11434/api/generate -d '{"model":"reader-lm:latest","options":{"temperature":0},"prompt":"<meta property=\"og:description\" content=\"Introducing the FeatherS3 -&nbsp;The pro ESP32-S3 Development Board in the Feather Format now with a u.FL connector&nbsp;instead of an onboard antenna, for the times when you want to connect ...\">","stream":false}' | jq .response
"![Image 1: FeatherS3](https://www.thinglink.com/asset/09486752-0f8e-4d8c-bb6a-000000000000.png)\n\nIntroducing the FeatherS3 - The pro ESP32-S3 Development Board in the Feather Format now with a u.FL connector instead of an onboard antenna, for the times when you want to connect..."

WRT seemingly never completing: if you look in the server logs you will likely find lines that contain slot context shift. This means the model is filling up the context window and the runner is sliding the window and discarding earlier tokens. It generally indicates the model has lost coherence and will not output an EOS token to signal that the completion is finished. ollama has heuristics to try to detect this, but if the context window is large or the token generation rate is slow, it can take some time for the limit to kick in. For example, going back to the version of the prompt that causes a loop, we can see that it starts to ping-pong between ESP32-S3R and ESP32-S3W:

$ curl -s localhost:11434/api/generate -d '{"model":"reader-lm:latest","options":{"temperature":0},"prompt":"<meta property=\"og:description\" content=\"Introducing the FeatherS3 -&amp;nbsp;The pro ESP32-S3 Development Board in the Feather Format now with a u.FL connector&amp;nbsp;instead of an onboard antenna, for the times when you want to connect ...\">","stream":false}' | jq .response
"![Image 1: FeatherS3](https://www.52feet.com/wp-content/uploads/2024/06/FeatherS3-1.png)\n\nIntroducing the FeatherS3 - The pro ESP32-S3 Development Board in the Feather Format now with a u.FL connector instead
 of an onboard antenna, for the times when you want to connect your ESP32-S3 directly to your computer. This board is compatible with all ESP32-S3 modules and has a USB-C port for power and data connection.\n\nT
he FeatherS3 features:\n\n*   100 MHz WiFi\n*   4G LTE\n*   5V GPIOs\n*   8MB Flash\n*   64 MB PSRAM\n\nIt is powered by the ESP32-S3 module, which includes a microcontroller and Wi-Fi, Bluetooth, and GPS module
s. The board also has an onboard antenna for wireless connectivity.\n\nThe FeatherS3 is compatible with all ESP32-S3 modules, including:\n\n*   [ESP32-S3](https://www.52feet.com/esp32-s3/)\n*   [ESP32-S3+](https
://www.52feet.com/esp32-s3-plus/)\n*   [ESP32-S3R](https://www.52feet.com/esp32-s3r/)\n*   [ESP32-S3W](https://www.52feet.com/esp32-s3w/)\n*   [ESP32-S3W+](https://www.52feet.com/esp32-s3w-plus/)\n*   [ESP32-S3R
+](https://www.52feet.com/esp32-s3r-plus/)\n*

This behaviour can be mitigated by telling ollama to limit the number of tokens generated with num_predict:

$ curl -s localhost:11434/api/generate -d '{"model":"reader-lm:latest","options":{"temperature":0,"num_predict":100},"prompt":"<meta property=\"og:description\" content=\"Introducing the FeatherS3 -&nbsp;The pro ESP32-S3 Development Board in the Feather Format now with a u.FL connector&nbsp;instead of an onboard antenna, for the times when you want to connect ...\">","stream":false}' | jq .response
"![Image 1: FeatherS3](https://www.52feet.com/wp-content/uploads/2024/06/FeatherS3-1.png)\n\nIntroducing the FeatherS3 - The pro ESP32-S3 Development Board in the Feather Format now with a u.FL connector instead of an onboard antenna, for the times when you want to connect your ESP32-S3 directly to your computer. This board is compatible with all the same"