ngxson / wllama

WebAssembly binding for llama.cpp - Enabling in-browser LLM inference
https://huggingface.co/spaces/ngxson/wllama
MIT License

Firefox: Error in input stream #115

Open flatsiedatsie opened 1 month ago

flatsiedatsie commented 1 month ago

I noticed something off while testing on Firefox.

[Screenshot 2024-09-16 at 21 24 48]

The output was very odd too:

[Screenshot 2024-09-16 at 20 45 22]

I then updated Firefox to the latest version (130). The issue persisted.

I then deleted the model manually and let it re-download (roughly as sketched below). That seems to have done... something, but the model is still acting... strangely.

[Screenshot 2024-09-16 at 21 46 59]
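
For reference, the manual deletion can be done from the browser console. A rough sketch, assuming wllama keeps its model downloads in the browser's Cache Storage; the `'wllama'` name filter is a guess, so check DevTools > Application > Cache Storage for the actual cache name:

```ts
// List all Cache Storage caches and delete the ones that look like
// wllama's model cache. Run this in the page's DevTools console.
const cacheNames = await caches.keys();
for (const name of cacheNames) {
  if (name.includes('wllama')) { // assumed name pattern, verify in DevTools
    console.log('deleting cache:', name);
    await caches.delete(name);
  }
}
```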

I then tried another model, TinyLlama, and noticed some strange behaviour. First I use the preload functionality to only download the model to the cache; it apparently pre-downloads successfully to 100%. Then I actually start the model, which by that point should already be fully cached. However, the caching seems incomplete (a sketch of this flow follows the screenshot):

[Screenshot 2024-09-16 at 21 57 14]
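
A minimal sketch of the preload-then-start flow described above. `loadModelFromUrl` and its `progressCallback` follow wllama's documented API; the `downloadModel` preload call, the asset paths, and the model URL are assumptions standing in for the actual preload helper and setup:

```ts
import { Wllama } from '@wllama/wllama';

// Paths to wllama's wasm artifacts; illustrative values, adjust to your build.
const CONFIG_PATHS = {
  'single-thread/wllama.wasm': './esm/single-thread/wllama.wasm',
  'multi-thread/wllama.wasm': './esm/multi-thread/wllama.wasm',
};

const MODEL_URL = 'https://example.com/models/tinyllama.gguf'; // illustrative
const wllama = new Wllama(CONFIG_PATHS);

// Step 1: preload only, i.e. download the GGUF into the cache without
// loading it. NOTE: 'downloadModel' is an assumed name for the preload API.
await wllama.downloadModel(MODEL_URL, {
  progressCallback: ({ loaded, total }) =>
    console.log(`preload: ${Math.round((100 * loaded) / total)}%`),
});

// Step 2: later, actually start the model. Since step 1 reported 100%,
// this should be served entirely from the cache with no re-download.
await wllama.loadModelFromUrl(MODEL_URL, {
  progressCallback: ({ loaded, total }) =>
    console.log(`load: ${Math.round((100 * loaded) / total)}%`),
});
```
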
flatsiedatsie commented 1 month ago

Also seeing this, possibly related? `n_batch` is 1024 (see the sketch after the screenshot).

[Screenshot 2024-09-16 at 22 00 06]
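
Continuing from the sketch above, this is how the batch size is being passed at load time. Whether `n_batch` and `n_ctx` are the exact option names accepted by the load config is an assumption based on llama.cpp's naming:

```ts
// Batch size is set when the model is loaded (option names assumed
// from llama.cpp's conventions; verify against wllama's LoadModelConfig).
await wllama.loadModelFromUrl(MODEL_URL, {
  n_ctx: 2048,   // context window
  n_batch: 1024, // the batch size mentioned above
});
```
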
flatsiedatsie commented 1 month ago

The Wllama demo runs fine, so I guess the issue is somewhere in my implementation: https://github.ngxson.com/wllama/examples/main/dist/

flatsiedatsie commented 1 month ago

Perhaps related: I'm seeing this on Safari:

[Screenshot 2024-09-16 at 23 01 14]

flatsiedatsie commented 1 month ago

After a refresh of the page and trying again, I see a different error:

[Screenshot 2024-09-16 at 23 04 43]

Unlike Firefox, it does perform inference normally.
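
For completeness, the minimal generation call used to check that inference works, continuing from the load sketch above. `createCompletion` matches wllama's README usage; the prompt and sampling values are arbitrary examples:

```ts
// A single short completion to verify the model actually generates text.
const output = await wllama.createCompletion('Once upon a time,', {
  nPredict: 50,                                   // cap on new tokens
  sampling: { temp: 0.5, top_k: 40, top_p: 0.9 }, // example sampling settings
});
console.log(output);
```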