flatsiedatsie opened 1 month ago
Also seeing this; possibly related? My `n_batch` is 1024.
The Wllama demo runs fine, so I guess the problem is somewhere in my implementation. https://github.ngxson.com/wllama/examples/main/dist/
Perhaps related: I'm seeing this on Safari:
After refreshing the page and trying again, I see a different error:
Unlike Firefox, though, it does perform inference normally.
I noticed something off while testing on Firefox.
The output was very odd too:
I then updated Firefox to the latest version (130). The issue persisted.
I then deleted the model manually and let it re-download. That seems to have done... something. But the model is still acting strangely.
I then tried another model, TinyLlama, and noticed some strange behaviour. First I use the preload functionality to only download the model to the cache. It apparently pre-downloads successfully to 100%. Then I actually start the model, which should by now be fully cached. However, the caching seems incomplete:
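As a way to debug this, one could compare the size the server reports for the model file against what actually landed in the cache. A minimal sketch using the standard browser Cache Storage API (note: the cache name and model URL below are placeholders, not wllama's internals; wllama manages its own cache, so the real entry names may differ):

```javascript
// Pure helper: does the cached byte count match what the server advertises?
function isFullyCached(expectedBytes, cachedBytes) {
  return expectedBytes > 0 && cachedBytes === expectedBytes;
}

// Sketch: check a cached model file against the server's Content-Length.
// "model-cache" and modelUrl are assumptions for illustration only.
async function checkModelCache(cacheName, modelUrl) {
  const cache = await caches.open(cacheName);
  const cached = await cache.match(modelUrl);
  if (!cached) return { cached: false };

  // Size the server reports vs. size actually stored in the cache.
  const head = await fetch(modelUrl, { method: 'HEAD' });
  const expected = Number(head.headers.get('content-length'));
  const actual = (await cached.arrayBuffer()).byteLength;

  return { cached: true, complete: isFullyCached(expected, actual), expected, actual };
}
```

If `complete` comes back `false` after a "100%" preload, that would confirm the download was truncated or the cache write was cut short.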