Closed by bil-ash 3 weeks ago
I have renamed it as suggested. By the way, what is the prefill chunk size, and how does it relate to memory usage and performance?
Thanks! Say the prefill chunk size is 2k: a 4k prompt will then be prefilled in two passes instead of all at once. This reduces the size of the intermediate buffers needed for the matrix multiplications.
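To make the chunking concrete, here is a minimal sketch of how a prompt could be split before prefill. `splitIntoPrefillChunks` is a hypothetical helper written for illustration, not a function from the library, and it assumes "2k" and "4k" mean 2048 and 4096 tokens.

```typescript
// Hypothetical helper for illustration only; not part of any library API.
// Splits a tokenized prompt into consecutive chunks of at most
// `prefillChunkSize` tokens, so each prefill pass handles one chunk.
function splitIntoPrefillChunks(
  promptTokens: number[],
  prefillChunkSize: number,
): number[][] {
  if (!Number.isInteger(prefillChunkSize) || prefillChunkSize <= 0) {
    throw new Error("prefillChunkSize must be a positive integer");
  }
  const chunks: number[][] = [];
  for (let start = 0; start < promptTokens.length; start += prefillChunkSize) {
    // slice() clamps the end index, so the last chunk may be shorter.
    chunks.push(promptTokens.slice(start, start + prefillChunkSize));
  }
  return chunks;
}

// Assumed sizes: a 4096-token prompt with a 2048-token chunk size.
const promptTokens: number[] = Array.from({ length: 4096 }, (_, i) => i);
const chunks = splitIntoPrefillChunks(promptTokens, 2048);
console.log(chunks.map((c) => c.length)); // [ 2048, 2048 ]
```

Each pass only needs intermediate buffers sized for one chunk (2048 tokens here) rather than for the whole prompt, which is where the memory saving comes from. The cost is that the prompt takes more passes to prefill.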
Add support for quantized (q4f16) Qwen2-0.5B. The Wasm library is taken from https://huggingface.co/julientfai/Qwen2-0.5B-Instruct-q4f16_1-Opilot/resolve/main/Qwen2-0.5B-Instruct-q4f16_1-webgpu.wasm?download=true