Open shifeiwen opened 1 month ago
Thanks for the valuable feedback! I wonder whether you find a smaller unroll number that can work on your side?
Hi @MasterJH5574, I followed the instructions in gemv and set the loop unroll value to 8. Running the OpenCL kernel on the 8 Gen 2 no longer causes errors, and there is no significant difference in speed.
Feel free to send a PR!
gentle ping @shifeiwen
@tqchen I will submit a PR later
🐛 Bug
To Reproduce
I found that when compiling an Android OpenCL kernel, the loop unroll factor may be too large, which makes the generated kernel code very big. When that kernel goes through clBuildProgram, the build runs out of memory. In Florence2, with the original unroll setting of 64, the build fails on 8 Gen 2 phones but not on 8 Gen 3 phones. After I changed it to 8, it runs normally on 8 Gen 2 phones. As I understand it, the OpenCL kernels of the earlier MiniCPM model hit the same problem on 8 Gen 2 phones.
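To illustrate why the unroll factor matters, here is a minimal, hypothetical sketch (not the actual TVM codegen) that emits a fully unrolled OpenCL-style loop body as source text. The emitted source grows linearly with the unroll factor, so an unroll of 64 produces 8x the statements of an unroll of 8, which is the kind of blow-up that can push a mobile OpenCL compiler past its memory limits:

```python
def emit_unrolled_body(unroll: int, vec_len: int = 4) -> str:
    """Emit an OpenCL-style fully unrolled inner loop as source text.

    Hypothetical stand-in for a codegen pass: every unrolled iteration
    becomes its own multiply-accumulate statement, so the source size
    (and the compiled kernel) grows linearly with the unroll factor.
    """
    lines = []
    for k in range(unroll):
        for v in range(vec_len):
            lines.append(f"  acc.s{v} += a[{k}] * b[{k} * {vec_len} + {v}];")
    return "\n".join(lines)

small = emit_unrolled_body(8)
large = emit_unrolled_body(64)
# The unroll=64 body has 8x the statements of the unroll=8 body.
print(len(large.splitlines()) // len(small.splitlines()))  # -> 8
```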
Error pipeline:
https://github.com/mlc-ai/relax/blob/f5f048bbd71513f087799f987019e3931f68a6d9/python/tvm/dlight/gpu/matmul.py#L789
Error message: `3rdparty/tvm/src/runtime/opencl/opencl_module.cc:264: OpenCL build error for device=0x7b1d4dcb58 Error: CL_OUT_OF_HOST_MEMORY`
Failing C++ call: `err = clBuildProgram(programs_[func_name][device_id], 1, &dev, nullptr, nullptr, nullptr);`
error_opencl_kernel.txt
This error can be alleviated by appropriately reducing the unroll factor and the number of unrolled loop levels, which keeps the kernel code size under control.
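One way to make that mitigation systematic (a sketch only, not an existing TVM API) is to cap the unroll factor from a per-iteration code-size estimate and a target-specific budget, so weaker GPUs like the 8 Gen 2 automatically fall back from 64 to a smaller value while stronger GPUs keep the default. All names and numbers below are illustrative:

```python
def pick_unroll(max_unroll: int, stmts_per_iter: int, stmt_budget: int) -> int:
    """Pick the largest power-of-two unroll factor whose fully unrolled
    body stays within a statement budget.

    Hypothetical helper: `stmt_budget` would have to be tuned against the
    target GPU's OpenCL compiler limits; it is not a value TVM exposes.
    """
    unroll = max_unroll
    while unroll > 1 and unroll * stmts_per_iter > stmt_budget:
        unroll //= 2
    return unroll

# With a tight budget, the default of 64 degrades to 8.
print(pick_unroll(64, stmts_per_iter=16, stmt_budget=128))   # -> 8
# With a generous budget, the default of 64 survives.
print(pick_unroll(64, stmts_per_iter=16, stmt_budget=2048))  # -> 64
```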
Expected behavior
Environment

- How you installed MLC-LLM (`conda`, source): pip
- How you installed TVM-Unity (`pip`, source):
- TVM Unity Hash Tag (`python -c "import tvm; print('\n'.join(f'{k}: {v}' for k, v in tvm.support.libinfo().items()))"`, applicable if you compile models):

Additional context