intel-analytics / ipex-llm

Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, Mixtral, Gemma, Phi, MiniCPM, Qwen-VL, MiniCPM-V, etc.) on Intel XPU (e.g., local PC with iGPU and NPU, discrete GPU such as Arc, Flex and Max); seamlessly integrate with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, vLLM, GraphRAG, DeepSpeed, Axolotl, etc.
Apache License 2.0

use new q4_0 batch kernel #12396

Closed: MeouSker77 closed this pull request 1 week ago

MeouSker77 commented 1 week ago

Description

The new batch kernel supports more devices: it requires `state_size % 128 == 0` instead of `state_size % 256 == 0`, and it removes the `output_size % 32 == 0` requirement. The trade-off is that it supports a maximum batch size of 48 instead of 64, as illustrated in the sketch below.
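To make the constraints concrete, here is a minimal sketch of the eligibility rules described above. This is not the actual ipex-llm dispatch code; the function names, constants, and example shapes are hypothetical illustrations of the PR text.

```python
# Hedged sketch: illustrates the kernel eligibility rules from the PR
# description only. Names and values here are hypothetical, not ipex-llm API.

OLD_KERNEL_MAX_BATCH = 64  # old q4_0 batch kernel limit (per the PR text)
NEW_KERNEL_MAX_BATCH = 48  # new q4_0 batch kernel limit (per the PR text)


def old_kernel_supported(state_size: int, output_size: int, batch: int) -> bool:
    """Old q4_0 batch kernel: stricter alignment, larger max batch."""
    return (
        state_size % 256 == 0
        and output_size % 32 == 0
        and batch <= OLD_KERNEL_MAX_BATCH
    )


def new_kernel_supported(state_size: int, batch: int) -> bool:
    """New q4_0 batch kernel: looser alignment (state_size % 128 == 0),
    no output_size constraint, but max batch drops from 64 to 48."""
    return state_size % 128 == 0 and batch <= NEW_KERNEL_MAX_BATCH


if __name__ == "__main__":
    # A shape the old kernel rejects (4480 % 256 != 0, 100 % 32 != 0)
    # but the new kernel accepts (4480 % 128 == 0, batch 32 <= 48):
    print(old_kernel_supported(state_size=4480, output_size=100, batch=32))  # False
    print(new_kernel_supported(state_size=4480, batch=32))                   # True
```

The net effect is that more `(state_size, output_size)` shapes become eligible for the batch-kernel path, at the cost of that path no longer covering batch sizes 49 through 64.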

1. Why the change?

2. User API changes

3. Summary of the change

4. How to test?