fix: do not raise an exception if 8bit on cpu

We have one parameter that controls the quantisation behaviour: load_in_4bit. It defaults to False, in which case we set load_in_8bit to True. However, neither quantisation mode works on CPU, so we disable quantisation when running on CPU. This commit makes sure that we only raise an exception for the CPU + 4-bit combination and not for the CPU + 8-bit combination. Otherwise CI, which runs on CPU with the default (8-bit) setting, would always fail.

It is a somewhat ugly workaround. The alternative is to have two separate variables for 8-bit and 4-bit. The disadvantage of that approach is that we only need this flexibility for CI: during training and inference we would always use one of the two quantisation modes.

jina-ai / jerboa

fix: do not raise an exception if 8bit on cpu #39