Open sleepingcat4 opened 4 months ago
I'm looking for an example or documentation on how to both load and quantise a HF embedding model on Intel Gaudi2. Are there any examples available? I don't want to use Docker, btw.
@sleepingcat4 Please refer to: https://github.com/intel/neural-compressor/tree/bfa27e422dc4760f6a9b1783eee7dae10fe5324f/examples/3.x_api/pytorch/nlp/huggingface_models/language-modeling/quantization/habana_fp8.
Thank you! I will experiment with it tomorrow.