intel-analytics / ipex-llm

Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, Baichuan, Mixtral, Gemma, Phi, MiniCPM, etc.) on Intel XPU (e.g., local PC with iGPU and NPU, discrete GPU such as Arc, Flex and Max); seamlessly integrate with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, GraphRAG, DeepSpeed, vLLM, FastChat, Axolotl, etc.

Model output is different when using default optimize_model #10782

Open vishnumadhu365 opened 5 months ago

vishnumadhu365 commented 5 months ago

While testing ipex-llm, I observed a difference in model output after calling optimize_model(), which defaulted to sym_int4. Please help clarify the following:

  1. What is causing this variation in output?
  2. Does the optimize_model() call ensure that model accuracy remains the same across eval benchmarks such as HumanEval, MMLU, etc.?

Thanks!

env: Python 3.9, ipex-llm 2.1.0b20240416, torch 2.2.2, transformers 4.31.0

reproducer:

import os
os.environ["TOKENIZERS_PARALLELISM"] = "false"
os.environ["TRANSFORMERS_VERBOSITY"] = "error"

import sys
import warnings
warnings.filterwarnings("ignore")

import torch
torch.manual_seed(100)

from transformers import AutoTokenizer, AutoModelForCausalLM

model_path = 'meta-llama/Llama-2-7b-chat-hf'
model = AutoModelForCausalLM.from_pretrained(model_path,
                                                 trust_remote_code=True,
                                                 use_cache=True)
tokenizer = AutoTokenizer.from_pretrained(model_path, 
                                           trust_remote_code=True)

system_prompt = "You are a creative poet. Write a poem about the given topic. Use only 100 words"
user_prompt = "Write a poem about owls and starry nights"
prompt_template = f"<s>[INST] <<SYS>>\n {system_prompt} \n<</SYS>>\n\n {user_prompt}  [/INST]"

print("*"*10 + "Original model output" + "*"*10)
print(tokenizer.decode(model.generate(tokenizer.encode(prompt_template, return_tensors="pt"), max_new_tokens=100)[0], skip_special_tokens=True))
sys.stdout.flush()

from ipex_llm import optimize_model
model = optimize_model(model)

print("*"*10 + "IPEX-LLM Optimized model output" + "*"*10)
print(tokenizer.decode(model.generate(tokenizer.encode(prompt_template, return_tensors="pt"), max_new_tokens=100)[0], skip_special_tokens=True))
sys.stdout.flush()

output:

**********Original model output**********
[INST] <<SYS>>
 You are a creative poet. Write a poem about the given topic. Use only 100 words 
<</SYS>>

 Write a poem about owls and starry nights  [/INST]  Sure! Here is a 100-word poem about owls and starry nights:

Silent sentinels of the night,
Owls perch on boughs, their eyes alight.
Glittering stars above, a twinkling sight,
A magical night, pure delight.
Converting the current model to sym_int4 format......
**********IPEX-LLM Optimized model output**********
[INST] <<SYS>>
 You are a creative poet. Write a poem about the given topic. Use only 100 words 
<</SYS>>

 Write a poem about owls and starry nights  [/INST]  Sure, here is a poem about owls and starry nights in exactly 100 words:

Owls hoot in the night's embrace
Their soft coos echo through space
While stars twinkle bright and slow
A celestial show to know
Nature's symphony so grand
In this peaceful night's command
hkvision commented 5 months ago

Hi,

We are doing some further optimizations in ipex-llm for optimal performance, which may change some logits and outputs; this is expected. At the same time, we run accuracy benchmarks (e.g. the tasks in https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard) to make sure that our optimizations do not have any obvious negative impact on accuracy. If you observe any incorrect output from the ipex-llm optimized model, feel free to let us know and we will check it. Thanks!
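
If you'd like to gauge how much of the difference comes from the default sym_int4 precision, below is a minimal sketch that re-runs your prompt after optimize_model at a couple of low-bit settings. It assumes optimize_model's low_bit argument accepts values such as "sym_int8" (please check the ipex-llm docs for the exact list) and uses greedy decoding so the runs are directly comparable:

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
from ipex_llm import optimize_model

model_path = 'meta-llama/Llama-2-7b-chat-hf'
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

system_prompt = "You are a creative poet. Write a poem about the given topic. Use only 100 words"
user_prompt = "Write a poem about owls and starry nights"
prompt_template = f"<s>[INST] <<SYS>>\n {system_prompt} \n<</SYS>>\n\n {user_prompt}  [/INST]"
input_ids = tokenizer.encode(prompt_template, return_tensors="pt")

for low_bit in ("sym_int4", "sym_int8"):  # assumed-valid low_bit values
    # Reload a fresh FP32 copy each time so the low-bit conversions don't stack.
    model = AutoModelForCausalLM.from_pretrained(model_path,
                                                 trust_remote_code=True,
                                                 use_cache=True)
    model = optimize_model(model, low_bit=low_bit)
    with torch.inference_mode():
        output = model.generate(input_ids, max_new_tokens=100, do_sample=False)
    print("*" * 10 + f" low_bit={low_bit} output " + "*" * 10)
    print(tokenizer.decode(output[0], skip_special_tokens=True))

Higher-precision settings should track the original FP32 output more closely, while the lower-bit settings trade some of that fidelity for speed and memory.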