instructlab / taxonomy

Taxonomy tree that will allow you to create models tuned with your data
Apache License 2.0

instruct-lab-bot precheck issues #667

Open luke-inglis opened 3 months ago

luke-inglis commented 3 months ago
  1. It seems the bot is currently using Llama prompt formatting as opposed to the Merlinite format, which is:
    
    sys_prompt = "You are an AI language model developed by IBM Research. You are a cautious assistant. You carefully follow instructions. You are helpful and harmless and you follow ethical guidelines and promote positive behavior."
    prompt = f'<|system|>\n{sys_prompt}\n<|user|>\n{inputs}\n<|assistant|>\n'
    stop_token = '<|endoftext|>'


  2. It also seems to be running multi-turn at some points: https://instruct-lab-bot.s3.us-east-2.amazonaws.com/precheck-pr-659-00158ec82c7eb42f338c29d67edce3cf21aac016/chat_2024-04-08T19_40_51.log
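
To make both points easier to check, here is a minimal sketch of how a rendered prompt could be inspected. It builds the Merlinite prompt exactly as quoted above and then looks for Llama 2 style chat markers (`[INST]`, `<<SYS>>`) and for more than one user turn. The helper names `build_merlinite_prompt` and `diagnose_prompt` are hypothetical and not part of the bot or the CLI, and the sketch assumes the endpoint's rendered prompt is available as a plain string, which may not be true for the log linked above.

```python
# Sketch only: helper names are hypothetical, not part of ilab or the bot.

SYS_PROMPT = (
    "You are an AI language model developed by IBM Research. "
    "You are a cautious assistant. You carefully follow instructions. "
    "You are helpful and harmless and you follow ethical guidelines "
    "and promote positive behavior."
)
STOP_TOKEN = "<|endoftext|>"

# Markers used by the Llama 2 chat convention (assumed to be the
# "Llama formatting" referred to in this issue).
LLAMA_MARKERS = ("[INST]", "[/INST]", "<<SYS>>")
MERLINITE_MARKERS = ("<|system|>", "<|user|>", "<|assistant|>")


def build_merlinite_prompt(inputs: str, sys_prompt: str = SYS_PROMPT) -> str:
    """Render a single-turn prompt in the Merlinite format quoted above."""
    return f"<|system|>\n{sys_prompt}\n<|user|>\n{inputs}\n<|assistant|>\n"


def diagnose_prompt(rendered: str) -> dict:
    """Report which template markers appear and how many user turns exist."""
    user_turns = rendered.count("<|user|>") + rendered.count("[INST]")
    return {
        "llama_markers": any(m in rendered for m in LLAMA_MARKERS),
        "merlinite_markers": all(m in rendered for m in MERLINITE_MARKERS),
        "user_turns": user_turns,
        "multi_turn": user_turns > 1,
    }


if __name__ == "__main__":
    print(diagnose_prompt(build_merlinite_prompt("What is a taxonomy?")))
```

A prompt that reports `llama_markers` as true, or `multi_turn` as true for what should be a single precheck question, would match the two symptoms described here.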
russellb commented 3 months ago

@luke-inglis Precheck uses ilab chat --endpoint-url ...URL-for-merlinite... to generate the responses.

If that's not doing the right thing, we need a bug against the CLI repo.

xukai92 commented 3 months ago

I think it's a bug in the endpoint. We are hitting it correctly from the CLI side.

xukai92 commented 3 months ago

Looks like the template on the endpoint side is wrong, and perhaps the special tokens as well.

bjhargrave commented 2 months ago

@luke-inglis Is this issue still a problem? Perhaps we can close this issue if not.