kbressem / medAlpaca

LLM finetuned for medical question answering
GNU General Public License v3.0

medalpaca 13b outputs "OOO,O,O,O,O,O,O,O,O,O,O," #17

Closed 2533245542 closed 1 year ago

2533245542 commented 1 year ago

Hi,

I tried the model from Hugging Face using

from transformers import LlamaTokenizer, AutoModelForCausalLM
tokenizer = LlamaTokenizer.from_pretrained("medalpaca/medalpaca-13b")
model = AutoModelForCausalLM.from_pretrained("medalpaca/medalpaca-13b", device_map='auto')
input = 'SOAP note is a type of clinical note. please expand on that '
input_ids = tokenizer(input, return_tensors="pt").input_ids.to('cuda')
print(tokenizer.decode(model.generate(input_ids, max_length=50)[0]))

then the output is </s>SOAP note is a type of clinical note. please expand on that OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO

with the corresponding ids tensor([ 2, 7791, 3301, 4443, 338, 263, 1134, 310, 24899, 936, 4443, 29889, 3113, 7985, 373, 393, 29871, 29949, 29949, 29949, 29949, 29949, 29949, 29949, 29949, 29949, 29949, 29949, 29949, 29949, 29949, 29949, 29949, 29949, 29949, 29949, 29949, 29949, 29949, 29949, 29949, 29949, 29949, 29949, 29949, 29949, 29949, 29949, 29949, 29949], device='cuda:1')

Any clues?

kbressem commented 1 year ago

Can you verify the weights are loaded correctly? I would also suggest you use the inferer, so the prompts are in the right format. During training the model always sees the data like this:

Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.
### Instruction: 
...
### Input: SOAP note is a type of clinical note. please expand on that 

### Response: 

I am not sure if this is the optimal way to prompt the model, but that's how the Stanford Alpaca was designed.
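
For reference, a minimal sketch of wrapping a question in this template before generation could look as follows. The exact wording and whitespace used during training come from the repo's prompt template JSON, and the instruction text here is a made-up example, so treat this as an approximation rather than the canonical prompt.

from transformers import LlamaTokenizer, AutoModelForCausalLM

# Approximation of the Alpaca-style template quoted above; the authoritative
# version is the prompt template JSON shipped with the repo.
template = (
    "Below is an instruction that describes a task, paired with an input that "
    "provides further context. Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Input:\n{input}\n\n"
    "### Response:\n"
)

tokenizer = LlamaTokenizer.from_pretrained("medalpaca/medalpaca-13b")
model = AutoModelForCausalLM.from_pretrained("medalpaca/medalpaca-13b", device_map="auto")

prompt = template.format(
    instruction="Answer the question about clinical documentation.",  # made-up instruction
    input="SOAP note is a type of clinical note. please expand on that",
)
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to("cuda")
print(tokenizer.decode(model.generate(input_ids, max_length=256)[0]))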

chingheng113 commented 1 year ago

Hi, thank you so much for this open-source work! I'm wondering whether we should 'recover' the weights or load them directly into the model. The model only repeats my prompt or returns an empty string. Could you provide an example demonstrating the model?

kbressem commented 1 year ago

Sure. This is a screenshot of how you can use the medalpaca Inferer class.

[screenshot: example usage of the medalpaca Inferer class]
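
(Since the screenshot does not copy over, here is a rough sketch of what such a call might look like, assuming the Inferer class lives in medalpaca/inferer.py; the constructor arguments and the template path are assumptions based on the repo layout, so please check the repo for the exact usage.)

# Rough sketch only: constructor arguments and template path are assumptions,
# not a verbatim copy of the screenshot.
from medalpaca.inferer import Inferer

medalpaca = Inferer("medalpaca/medalpaca-13b", "prompt_templates/medalpaca.json")

# The call takes the fields quoted in the docstring further down this thread
# (input, optional instruction, optional output).
response = medalpaca(input="What is a SOAP note and which sections does it contain?")
print(response)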

parthplc commented 1 year ago

Hey @kbressem, what if we want to pass context along with a question? Also, I don't think the 8-bit medalpaca model on Hugging Face is loading properly.

2533245542 commented 1 year ago

Hi, an update on this: I found the 7b version works fine. Both 7b and 13b were loaded from Hugging Face. I am using two A40 40G GPUs with bitsandbytes==0.37.2.

Regarding the prompt template: I used the same input with the 7b model and it worked, so this might be a different problem.

I also attached the code I used to test below; could you try it and see what you get?

from transformers import LlamaTokenizer
from transformers import AutoModelForCausalLM
tokenizer = LlamaTokenizer.from_pretrained("medalpaca/medalpaca-13b")
model = AutoModelForCausalLM.from_pretrained("medalpaca/medalpaca-13b", device_map='auto')
input = 'who is the president of the united states'
input_ids = tokenizer(input, return_tensors="pt").input_ids.to('cuda')
print(tokenizer.decode(model.generate(input_ids, max_length=50)[0]))

kbressem commented 1 year ago

@parthplc if you use the 8-bit model, AutoModelForCausalLM will probably not work, as the decapoda llama has an outdated config file. You need to explicitly use LlamaForCausalLM. If this does not solve your problem, please provide more context on what exactly fails.

If you want to pass additional context, you can either adapt the JSON (in case you want to pass the same context multiple times) or pass it to the inferer. Please refer to the docstring of the class; instruction would be your context.

Args:
    input (str):
        The input text to provide to the model.
    instruction (str, optional):
        An optional instruction to guide the model's response.
    output (str, optional): 
        Prepended to the models output, e.g. for 1-shot prompting
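
Following that docstring, a call that passes context along with a question might look roughly like the sketch below; the constructor arguments, template path, and the clinical context are assumptions for illustration only.

from medalpaca.inferer import Inferer

# Same assumptions as the earlier sketch regarding constructor arguments.
medalpaca = Inferer("medalpaca/medalpaca-7b", "prompt_templates/medalpaca.json")

# Hypothetical context and question, purely for illustration.
context = "A 62-year-old male with type 2 diabetes presents with a two-day history of chest pain."
question = "What differential diagnoses should be considered?"

# Per the docstring: `input` is the text the model responds to,
# `instruction` guides the response and can carry the context.
response = medalpaca(input=question, instruction=context)
print(response)
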
parthplc commented 1 year ago

Hey @kbressem, I am still getting an issue.

from transformers import LlamaTokenizer
from transformers import LlamaForCausalLM
tokenizer = LlamaTokenizer.from_pretrained("medalpaca/medalpaca-lora-7b-8bit")
model = LlamaForCausalLM.from_pretrained("medalpaca/medalpaca-lora-7b-8bit", device_map='auto')
input = 'who is the president of the united states'
input_ids = tokenizer(input, return_tensors="pt").input_ids.to('cuda')
print(tokenizer.decode(model.generate(input_ids, max_length=50)[0]))

and the error is

OSError: Can't load tokenizer for 'medalpaca/medalpaca-lora-7b-8bit'. If you were trying to load it from 
'https://huggingface.co/models', make sure you don't have a local directory with the same name. Otherwise, make 
sure 'medalpaca/medalpaca-lora-7b-8bit' is the correct path to a directory containing all relevant files for a 
LlamaTokenizer tokenizer.

kbressem commented 1 year ago

The 8-bit model contains only the LoRA adapters; you still need to load the full base model first and then apply the adapter. Please refer to the screenshot above, which uses the inference class I provide and does all of this for you.
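
For anyone landing here later, loading the base model first and then applying the LoRA adapter with PEFT would look roughly like this sketch; the base checkpoint name is an assumption, and the Inferer class in this repo performs the equivalent steps for you.

# Sketch, assuming a Llama 7b base checkpoint and the peft library;
# the repo's Inferer class does the equivalent of this for you.
from transformers import LlamaTokenizer, LlamaForCausalLM
from peft import PeftModel

base_model_name = "decapoda-research/llama-7b-hf"  # assumed base checkpoint

tokenizer = LlamaTokenizer.from_pretrained(base_model_name)
base_model = LlamaForCausalLM.from_pretrained(
    base_model_name,
    load_in_8bit=True,   # matches the 8-bit adapter
    device_map="auto",
)

# Apply the LoRA adapter weights on top of the base model.
model = PeftModel.from_pretrained(base_model, "medalpaca/medalpaca-lora-7b-8bit")

prompt = "who is the president of the united states"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to("cuda")
print(tokenizer.decode(model.generate(input_ids=input_ids, max_length=50)[0]))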