Closed 1 year ago
Can you verify the weights are loaded correctly? I would also suggest you use the inferer, so the prompts are in the right format. During training the model always sees the data like this:
```
Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
...

### Input: SOAP note is a type of clinical note. please expand on that

### Response:
```
I am not sure if this is the optimal way to prompt the model, but that's how the Stanford Alpaca was designed.
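For reference, the template above can be assembled like this. This is a minimal sketch; `build_prompt` and the exact whitespace are illustrative, not the library's own code:

```python
# Illustrative Alpaca-style prompt assembly (not the medAlpaca implementation).
TEMPLATE = (
    "Below is an instruction that describes a task, paired with an input "
    "that provides further context. Write a response that appropriately "
    "completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Input:\n{input}\n\n"
    "### Response:\n"
)

def build_prompt(instruction: str, input: str = "") -> str:
    """Fill the Alpaca template with an instruction and an optional input."""
    return TEMPLATE.format(instruction=instruction, input=input)

prompt = build_prompt(
    instruction="Answer the question about the clinical note.",
    input="SOAP note is a type of clinical note. please expand on that",
)
print(prompt)
```

The important point is that inference prompts should end with `### Response:` so the model continues from the same position it saw during training.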
Hi, thank you so much for this open-source work! I'm wondering whether we should 'recover' the weights or load them directly into the model. The model only repeats my prompt or returns an empty string. Could you provide an example of a model demonstration?
Sure. This is a screenshot of how you can use the medalpaca Inferer class.
Hey @kbressem, what if we want to pass context along with a question? Also, I don't think the 8-bit MedAlpaca model on Hugging Face is loading properly.
Hi, an update on this: I found the 7b version working fine. Both 7b and 13b are loaded from Hugging Face. I am using two A40 40G GPUs with bitsandbytes==0.37.2.
For the prompt template part: I used the same input for 7b and it worked, so it might be a different problem.
Also, I attached the code I used to test it here; could you try it and see what you get?
```python
from transformers import LlamaTokenizer, AutoModelForCausalLM

tokenizer = LlamaTokenizer.from_pretrained("medalpaca/medalpaca-13b")
model = AutoModelForCausalLM.from_pretrained("medalpaca/medalpaca-13b", device_map="auto")

input = "who is the president of the united states"
input_ids = tokenizer(input, return_tensors="pt").input_ids.to("cuda")
print(tokenizer.decode(model.generate(input_ids, max_length=50)[0]))
```
@parthplc, if you use the 8-bit model, `AutoModelForCausalLM` will probably not work, as the decapoda-llama has an outdated config file. You need to explicitly use `LlamaForCausalLM`. If this does not solve your problem, please provide more context on what exactly fails.
If you want to pass additional context, you can either adapt the JSON (in case you want to pass the same context multiple times) or pass it to the inferer. Please refer to the docstring of the class. `instruction` would be your context.
```
Args:
    input (str):
        The input text to provide to the model.
    instruction (str, optional):
        An optional instruction to guide the model's response.
    output (str, optional):
        Prepended to the model's output, e.g. for 1-shot prompting.
```
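Concretely, the context goes into `instruction` and the question into `input`. A rough sketch of how these arguments slot into the prompt sections; `format_call` is illustrative only, the real Inferer builds the prompt internally:

```python
# Illustrative mapping of the inferer's arguments onto the prompt sections.
def format_call(input: str, instruction: str = "", output: str = "") -> str:
    return (
        "### Instruction:\n" + instruction + "\n"
        "### Input:\n" + input + "\n"
        "### Response:\n" + output  # `output` is prepended for 1-shot prompting
    )

prompt = format_call(
    input="please expand on that",
    instruction="SOAP note is a type of clinical note.",  # the context
)
print(prompt)
```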
Hey @kbressem, I am still getting an issue.
```python
from transformers import LlamaTokenizer, LlamaForCausalLM

tokenizer = LlamaTokenizer.from_pretrained("medalpaca/medalpaca-lora-7b-8bit")
model = LlamaForCausalLM.from_pretrained("medalpaca/medalpaca-lora-7b-8bit", device_map="auto")

input = "who is the president of the united states"
input_ids = tokenizer(input, return_tensors="pt").input_ids.to("cuda")
print(tokenizer.decode(model.generate(input_ids, max_length=50)[0]))
```
and the error is
```
OSError: Can't load tokenizer for 'medalpaca/medalpaca-lora-7b-8bit'. If you were trying to load it from
'https://huggingface.co/models', make sure you don't have a local directory with the same name. Otherwise, make
sure 'medalpaca/medalpaca-lora-7b-8bit' is the correct path to a directory containing all relevant files for a
LlamaTokenizer tokenizer.
```
The 8-bit model is just the adapters; you still need to load the full base model first, then the adapter. Please refer to the screenshot above using the inference class I provide, which does all of this for you.
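For completeness, loading a base model and then applying LoRA adapter weights with the `peft` library usually looks roughly like this. This is a sketch under assumptions: the exact base checkpoint id, the adapter repo id, and whether 8-bit loading is required depend on how the adapters were trained:

```python
def load_lora_model(base_model_id: str, adapter_id: str):
    """Load a base causal LM, then apply LoRA adapter weights on top.

    Heavy dependencies are imported lazily so the helper can be defined
    without transformers/peft installed. The repo ids are assumptions,
    e.g. base_model_id="decapoda-research/llama-7b-hf" and
    adapter_id="medalpaca/medalpaca-lora-7b-8bit".
    """
    from transformers import LlamaTokenizer, LlamaForCausalLM
    from peft import PeftModel

    # The tokenizer comes from the base model; the adapter repo contains
    # no tokenizer files, which is exactly why the OSError above occurs.
    tokenizer = LlamaTokenizer.from_pretrained(base_model_id)
    model = LlamaForCausalLM.from_pretrained(
        base_model_id, load_in_8bit=True, device_map="auto"
    )
    # Wrap the base model with the trained LoRA adapter weights.
    model = PeftModel.from_pretrained(model, adapter_id)
    return tokenizer, model
```

The key point is that the adapter repo alone is not a complete model: `from_pretrained` on it directly will fail for the tokenizer and load no base weights.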
Hi,
I tried it from Hugging Face using
then the output is
```
</s>SOAP note is a type of clinical note. please expand on that OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO
```
with the corresponding ids
```
tensor([    2,  7791,  3301,  4443,   338,   263,  1134,   310, 24899,   936,
         4443, 29889,  3113,  7985,   373,   393, 29871, 29949, 29949, 29949,
        29949, 29949, 29949, 29949, 29949, 29949, 29949, 29949, 29949, 29949,
        29949, 29949, 29949, 29949, 29949, 29949, 29949, 29949, 29949, 29949,
        29949, 29949, 29949, 29949, 29949, 29949, 29949, 29949, 29949],
       device='cuda:1')
```
Any clues?