kbressem / medAlpaca

LLM finetuned for medical question answering
GNU General Public License v3.0

RecursionError: maximum recursion depth exceeded while calling a Python object #32

Closed: lingluodlut closed this issue 1 year ago

lingluodlut commented 1 year ago

Hi, I am trying to run the Hugging Face example for medalpaca-7b:

from transformers import pipeline

qa_pipeline = pipeline("question-answering", model="medalpaca/medalpaca-7b", tokenizer="medalpaca/medalpaca-7b")
question = "What are the symptoms of diabetes?"
context = "Diabetes is a metabolic disease that causes high blood sugar. The symptoms include increased thirst, frequent urination, and unexplained weight loss."
answer = qa_pipeline({"question": question, "context": context})
print(answer)

but I got the following errors:

File "/home/Users/luol/anaconda3/envs/medalpaca/lib/python3.10/site-packages/transformers/tokenization_utils_fast.py", line 250, in convert_tokens_to_ids return self._convert_token_to_id_with_added_voc(tokens) File "/home/Users/luol/anaconda3/envs/medalpaca/lib/python3.10/site-packages/transformers/tokenization_utils_fast.py", line 257, in _convert_token_to_id_with_added_voc return self.unk_token_id File "/home/Users/luol/anaconda3/envs/medalpaca/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 1155, in unk_token_id return self.convert_tokens_to_ids(self.unk_token) File "/home/Users/luol/anaconda3/envs/medalpaca/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 1035, in unk_token return str(self._unk_token) RecursionError: maximum recursion depth exceeded while calling a Python object

My transformers version is 4.30.0. Is the error caused by the transformers version? Thanks!

kbressem commented 1 year ago

Unfortunately, the Hugging Face pipeline wrapper does not work with this model. I've written a simple inferer, which should work, but it has its limitations, as you can see in #31.
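In the meantime, the pipeline can be sidestepped by calling the model directly. A minimal sketch (the prompt format here is an assumption; the model was finetuned with its own template, so outputs may differ, and if the shipped tokenizer still recurses, see the workaround further down in this thread):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "medalpaca/medalpaca-7b"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.float16, device_map="auto"
)

question = "What are the symptoms of diabetes?"
context = (
    "Diabetes is a metabolic disease that causes high blood sugar. The symptoms "
    "include increased thirst, frequent urination, and unexplained weight loss."
)
# Hand-rolled prompt; swap in the template the model was actually finetuned with.
prompt = f"Context: {context}\n\nQuestion: {question}\n\nAnswer:"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=128)
# Decode only the newly generated tokens, not the echoed prompt.
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))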

qvks77 commented 1 year ago

@kbressem Can you give a working example of how to call the inferer? The example in the comments of the file doesn't work.

hyesunyun commented 1 year ago

@kbressem I would also like to see a working example of the inferer. Thanks!

hyesunyun commented 1 year ago

@qvks77 I think I may have found the issue: it's the tokenizer. See https://github.com/huggingface/transformers/issues/22762
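The traceback shows the cycle: convert_tokens_to_ids falls back to unk_token_id when a token is not in the vocab, unk_token_id calls convert_tokens_to_ids on unk_token, and if unk_token itself is not a valid vocab entry, the two methods keep calling each other until Python's recursion limit is hit. A quick way to inspect the special tokens (a debugging sketch, not a fix):

from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("medalpaca/medalpaca-7b")
# An empty or missing unk_token leaves convert_tokens_to_ids without a
# valid fallback, which is what sends it into the recursion above.
print(repr(tok.unk_token), repr(tok.bos_token), repr(tok.eos_token))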

Although it doesn't seem like the right thing to do, when I use a working tokenizer such as huggyllama/llama-7b, I don't get this error anymore.
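Concretely, the swap looks like this (a sketch; that the huggyllama vocabulary matches the medalpaca weights exactly is an assumption worth verifying):

from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the medalpaca weights, but pair them with a LLaMA tokenizer whose
# special-token config is intact.
tokenizer = AutoTokenizer.from_pretrained("huggyllama/llama-7b")
model = AutoModelForCausalLM.from_pretrained("medalpaca/medalpaca-7b")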

kbressem commented 1 year ago

Sorry for the late responses. I think the rapid changes in how LLaMA is implemented in Hugging Face, together with changes in the underlying libraries, could be the issue here.

It is very likely that the tokenizer config I used (or the one used in decapoda-research/llama) is outdated.

https://huggingface.co/abhipn implemented a solution that avoids this recursion error. We also plan to update the models soon, but we are currently figuring out funding, so unfortunately the repo is on hold for now.
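For anyone who wants to keep the original tokenizer, one general way to break the recursion (a sketch of the idea, not necessarily the exact fix linked above) is to restore the missing special tokens before encoding anything:

from transformers import LlamaTokenizer

tok = LlamaTokenizer.from_pretrained("medalpaca/medalpaca-7b")
# If the saved config left unk_token empty, convert_tokens_to_ids has no
# valid fallback; restoring the standard LLaMA special tokens provides one.
tok.add_special_tokens({"unk_token": "<unk>", "bos_token": "<s>", "eos_token": "</s>"})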