llSourcell / Doctor-Dignity

Doctor Dignity is an LLM that can pass the US Medical Licensing Exam. It works offline, is cross-platform, and keeps your health data private.
Apache License 2.0

1.8 Performance Boost via Soft Prompting - dataset['train'] - KeyError: "Column train not in the dataset. Current columns in the dataset: ['question', 'answer', 'options', 'meta_info', 'answer_idx', 'metamap_phrases']" #22

Open crarau opened 10 months ago

crarau commented 10 months ago

I'm getting

KeyError: "Column train not in the dataset. Current columns in the dataset: ['question', 'answer', 'options', 'meta_info', 'answer_idx', 'metamap_phrases']"

when running 1.8 Performance Boost via Soft Prompting on Colab:

def evaluate_model(model, tokenizer, dataset, conversation_history):
    correct = 0
    total = 0

    # Iterate through the dataset
    for example in dataset['train']: # Fixed from get_split to dictionary-style access
        try:
            question = example["question"]
            options = example["options"]
            correct_answer_idx = example["answer_idx"]

            # Combine the question with the options
            input_text = conversation_history + question + " " + " ".join([f"{k}: {v}" for k, v in options.items()]) + 'only respond with a single alphabetical character.'

            # Generate model's answer
            input_ids = tokenizer.encode(input_text, return_tensors="pt").to('cuda')
            output = model.generate(input_ids, num_beams=4)
            generated_text = tokenizer.decode(output[0]).strip()

            # Extract the selected option from the generated text
            predicted_answer_idx = generated_text[0]  # Assuming the generated text starts with the selected option letter

            # Compare with the correct answer
            if correct_answer_idx == predicted_answer_idx:
                correct += 1

            total += 1

        except KeyError as e:
            print("KeyError encountered for example:", example)
            raise e  # To see the full traceback and understand the origin of the error.

    return correct / total

Full traceback:

User:
['question', 'answer', 'options', 'meta_info', 'answer_idx', 'metamap_phrases']
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-28-bb8bf40a3938> in <cell line: 83>()
     81 
     82 # Evaluate the model
---> 83 accuracy = evaluate_model(model, tokenizer, dataset, conversation_history)
     84 print(f"Accuracy: {accuracy * 100:.2f}%")

4 frames
/usr/local/lib/python3.10/dist-packages/datasets/formatting/formatting.py in _check_valid_column_key(key, columns)
    518 def _check_valid_column_key(key: str, columns: List[str]) -> None:
    519     if key not in columns:
--> 520         raise KeyError(f"Column {key} not in the dataset. Current columns in the dataset: {columns}")
    521 
    522 

KeyError: "Column train not in the dataset. Current columns in the dataset: ['question', 'answer', 'options', 'meta_info', 'answer_idx', 'metamap_phrases']"
chunhualiao commented 10 months ago

I encountered the same problem. The solution is to change

for example in dataset['train']

to

for example in dataset

The input dataset is already the train split, so there is no need to index into it again.
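The root cause is that a `DatasetDict` is indexed by split name, while a single `Dataset` treats string keys as column names, which is exactly the `KeyError` above. A minimal defensive sketch that accepts either shape (the helper name `iter_train_examples` is mine, not from the repo):

```python
def iter_train_examples(dataset):
    """Yield rows of the train split whether `dataset` is a
    DatasetDict (indexed by split name) or an already-split Dataset."""
    try:
        # Works when dataset is a DatasetDict such as {"train": ...}.
        split = dataset["train"]
    except (KeyError, TypeError):
        # A single Dataset raises KeyError ("Column train not in the
        # dataset ..."); either way the object is already the train split.
        split = dataset
    for example in split:
        yield example
```

With this helper, the loop in evaluate_model becomes `for example in iter_train_examples(dataset):` and works regardless of whether load_dataset returned the full DatasetDict or one split.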

jeff-mcdonald-vituity commented 10 months ago

Even with this change, this section can't work as shared. The prompt as written won't yield an A, B, C, or D answer to compare against the answers from the dataset. Even after heavy editing to coerce it into A/B/C/D answers, I can't get Llama to do better than random chance (25% accuracy):

Accuracy: 10.53%

predicted  correct  last 30 chars of generated text (E means no letter returned)
D D ",B,C,D). ASSISTANT: D\n\n\n\n"
D A "m (A,B,C,D). ASSISTANT: D"
E A "(A,B,C,D). ASSISTANT: \n\n\n\n"
E A "A,B,C,D). ASSISTANT: \n\n\n\n\n"
B D "m (A,B,C,D). ASSISTANT: B"
E C "(A,B,C,D). ASSISTANT: \n\n\n"
E D "MSMSMSMSMSMSMSMSMSMSMSMSMSMSMS"
D A "m (A,B,C,D). ASSISTANT: D"
E D "MSMSMSMSMSMSMSMSMSMSMSMSMSMSMS"
E D "(A,B,C,D). ASSISTANT: \n\n\n\n"
D C "m (A,B,C,D). ASSISTANT: D"
E C "(A,B,C,D). ASSISTANT: D."
B C "B,C,D). ASSISTANT: B\n\n\n\n\n"
D C "m (A,B,C,D). ASSISTANT: D"
E C "-MS-MS-MS-MS-MS-MS-MS-MS-MS-MS"
E B "(A,B,C,D). ASSISTANT: \n\n\n"
D D "m (A,B,C,D). ASSISTANT: D"
E C "MSMSMSMSMSMSMSMSMSMSMSMSMSMSMS"
C D "m (A,B,C,D). ASSISTANT: C"
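Part of the problem is that `generated_text[0]` grabs the first character of the decoded text, which is almost always a prompt token rather than the model's letter; that alone explains many of the E rows above. A hedged sketch of a more forgiving extractor, matched against the sample outputs in this thread (the function name `extract_choice` is my own, not from the notebook):

```python
import re

def extract_choice(generated_text, choices="ABCD"):
    """Return the first standalone choice letter appearing after the
    final 'ASSISTANT:' marker, or 'E' when no letter is found."""
    # Look only after the last 'ASSISTANT:' so letters that appear in
    # the prompt itself ("(A,B,C,D)") are not picked up by mistake.
    tail = generated_text.rsplit("ASSISTANT:", 1)[-1]
    match = re.search(rf"\b([{choices}])\b", tail)
    return match.group(1) if match else "E"
```

Degenerate outputs like the repeated MSMSMS... runs still map to E; those look like generation going off the rails, so capping the output length (e.g. with `max_new_tokens` in `model.generate`) is probably worth trying as well.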