Open yaoyanglee opened 1 year ago
Same issue here.
After I tried removing the keyword, it produced another error:

`NameError: name 'TensorDataset' is not defined`

I think an import is missing.
After fixing everything mentioned above, it started working.
I also looked into the package (1.0.1.1) installed on my local server and found that the code in this version is not in sync with the main branch of the repo. The latest main branch appears to have fixed this issue, so it can be resolved by reinstalling the package from the repo rather than from pip.
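For reference, installing straight from the main branch can be done with pip's VCS support. This is a sketch: I am assuming the repo is `salesforce/CodeTF` and the PyPI distribution name is `salesforce-codetf`; adjust both if yours differ.

```shell
# Remove the stale PyPI release first (distribution name is an assumption)
pip uninstall -y salesforce-codetf
# Install the current main branch directly from GitHub (repo URL is an assumption)
pip install "git+https://github.com/salesforce/CodeTF.git"
```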
For the `TensorDataset` NameError, I found that adding this line solves the issue:

`from torch.utils.data import TensorDataset`
I would recommend upgrading numpy as well.
```python
!pip install sentencepiece

from codetf.models import load_model_pipeline
from codetf.data_utility.human_eval_dataset import HumanEvalDataset
from codetf.performance.model_evaluator import ModelEvaluator
import os

os.environ["HF_ALLOW_CODE_EVAL"] = "1"
os.environ["TOKENIZERS_PARALLELISM"] = "true"

model_class = load_model_pipeline(model_name="causallm", task="pretrained",
                                  model_type="codegen-350M-mono", is_eval=True,
                                  load_in_8bit=True, weight_sharding=False)

dataset = HumanEvalDataset(tokenizer=model_class.get_tokenizer())
prompt_token_ids, prompt_attention_masks, references = dataset.load()

problems = TensorDataset(prompt_token_ids, prompt_attention_masks)

evaluator = ModelEvaluator(model_class)
avg_pass_at_k = evaluator.evaluate_pass_k(problems=problems, unit_tests=references)
print("Pass@k: ", avg_pass_at_k)
```
Above is the code that was used. During execution in Google Colab, I received the following error:

```
<cell line: 15>:15
/usr/local/lib/python3.10/dist-packages/codetf/data_utility/human_eval_dataset.py:29 in load

  26     unit_test = re.sub(r'METADATA = {[^}]*}', '', unit_test, flags=re.MULTILINE)
  27     references.append(unit_test)
  28
❱ 29     prompt_token_ids, prompt_attention_masks = self.process_data(prompts, use_max_le
  30
  31     return prompt_token_ids, prompt_attention_masks, references
  32

TypeError: BaseDataset.process_data() got an unexpected keyword argument 'use_max_length'
```
After looking through the source code, I don't see this keyword argument anywhere, only `max_length`. Would anyone mind shedding some light on this?
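One way to confirm which keywords the installed version actually accepts is to inspect the function signature. This is a sketch: the `process_data` below is a stand-in with an assumed signature; in practice you would import the real method from your installed codetf package and inspect that instead.

```python
import inspect

# Hypothetical stand-in mirroring what the installed BaseDataset.process_data
# appears to accept (max_length, not use_max_length).
def process_data(self, prompts, max_length=512):
    pass

# inspect.signature lists the parameter names the callable accepts
accepted = set(inspect.signature(process_data).parameters)
print("max_length" in accepted)      # this version takes max_length
print("use_max_length" in accepted)  # passing use_max_length raises TypeError
```

If `use_max_length` is missing from the installed signature but present in the code calling it, that points to the version mismatch described earlier in the thread.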