stanford-crfm / helm

Holistic Evaluation of Language Models (HELM), a framework to increase the transparency of language models (https://arxiv.org/abs/2211.09110). This framework is also used to evaluate text-to-image models in Holistic Evaluation of Text-to-Image Models (HEIM) (https://arxiv.org/abs/2311.04287).
https://crfm.stanford.edu/helm
Apache License 2.0

UnicodeDecodeError while running benchmark #2794

Open carina-ding opened 3 weeks ago

carina-ding commented 3 weeks ago

Hi, thank you for developing this amazing benchmark!

I created a virtual environment using Anaconda and successfully installed crfm-helm, following the instructions in the official documentation (https://crfm-helm.readthedocs.io/en/latest/installation/).

I then tried to run the benchmarking example from the quick start guide (https://crfm-helm.readthedocs.io/en/latest/quick_start/), following the instructions exactly. However, when I ran "helm-run --conf-paths run_entries.conf --suite v1 --max-eval-instances 10", I got this error: "UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 5580: character maps to <undefined>". I have attached some screenshots of the traceback below.

Could anyone help me with this issue?

Thank you very much!

[Screenshots: helm_ss1, helm_ss2, helm_ss3]

yifanmai commented 3 weeks ago

Hi @carina-ding, it looks like you have a malformed model_metadata.yaml file. The YAML parser is complaining that it is invalid YAML. You can check this by running the following:

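import dacite
import yaml

from helm.benchmark.model_metadata_registry import ModelMetadataList

path = "path/to/model_metadata.yaml"

# Parse the file as YAML; this raises if the file is not valid YAML.
with open(path, "r") as f:
    raw = yaml.safe_load(f)

# Check that the parsed data matches the ModelMetadataList schema.
model_metadata_list = dacite.from_dict(ModelMetadataList, raw)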
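One way to separate a text-decoding failure from a genuine YAML syntax error is to decode the file's raw bytes under a few candidate encodings before involving the YAML parser at all. The sketch below uses only the standard library. `try_decodings` is a hypothetical helper name, not part of HELM, and the candidate encodings are assumptions chosen to match the 'charmap' error reported above.

```python
from typing import Dict, Iterable, Optional


def try_decodings(path: str, encodings: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each encoding to None if the file decodes, else to the error message.

    Hypothetical diagnostic helper; it is not part of HELM.
    """
    # Read raw bytes so that no implicit, locale-dependent decoding happens.
    with open(path, "rb") as f:
        data = f.read()

    results: Dict[str, Optional[str]] = {}
    for encoding in encodings:
        try:
            data.decode(encoding)
            results[encoding] = None
        except UnicodeDecodeError as e:
            # The message includes the offending byte and its position.
            results[encoding] = str(e)
    return results


# Example call (the path is a placeholder, as in the snippet above):
# for encoding, error in try_decodings("path/to/model_metadata.yaml",
#                                      ["utf-8", "cp1252"]).items():
#     print(encoding, "->", "ok" if error is None else error)
```

If the file decodes as UTF-8 but not as cp1252, the failure is more likely in how the file is being read than in the YAML itself.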

carina-ding commented 2 weeks ago

Hi @yifanmai, thank you very much for your reply!

Sorry for the slight delay, as I was away last week.

I tried running the code you provided, and it produces the same error.

I have attached some screenshots below. Please let me know if I did anything wrong.

Thank you!

[Screenshots: helm_ss4, helm_ss5, helm_ss6]

yifanmai commented 1 week ago

It looks like the problem is that you're running on Windows, which is not officially supported yet. We're planning to add support in the next couple of months. In the meantime, you could consider running it on Linux using WSL2.
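For context on the error itself: 'charmap' is how Python reports its table-based single-byte codecs such as cp1252, a common default text encoding on Windows, and byte 0x9d has no mapping in cp1252. That byte is, however, the last byte of the UTF-8 encoding of the right double quotation mark (U+201D), so the message is consistent with a UTF-8 file being read under a Windows locale encoding. The sketch below reproduces this with the standard library only. `read_text_utf8` is a hypothetical helper, and whether the file in question is in fact UTF-8 is an assumption, not something confirmed in this thread.

```python
def read_text_utf8(path: str) -> str:
    """Read a text file as UTF-8 regardless of the platform's locale encoding.

    Hypothetical helper for illustration; it is not part of HELM.
    """
    with open(path, "r", encoding="utf-8") as f:
        return f.read()


# U+201D (right double quotation mark) encodes to three bytes in UTF-8.
utf8_bytes = "\u201d".encode("utf-8")
assert utf8_bytes == b"\xe2\x80\x9d"

# Decoding those bytes as cp1252 fails on 0x9d, which cp1252 leaves undefined.
try:
    utf8_bytes.decode("cp1252")
    cp1252_error = None
except UnicodeDecodeError as e:
    cp1252_error = e
```

Python's UTF-8 mode (the `PYTHONUTF8=1` environment variable or the `-X utf8` option, available since Python 3.7) makes `open()` default to UTF-8 without code changes. Whether that alone is enough to run HELM on Windows is untested here, so WSL2 remains the suggestion made in this thread.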

carina-ding commented 1 week ago

Hi @yifanmai, thank you very much for your reply and suggestions! I will try running it with WSL2. Thank you!