stanford-crfm / helm

Holistic Evaluation of Language Models (HELM), a framework to increase the transparency of language models (https://arxiv.org/abs/2211.09110). This framework is also used to evaluate text-to-image models in HEIM (https://arxiv.org/abs/2311.04287) and vision-language models in VHELM (https://arxiv.org/abs/2410.07112).
https://crfm.stanford.edu/helm
Apache License 2.0

Feature request: batching and running HELM as a library #2336

Open ruixin31 opened 9 months ago

ruixin31 commented 9 months ago

Hi, thank you very much for developing and maintaining the codebase. Here are some of my thoughts on using HELM:

Support for batching

Currently, it is hard to utilize all available GPU resources for smaller models due to the lack of batching support. It would enhance HELM's capabilities if it supported request batching instead of sending one request at a time.
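
For illustration, here is a rough sketch (not HELM code) of the kind of batched generation I have in mind for a local Hugging Face model; the model name and prompts are just placeholders:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder model; left padding is needed for batched decoder-only generation.
model_name = "gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_name, padding_side="left")
tokenizer.pad_token = tokenizer.eos_token
device = "cuda" if torch.cuda.is_available() else "cpu"
model = AutoModelForCausalLM.from_pretrained(model_name).to(device)

prompts = ["The capital of France is", "2 + 2 =", "HELM stands for"]
inputs = tokenizer(prompts, return_tensors="pt", padding=True).to(device)

# One generate() call for the whole batch, rather than one request per prompt,
# so the GPU stays busy even for small models.
outputs = model.generate(
    **inputs, max_new_tokens=20, do_sample=False, pad_token_id=tokenizer.pad_token_id
)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))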

Enhancing Support for Custom Models

Using this evaluation framework with custom models is quite difficult. Below, I'll outline what I've tried so far and offer suggestions on how to make it easier. Related issues: #1704, #1794, #1875.

How one can currently use HELM with a custom model

There are currently two main methods for loading a model:

  1. Setting up a server for your language model, as done in the NeurIPS Efficiency Challenge setup. However, this adds the overhead of going through a local port and requires implementing a server.

  2. Registering your model with HELM and running it from the command line. This can be accomplished in two ways:

    1. Writing a custom client. However, adding a new model to HELM requires a non-trivial amount of work to correctly implement the supporting utilities.
    2. Leveraging Huggingface to minimize the need for utility coding. The issue here is that this is still non-trivial to set up correctly, since HELM runs as standalone software: the model must be loadable through Huggingface's AutoModelForCausalLM.from_pretrained. So how do we do that?
      1. One option is to upload the model to the Huggingface Hub, together with a modeling file (model.py), so that it registers correctly. However, this approach slows down development iteration.
      2. Alternatively, one could take advantage of the undocumented auto_map setting in the model config, essentially making a local checkpoint look like a remote repository (see the sketch below).

All the options above also require spawning a standalone program.
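
To make option 2.2.2 concrete, here is a rough sketch of the auto_map route, assuming the custom architecture lives in a local modeling_custom.py (MyConfig and MyModelForCausalLM are placeholder names). Registering the classes for their Auto classes should make save_pretrained write an auto_map entry into config.json and copy the modeling code next to the checkpoint, so a plain from_pretrained call with trust_remote_code=True can resolve the custom class without uploading anything:

from transformers import AutoModelForCausalLM

# Placeholder names: a local file modeling_custom.py defining the custom
# architecture (subclasses of PretrainedConfig / PreTrainedModel).
from modeling_custom import MyConfig, MyModelForCausalLM

config = MyConfig()
model = MyModelForCausalLM(config)

# Registering for the Auto classes tells transformers to record an "auto_map"
# entry in config.json and copy modeling_custom.py alongside the weights when
# saving, so the checkpoint behaves like a trust_remote_code repository.
MyConfig.register_for_auto_class()
MyModelForCausalLM.register_for_auto_class("AutoModelForCausalLM")
model.save_pretrained("/path/to/your/model")

# HELM's Hugging Face-based clients can then load the checkpoint like any
# other local path, as long as trust_remote_code is allowed.
reloaded = AutoModelForCausalLM.from_pretrained(
    "/path/to/your/model", trust_remote_code=True
)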

An idea for improving the current setup

It would be nice to be able to run helm-run and register models through library calls into HELM. Specifically, this could entail exposing a HELMModel class with the required fields for model registration and/or adding a static method on the HuggingfaceServer class for initializing it from an existing Hugging Face model.
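
As a purely hypothetical illustration of the proposed interface (neither HELMModel nor an evaluate entry point exist in HELM today, so everything below is a placeholder):

# Purely hypothetical sketch of the proposed interface; HELMModel and
# helm.evaluate do not exist today, so the calls are left as comments.
#
# model = HELMModel(
#     name="openlm/your-model-name",
#     pretrained_model_name_or_path="/path/to/your/model",
#     tokenizer_name="openlm/your-tokenizer-name",
#     max_sequence_length=2048,
# )
# results = helm.evaluate(
#     model,
#     run_entries=["mmlu:subject=anatomy"],
#     max_eval_instances=5,
#     batch_size=8,  # would also cover the batching request above
# )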

yifanmai commented 8 months ago

Regarding this idea:

Leveraging Huggingface to minimize the need for utility coding.

Does the existing OpenLMClient work for you? To use it, create prod_env/model_deployments.yaml relative to your current working directory and add the following:

model_deployments:
  - name: openlm/your-model-name
    model_name: openlm/your-model-name
    tokenizer_name: openlm/your-tokenizer-name
    max_sequence_length: 2048
    client_spec:
      class_name: "helm.proxy.clients.open_lm_client.OpenLMClient"
      args:
        pretrained_model_name_or_path: /path/to/your/model

where /path/to/your/model is the path to your model. This should eliminate the need to set up a Hugging Face repository.

You also need to either replace openlm/your-tokenizer-name with one of HELM's built-in tokenizers, or add your own tokenizer by creating prod_env/tokenizer_configs.yaml:

tokenizer_configs:
  - name: openlm/your-tokenizer-name
    tokenizer_spec:
      class_name: "helm.proxy.tokenizers.huggingface_tokenizer.HuggingFaceTokenizer"
      args:
        pretrained_model_name_or_path: /path/to/your/tokenizer
    end_of_text_token: "<|endoftext|>"
    prefix_token: "<|endoftext|>"

Right now OpenLMClient always loads a model as an OpenLMforCausalLM. If you need other model classes, I could add the class name to the user-configurable args in model_deployments.yaml.
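
With both files in place, you should then be able to run the deployment directly, e.g. helm-run -r mmlu:subject=anatomy,model=openlm/your-model-name -m 5 --suite openlm from that working directory.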

yifanmai commented 8 months ago

Regarding this idea:

It would be nice to be able to run helm-run and register models through library calls into HELM. Specifically, this could entail exposing a HELMModel class with the required fields for model registration and/or adding a static method on the HuggingfaceServer class for initializing it from an existing Hugging Face model.

There is an existing way to run HELM in-process. Here is an example:

from helm.benchmark.config_registry import register_builtin_configs_from_helm_package
from helm.benchmark.model_deployment_registry import ClientSpec, ModelDeployment, register_model_deployment
from helm.benchmark.presentation.run_entry import RunEntry
from helm.benchmark.run import run_benchmarking, run_entries_to_run_specs
from helm.benchmark.tokenizer_config_registry import TokenizerConfig, TokenizerSpec, register_tokenizer_config
from helm.common.authentication import Authentication

register_builtin_configs_from_helm_package()

tokenizer_config = TokenizerConfig(
    name="openlm/your-tokenizer-name",
    tokenizer_spec=TokenizerSpec(
        class_name="helm.proxy.tokenizers.huggingface_tokenizer.HuggingFaceTokenizer",
        args={"pretrained_model_name_or_path": "/path/to/your/model"},
    ),
    end_of_text_token="<|endoftext|>",
    prefix_token="<|endoftext|>",
)

model_deployment = ModelDeployment(
    name="openlm/your-model-name",
    model_name="openlm/your-model-name",
    tokenizer_name="openlm/your-tokenizer-name",
    max_sequence_length=2048,
    client_spec=ClientSpec(
        class_name="helm.proxy.clients.huggingface_client.HuggingFaceClient",
        args={"pretrained_model_name_or_path": "/path/to/your/model"},
    ),
)

register_tokenizer_config(tokenizer_config)
register_model_deployment(model_deployment)

run_entries = [
    RunEntry(
        "mmlu:subject=anatomy,model=openlm/your-model-name", priority=1, groups=None
    )
]

run_specs = run_entries_to_run_specs(
    run_entries=run_entries,
    max_eval_instances=5,
    num_train_trials=1,
)

run_benchmarking(
    run_specs=run_specs,
    auth=Authentication({}),
    url=None,
    local_path="prod_env",
    num_threads=4,
    output_path="benchmark_output",
    suite="openlm",
    dry_run=False,
    skip_instances=False,
    cache_instances=False,
    cache_instances_only=False,
    skip_completed_runs=False,
    exit_on_error=False,
    runner_class_name=None,
    mongo_uri=None,
    disable_cache=False,
)

Note that the API is non-public and might break in the future. I recognize that this API is quite difficult to use. I have thought about exposing a better API, especially for IPython / Colab, but unfortunately it would require significant internal refactors.
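
Once a run like this finishes, the outputs land under benchmark_output/runs/openlm, and you can aggregate them with helm-summarize --suite openlm as usual.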

yifanmai commented 8 months ago

Regarding this idea:

Writing a custom client. However, adding a new model to HELM requires a non-trivial amount of work to correctly implement the supporting utilities.

I recognize this is difficult, but I wanted to discuss custom clients as an option. There is a way to run custom code in your clients without needing to modify HELM.

Suppose you have a client in custom_client.py in your working directory:

from typing import Dict

from helm.proxy.clients.client import CachingClient, truncate_and_tokenize_response_text
from helm.common.request import wrap_request_time, Request, RequestResult
from helm.proxy.tokenizers.tokenizer import Tokenizer

class CustomClient(CachingClient):
    def __init__(self, tokenizer: Tokenizer, tokenizer_name: str):
        self._tokenizer = tokenizer
        self._tokenizer_name = tokenizer_name

    def make_request(self, request: Request) -> RequestResult:
        def do_it() -> Dict:
            # Return the reversed prompt
            return {"completion": request.prompt[::-1]}

        # wrap_request_time adds request_time and request_datetime to the response dict
        response: Dict = wrap_request_time(do_it)()
        generated_output = truncate_and_tokenize_response_text(
            response["completion"], request, self._tokenizer, self._tokenizer_name
        )
        completions = [generated_output for _ in range(request.num_completions)]

        return RequestResult(
            success=True,
            cached=False,
            request_time=response["request_time"],
            request_datetime=response.get("request_datetime"),
            completions=completions,
            embedding=[],
        )

You can then register a model with this client by setting class_name in ClientSpec to custom_client.CustomClient. Then you can run it using either the Python script method above, or the helm-run CLI command.
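
For reference, here is a minimal sketch of registering this client with the in-process approach from my earlier comment, assuming HELM injects the tokenizer and tokenizer_name constructor arguments the same way it does for built-in clients (names and paths are placeholders):

# Sketch: registering the custom client in-process, mirroring the earlier
# example; only the ClientSpec class_name changes.
from helm.benchmark.model_deployment_registry import ClientSpec, ModelDeployment, register_model_deployment

model_deployment = ModelDeployment(
    name="openlm/your-model-name",
    model_name="openlm/your-model-name",
    tokenizer_name="openlm/your-tokenizer-name",
    max_sequence_length=2048,
    client_spec=ClientSpec(
        # Resolved from the working directory, so custom_client.py must be importable.
        class_name="custom_client.CustomClient",
        args={},
    ),
)
register_model_deployment(model_deployment)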

Note that if you use the helm-run method, you need to add the current working directory to PYTHONPATH for this to work, e.g. PYTHONPATH="." helm-run -r mmlu:subject=anatomy,model=openlm/your-model-name -m 5 --suite openlm. You don't need to do this with the Python script method, because running a script with python / python3 automatically adds the script's directory to sys.path.

yifanmai commented 3 months ago

Related issue regarding batching: #1341