Open ruixin31 opened 9 months ago
Regarding this idea:
Leveraging Huggingface to minimize the need for utility coding.
Does the existing OpenLMClient work for you? To use it, create prod_env/model_deployments.yaml relative to your current working directory and add the following:
model_deployments:
  - name: openlm/your-model-name
    model_name: openlm/your-model-name
    tokenizer_name: openlm/your-tokenizer-name
    max_sequence_length: 2048
    client_spec:
      class_name: "helm.proxy.clients.open_lm_client.OpenLMClient"
      args:
        pretrained_model_name_or_path: /path/to/your/model
where /path/to/your/model is the path to your model. This should eliminate the need to set up a Hugging Face repository.
You also need to either replace openlm/your-tokenizer-name with one of the built-in HELM tokenizers, or add your own tokenizer by creating prod_env/tokenizer_configs.yaml:
tokenizer_configs:
  - name: openlm/your-tokenizer-name
    tokenizer_spec:
      class_name: "helm.proxy.tokenizers.huggingface_tokenizer.HuggingFaceTokenizer"
      args:
        pretrained_model_name_or_path: /path/to/your/tokenizer
    end_of_text_token: "<|endoftext|>"
    prefix_token: "<|endoftext|>"
Right now OpenLMClient always loads a model as an OpenLMforCausalLM. If you need other model classes, I could add the class name to the user-configurable args in model_deployments.yaml.
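To make that concrete, here is a rough sketch of how it could look under client_spec in model_deployments.yaml. The model_class arg and its name are hypothetical and not currently supported; the rest of the deployment entry would stay as above:
client_spec:
  class_name: "helm.proxy.clients.open_lm_client.OpenLMClient"
  args:
    pretrained_model_name_or_path: /path/to/your/model
    # Hypothetical arg, not implemented yet: the class to load instead of
    # the default OpenLMforCausalLM. The name "model_class" is a placeholder.
    model_class: "YourModelClass"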
Regarding this idea:
It would be nice to allow running helm-run and model registration as a library call from HELM. Specifically, it could entail exposing a HELMModel class with required fields for model registration and/or writing a static method on the HuggingfaceServer class for initialization from an existing Hugging Face model.
There is an existing way to run HELM in-process. Here is an example:
from helm.benchmark.config_registry import register_builtin_configs_from_helm_package
from helm.benchmark.model_deployment_registry import ClientSpec, ModelDeployment, register_model_deployment
from helm.benchmark.presentation.run_entry import RunEntry
from helm.benchmark.run import run_benchmarking, run_entries_to_run_specs
from helm.benchmark.tokenizer_config_registry import TokenizerConfig, TokenizerSpec, register_tokenizer_config
from helm.common.authentication import Authentication
register_builtin_configs_from_helm_package()
tokenizer_config = TokenizerConfig(
    name="openlm/your-tokenizer-name",
    tokenizer_spec=TokenizerSpec(
        class_name="helm.proxy.tokenizers.huggingface_tokenizer.HuggingFaceTokenizer",
        args={"pretrained_model_name_or_path": "/path/to/your/model"},
    ),
    end_of_text_token="<|endoftext|>",
    prefix_token="<|endoftext|>",
)
model_deployment = ModelDeployment(
    name="openlm/your-model-name",
    model_name="openlm/your-model-name",
    tokenizer_name="openlm/your-tokenizer-name",
    max_sequence_length=2048,
    client_spec=ClientSpec(
        class_name="helm.proxy.clients.huggingface_client.HuggingFaceClient",
        args={"pretrained_model_name_or_path": "/path/to/your/model"},
    ),
)
register_tokenizer_config(tokenizer_config)
register_model_deployment(model_deployment)

run_entries = [
    RunEntry("mmlu:subject=anatomy,model=openlm/your-model-name", priority=1, groups=None)
]
run_specs = run_entries_to_run_specs(
    run_entries=run_entries,
    max_eval_instances=5,
    num_train_trials=1,
)
run_benchmarking(
    run_specs=run_specs,
    auth=Authentication({}),
    url=None,
    local_path="prod_env",
    num_threads=4,
    output_path="benchmark_output",
    suite="openlm",
    dry_run=False,
    skip_instances=False,
    cache_instances=False,
    cache_instances_only=False,
    skip_completed_runs=False,
    exit_on_error=False,
    runner_class_name=None,
    mongo_uri=None,
    disable_cache=False,
)
Note that the API is non-public and might break in the future. I recognize that this API is quite difficult to use. I have thought about exposing a better API, especially for IPython / Colab, but unfortunately it would require significant internal refactors.
Regarding this idea:
Writing a custom client. However, adding a new model to HELM requires a non-trivial amount of work to correctly implement the utilities.
I recognize this is difficult, but I wanted to discuss custom clients as an option. There is a way to run custom code in your clients without needing to modify HELM.
Suppose you have a client in custom_client.py in your working directory:
from typing import Dict

from helm.proxy.clients.client import CachingClient, truncate_and_tokenize_response_text
from helm.common.request import wrap_request_time, Request, RequestResult
from helm.proxy.tokenizers.tokenizer import Tokenizer


class CustomClient(CachingClient):
    def __init__(self, tokenizer: Tokenizer, tokenizer_name: str):
        self._tokenizer = tokenizer
        self._tokenizer_name = tokenizer_name

    def make_request(self, request: Request) -> RequestResult:
        def do_it() -> Dict:
            # Return the reversed prompt
            return {"completion": request.prompt[::-1]}

        response: Dict = wrap_request_time(do_it)()
        generated_output = truncate_and_tokenize_response_text(
            response["completion"], request, self._tokenizer, self._tokenizer_name
        )
        completions = [generated_output for _ in range(request.num_completions)]
        return RequestResult(
            success=True,
            cached=False,
            request_time=0,
            request_datetime=response.get("request_datetime"),
            completions=completions,
            embedding=[],
        )
You can then register a model with this client by setting class_name in ClientSpec to custom_client.CustomClient. Then you can run it using either the Python script method above or the helm-run CLI command.
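If you go the helm-run route, a prod_env/model_deployments.yaml entry for the custom client might look like the sketch below; the model and tokenizer names are just the placeholders from above, so substitute your own:
model_deployments:
  - name: openlm/your-model-name
    model_name: openlm/your-model-name
    tokenizer_name: openlm/your-tokenizer-name
    max_sequence_length: 2048
    client_spec:
      class_name: "custom_client.CustomClient"
      # No extra args should be needed here: the tokenizer arguments in
      # CustomClient's constructor are expected to be filled in by HELM itself.
      args: {}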
Note that if you use the helm-run method, you need to add the current working directory to PYTHONPATH for this to work, e.g. PYTHONPATH="." helm-run -r mmlu:subject=anatomy,model=openlm/your-model-name -m 5 --suite openlm. You don't need to do this with the Python script method, because running a script with python / python3 automatically adds the script's directory (here, the current working directory) to the module search path.
Related issue regarding batching: #1341
Hi, thank you very much for developing and maintaining the codebase. Here are some of my thoughts on using HELM:
Support for batching
Currently, utilizing all available GPU resources for smaller models is challenging due to the lack of batching support. It would enhance HELM's capabilities if it supported request batching instead of sending one request at a time.
Enhancing Support for Custom Models
Using this evaluation framework with custom models is quite difficult. Below, I'll outline what I've tried so far and offer suggestions on how to make it easier. Related issues: #1704, #1794, #1875.
How could one use HELM on a custom model
There are currently two main methods for loading a model:
1. Setting up a server for your language model (LLM), as done in the NeurIPS Efficiency Challenge setup. However, this introduces overhead over the local port and requires implementing a server.
2. Registering your model with HELM and running it from the command line. This can be accomplished in two ways: leveraging Hugging Face (e.g. via the auto_map setting in the config, essentially faking it as a remote repository), or writing a custom client.
All the options above also require spawning a standalone program.
Idea on improving the current setup
It would be nice to allow running helm-run and model registration as a library call from HELM. Specifically, it could entail exposing a HELMModel class with required fields for model registration and/or writing a static method on the HuggingfaceServer class for initialization from an existing Hugging Face model.