stanford-crfm / helm

Holistic Evaluation of Language Models (HELM), a framework to increase the transparency of language models (https://arxiv.org/abs/2211.09110). This framework is also used to evaluate text-to-image models in HEIM (https://arxiv.org/abs/2311.04287) and vision-language models in VHELM (https://arxiv.org/abs/2410.07112).
https://crfm.stanford.edu/helm
Apache License 2.0

Inference_runtime does not show on Efficiency Metrics of Web Browser #1552

Closed yidinghabana closed 9 months ago

yidinghabana commented 1 year ago

After running several inference cases, the stats.json files in the output folder successfully collected three inference_runtime values. But these do not appear in the browser at http://localhost:8000/ under Efficiency Metrics on the Results page. Can anyone advise? Thanks.

JosselinSomervilleRoberts commented 1 year ago

Could you paste the commands you ran? To get results to show up you need to run:

helm-run --suite <YOUR_SUITE> <THE_REST_OF_YOUR_ARGS>
helm-summarize --suite <YOUR_SUITE>
helm-server

Can you confirm you followed these steps? Thanks.

yidinghabana commented 1 year ago

Thanks for your reply. Here is the command I ran:

helm-run \
    --run-specs boolq:model=databricks/dolly-v2-3b \
    --enable-huggingface-models databricks/dolly-v2-3b \
    --local \
    --suite v1 \
    --max-eval-instances 3

But stats.json did not contain the denoised inference runtime, so no information is shown on the Efficiency page, as in the screenshot below:

(screenshot: empty Efficiency Metrics page)

Thanks for helping!

yidinghabana commented 1 year ago

The following steps are:

helm-summarize --suite v1
helm-server

yidinghabana commented 1 year ago

Can anyone shed some light on this?

deepakn94 commented 1 year ago

This is because we don't have performance data for those models. The data needs to be populated and added to the JSON files here (this file is for idealized runtimes; denoised runtimes have a similar file in the same directory): https://github.com/stanford-crfm/helm/blob/main/src/helm/benchmark/efficiency_data/inference_idealized_runtimes.json.
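For orientation, entries in that file are keyed by model name and hold fitted runtime parameters. The sketch below shows the rough shape of an entry; the exact key names and any numbers here are illustrative assumptions, so check an existing entry in the file before adding your own:

```json
{
  "databricks/dolly-v2-3b": {
    "runtime_per_output_token": 0.02,
    "runtime_for_input_tokens": {
      "1": 0.05,
      "256": 0.08,
      "1024": 0.15
    }
  }
}
```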

yidinghabana commented 1 year ago

Thanks for your reply deepakn94. How do I generate the data in these efficiency runtime JSON files to get the denoised or idealized inference runtimes? e.g. this file: https://github.com/stanford-crfm/helm/blob/main/src/helm/benchmark/efficiency_data/inference_denoised_runtimes.json

deepakn94 commented 1 year ago

The high-level procedure is described here: https://arxiv.org/pdf/2305.02440.pdf.

Code is here: https://github.com/stanford-crfm/helm-efficiency. You first need to run relevant profiling code like this: https://github.com/stanford-crfm/helm-efficiency/blob/main/scripts/gpt2.sh. This should dump a logfile into logs/; you can then use the code in https://github.com/stanford-crfm/helm-efficiency/blob/main/notebooks/fit_runtimes.ipynb to find the right parameters that are dumped into the relevant JSON file.
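The fitting step in that notebook boils down to regressing measured runtimes against generation length. A minimal sketch of that idea, assuming (as in the paper) that runtime is roughly linear in the number of output tokens — this is an illustration, not the notebook's actual code, and the measurements below are synthetic placeholders:

```python
import numpy as np

def fit_runtime_params(num_output_tokens, runtimes):
    """Fit runtime ~= overhead + per_token * num_output_tokens
    via ordinary least squares (degree-1 polynomial fit)."""
    per_token, overhead = np.polyfit(num_output_tokens, runtimes, deg=1)
    return overhead, per_token

# Synthetic profiling measurements: 0.05 s fixed overhead + 0.02 s per token.
tokens = np.array([16, 32, 64, 128, 256], dtype=float)
runtimes = 0.05 + 0.02 * tokens

overhead, per_token = fit_runtime_params(tokens, runtimes)
```

The fitted `overhead` and `per_token` values are what would then be recorded in the relevant JSON file for the model.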

yifanmai commented 9 months ago

Hi @yidinghabana, I hope the previous suggestion helped. I am closing this issue due to inactivity; feel free to reopen if you have further questions.