logikon-ai / cot-eval

A framework for evaluating the effectiveness of chain-of-thought reasoning in language models.
https://huggingface.co/spaces/logikon/open_cot_leaderboard
MIT License

Evaluate: microsoft/Phi-3 #51

Open ggbetz opened 2 months ago

ggbetz commented 2 months ago

Check upon issue creation:

For XXX in:

Parameters:

NEXT_MODEL_PATH=microsoft/Phi-3-mini-XXX-instruct
NEXT_MODEL_REVISION=main
NEXT_MODEL_PRECISION=bfloat16
MAX_LENGTH=2048
GPU_MEMORY_UTILIZATION=0.8
VLLM_SWAP_SPACE=4
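
The `XXX` placeholder in `NEXT_MODEL_PATH` stands for the model variant. As a rough sketch of how the parameter block above could be expanded into one run per variant: the helper name `expand_runs`, the output key names, and the variant list `["4k", "128k"]` are assumptions for illustration (those two variants are simply the ones named later in this thread), not part of cot-eval.

```python
# Defaults copied verbatim from the parameter block in this issue.
DEFAULTS = {
    "NEXT_MODEL_PATH": "microsoft/Phi-3-mini-XXX-instruct",
    "NEXT_MODEL_REVISION": "main",
    "NEXT_MODEL_PRECISION": "bfloat16",
    "MAX_LENGTH": "2048",
    "GPU_MEMORY_UTILIZATION": "0.8",
    "VLLM_SWAP_SPACE": "4",
}


def expand_runs(variants, env=None):
    """Return one typed parameter dict per variant (hypothetical helper).

    `env` may override any default, e.g. a copy of os.environ.
    """
    merged = {**DEFAULTS, **(env or {})}
    runs = []
    for variant in variants:
        runs.append({
            # Substitute the XXX placeholder with the concrete variant name.
            "model_path": merged["NEXT_MODEL_PATH"].replace("XXX", variant),
            "revision": merged["NEXT_MODEL_REVISION"],
            "precision": merged["NEXT_MODEL_PRECISION"],
            # Shell-style values are strings, so convert the numeric ones.
            "max_length": int(merged["MAX_LENGTH"]),
            "gpu_memory_utilization": float(merged["GPU_MEMORY_UTILIZATION"]),
            "swap_space": int(merged["VLLM_SWAP_SPACE"]),
        })
    return runs
```

Calling `expand_runs(["4k", "128k"])` yields two dicts that differ only in `model_path`, which makes it easy to see which variant a failing run belongs to.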

ToDos:

yakazimir commented 1 month ago

There seems to be an issue with vLLM when loading this model:

2024-06-09T01:10:42.575996000Z 2024-06-09 01:10:42,575 - root - INFO - Formatted MC-Question-Block for lsat-lr dataset
2024-06-09T01:10:42.576031126Z 2024-06-09 01:10:42,575 - root - INFO - Loading vLLM model microsoft/Phi-3-mini-128k-instruct
2024-06-09T01:10:44.403415347Z Traceback (most recent call last):
2024-06-09T01:10:44.403433791Z   File "/usr/local/bin/cot-eval", line 8, in <module>
2024-06-09T01:10:44.403459111Z     sys.exit(main())
2024-06-09T01:10:44.403472573Z   File "/workspace/cot-eval/src/cot_eval/__main__.py", line 149, in main
2024-06-09T01:10:44.403507563Z     llm = VLLM(
2024-06-09T01:10:44.403513620Z   File "/usr/local/lib/python3.10/dist-packages/langchain_core/load/serializable.py", line 120, in __init__
2024-06-09T01:10:44.403545765Z     super().__init__(**kwargs)
2024-06-09T01:10:44.403551285Z   File "/usr/local/lib/python3.10/dist-packages/pydantic/v1/main.py", line 341, in __init__
2024-06-09T01:10:44.403604111Z     raise validation_error
2024-06-09T01:10:44.403631526Z pydantic.v1.error_wrappers.ValidationError: 1 validation error for VLLM
2024-06-09T01:10:44.403633889Z __root__
2024-06-09T01:10:44.403635718Z    (type=assertion_error)
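
The `ValidationError` above carries no message, presumably because an `AssertionError` raised while the LangChain `VLLM` wrapper was being validated got re-wrapped by pydantic. One possible way to see the original error is to construct the vLLM engine directly, outside the wrapper. The following is a minimal sketch under that assumption: the helper names are hypothetical, and the mapping of this issue's parameters onto `vllm.LLM` keyword arguments (for example `MAX_LENGTH` to `max_model_len`) is a guess, not taken from the cot-eval source.

```python
def vllm_kwargs(model_path, revision="main", precision="bfloat16",
                max_length=2048, gpu_memory_utilization=0.8, swap_space=4):
    """Map the issue's parameters onto vllm.LLM keyword arguments (assumed mapping)."""
    return {
        "model": model_path,
        "revision": revision,
        "dtype": precision,
        "max_model_len": max_length,
        "gpu_memory_utilization": gpu_memory_utilization,
        "swap_space": swap_space,  # GiB of CPU swap space per GPU
    }


def load_directly(model_path):
    """Build vllm.LLM without the LangChain wrapper so the raw exception is visible."""
    # Imported lazily: this needs vLLM installed and a GPU available.
    from vllm import LLM
    try:
        return LLM(**vllm_kwargs(model_path))
    except AssertionError as err:
        # repr() keeps the exception type even when the message is empty.
        print(f"vLLM raised {err!r}")
        raise
```

Running `load_directly("microsoft/Phi-3-mini-128k-instruct")` inside the evaluation container should then produce a traceback that points at the failing assertion rather than at pydantic.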

ggbetz commented 1 week ago

I've updated the container and evaluated microsoft/Phi-3-mini-4k-instruct.