Evaluate: microsoft/Phi-3

ggbetz commented 6 months ago

Check upon issue creation:

[x] The model has not been evaluated yet and doesn't show up on the CoT Leaderboard.
[x] There is no evaluation request issue for the model in the repo.
[x] The parameters below have been adapted and shall be used.

For XXX in:

[ ] 128k
[x] 4k

Parameters:

NEXT_MODEL_PATH=microsoft/Phi-3-mini-XXX-instruct
NEXT_MODEL_REVISION=main
NEXT_MODEL_PRECISION=bfloat16
MAX_LENGTH=2048
GPU_MEMORY_UTILIZATION=0.8
VLLM_SWAP_SPACE=4
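
To make the `XXX` placeholder concrete, here is a minimal sketch of how the template above could be expanded into one parameter set per ticked variant. The helper names (`parse_params`, `expand_variants`) and the `VARIANTS` dict are hypothetical and only illustrate the substitution; this is not how the cot-eval pipeline itself reads these values.

```python
# Hypothetical helpers, not part of cot-eval: they expand the XXX placeholder
# from the issue template into concrete model paths.
PARAMS_TEMPLATE = """\
NEXT_MODEL_PATH=microsoft/Phi-3-mini-XXX-instruct
NEXT_MODEL_REVISION=main
NEXT_MODEL_PRECISION=bfloat16
MAX_LENGTH=2048
GPU_MEMORY_UTILIZATION=0.8
VLLM_SWAP_SPACE=4
"""

# Checkbox state copied from the issue: only the 4k variant is ticked.
VARIANTS = {"128k": False, "4k": True}


def parse_params(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines into a dict, skipping blank lines."""
    params = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        key, _, value = line.partition("=")
        params[key] = value
    return params


def expand_variants(text: str, variants: dict[str, bool]) -> list[dict[str, str]]:
    """Return one parameter dict per ticked variant, with XXX substituted."""
    return [
        parse_params(text.replace("XXX", name))
        for name, checked in variants.items()
        if checked
    ]


runs = expand_variants(PARAMS_TEMPLATE, VARIANTS)
for run in runs:
    print(run["NEXT_MODEL_PATH"])
```

With the checkboxes as ticked above, this yields a single run for the 4k variant; ticking 128k as well would add a second parameter set that differs only in `NEXT_MODEL_PATH`.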

ToDos:

[ ] Run cot-eval pipeline
[ ] Merge pull requests for cot-eval results datasets (> @ggbetz)
[ ] Create eval request record to update metadata on leaderboard (> @ggbetz)

yakazimir commented 5 months ago

There seems to be an issue with vLLM and this model:

2024-06-09T01:10:42.575996000Z 2024-06-09 01:10:42,575 - root - INFO - Formatted MC-Question-Block for lsat-lr dataset
2024-06-09T01:10:42.576031126Z 2024-06-09 01:10:42,575 - root - INFO - Loading vLLM model microsoft/Phi-3-mini-128k-instruct
2024-06-09T01:10:44.403415347Z Traceback (most recent call last):
2024-06-09T01:10:44.403433791Z   File "/usr/local/bin/cot-eval", line 8, in <module>
2024-06-09T01:10:44.403459111Z     sys.exit(main())
2024-06-09T01:10:44.403472573Z   File "/workspace/cot-eval/src/cot_eval/__main__.py", line 149, in main
2024-06-09T01:10:44.403507563Z     llm = VLLM(
2024-06-09T01:10:44.403513620Z   File "/usr/local/lib/python3.10/dist-packages/langchain_core/load/serializable.py", line 120, in __init__
2024-06-09T01:10:44.403545765Z     super().__init__(**kwargs)
2024-06-09T01:10:44.403551285Z   File "/usr/local/lib/python3.10/dist-packages/pydantic/v1/main.py", line 341, in __init__
2024-06-09T01:10:44.403604111Z     raise validation_error
2024-06-09T01:10:44.403631526Z pydantic.v1.error_wrappers.ValidationError: 1 validation error for VLLM
2024-06-09T01:10:44.403633889Z __root__
2024-06-09T01:10:44.403635718Z    (type=assertion_error)
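
Each log line above carries a leading nanosecond-precision timestamp ending in `Z`, which appears to have been added by the container runtime on top of the logger's own timestamp. As a reading aid, the following sketch strips that first prefix so the traceback is easier to scan. The function name and regex are illustrative assumptions, not part of cot-eval.

```python
import re

# Matches the leading timestamp, e.g. "2024-06-09T01:10:44.403415347Z ".
# Assumption: this prefix is added outside the Python logger.
CONTAINER_TS = re.compile(r"^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d+Z ")


def strip_container_timestamps(log: str) -> str:
    """Remove the leading timestamp prefix from every line of a log."""
    return "\n".join(CONTAINER_TS.sub("", line) for line in log.splitlines())


# Three lines copied from the log above.
sample = """\
2024-06-09T01:10:44.403415347Z Traceback (most recent call last):
2024-06-09T01:10:44.403459111Z     sys.exit(main())
2024-06-09T01:10:44.403631526Z pydantic.v1.error_wrappers.ValidationError: 1 validation error for VLLM
"""

cleaned = strip_container_timestamps(sample)
print(cleaned)
```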

ggbetz commented 4 months ago

I've updated the container and evaluated microsoft/Phi-3-mini-4k-instruct.

ggbetz commented 3 months ago

I've also evaluated microsoft/Phi-3-small-8k-instruct with the updated Docker container.

logikon-ai / cot-eval

Evaluate: microsoft/Phi-3 #51