huggingface / lighteval

Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends
MIT License

Append revision to filepath in `--output_dir`? #56

Open lewtun opened 8 months ago

lewtun commented 8 months ago

Currently, lighteval stores results/details in a path that is determined by the model name, e.g.

scratch/evals
├── details
│   └── Qwen
│       └── Qwen1.5-0.5B-Chat
│           ├── 2024-02-26T15-36-31.681219
│           │   └── details_lighteval|truthfulqa:mc|0_2024-02-26T15-36-31.681219.parquet
│           └── results_2024-02-26T15-36-31.681219.json
└── results
    └── Qwen
        └── Qwen1.5-0.5B-Chat
            └── results_2024-02-26T15-36-31.681219.json

However, I quite often evaluate models at different revisions, and the current save logic groups all of them in the same subfolder, which makes it hard to tell which result corresponds to which run.

Would it make sense to append the model revision to the file paths? For example, something like this for the main revision (or whatever is passed to the revision arg in the script):

scratch/evals
├── details
│   └── Qwen
│       └── Qwen1.5-0.5B-Chat
│           └── main
│               ├── 2024-02-26T15-36-31.681219
│               │   └── details_lighteval|truthfulqa:mc|0_2024-02-26T15-36-31.681219.parquet
│               └── results_2024-02-26T15-36-31.681219.json
└── results
    └── Qwen
        └── Qwen1.5-0.5B-Chat
            └── main
                └── results_2024-02-26T15-36-31.681219.json
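To make the proposed layout concrete, here is a minimal sketch of how such paths could be assembled, assuming the model name has the form "{ORG}/{MODEL_ID}" and that the revision is simply added as one extra directory level. The helper `build_output_paths` is hypothetical and is not lighteval's actual saving code; with `revision=None` it reproduces the current layout shown at the top of the issue.

```python
from pathlib import Path
from typing import Dict, Optional


def build_output_paths(
    output_dir: str,
    model_name: str,
    timestamp: str,
    revision: Optional[str] = None,
) -> Dict[str, Path]:
    """Hypothetical helper: return the details folder and results file for one run.

    `model_name` is "{ORG}/{MODEL_ID}". When `revision` is given it becomes an
    extra directory level under the model name; otherwise the current layout
    (model name + timestamp only) is produced.
    """
    model_dir = Path(model_name)
    if revision is not None:
        model_dir = model_dir / revision
    root = Path(output_dir)
    return {
        "details_dir": root / "details" / model_dir / timestamp,
        "results_file": root / "results" / model_dir / f"results_{timestamp}.json",
    }


paths = build_output_paths(
    "scratch/evals",
    "Qwen/Qwen1.5-0.5B-Chat",
    "2024-02-26T15-36-31.681219",
    revision="main",
)
print(paths["results_file"].as_posix())
```

Keeping the revision as a directory (rather than part of a file name) means a plain glob over `results/{ORG}/{MODEL_ID}/*/` still finds every run of a model.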

My current workaround is to manually specify the model path with --output_dir={ORG}/{MODEL_ID}/{REVISION} and then glob the files. This works, but it is a bit clunky because one ends up with a long nested path like {ORG}/{MODEL_ID}/{REVISION}/results/{ORG}/{MODEL_ID}.
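The glob step of this workaround could look roughly like the sketch below. It assumes only what the comment describes: --output_dir is set to {ORG}/{MODEL_ID}/{REVISION} under some base folder, and results then land in results/{ORG}/{MODEL_ID} inside it. The helper `find_result_files` and the temporary-directory setup are hypothetical, used only to demonstrate the lookup.

```python
import tempfile
from pathlib import Path
from typing import List


def find_result_files(base: str, org: str, model_id: str, revision: str) -> List[Path]:
    # --output_dir was {base}/{ORG}/{MODEL_ID}/{REVISION}; the results are then
    # nested again under results/{ORG}/{MODEL_ID}, hence the repeated model name.
    results_dir = Path(base, org, model_id, revision, "results", org, model_id)
    return sorted(results_dir.glob("results_*.json"))


# Usage: simulate runs for two revisions, then collect only the "main" results.
with tempfile.TemporaryDirectory() as tmp:
    for rev in ("main", "v1.0"):
        d = Path(tmp, "Qwen", "Qwen1.5-0.5B-Chat", rev, "results", "Qwen", "Qwen1.5-0.5B-Chat")
        d.mkdir(parents=True)
        (d / "results_2024-02-26T15-36-31.681219.json").write_text("{}")
    found = find_result_files(tmp, "Qwen", "Qwen1.5-0.5B-Chat", "main")
    print([p.name for p in found])
```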

clefourrier commented 8 months ago

Would {ORG}/{MODEL_ID}_{REVISION} work for you? I think it would give you the info you need without creating too many nested levels for users to whom it would be irrelevant. If it would, I can add it to our logging.
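A minimal sketch of the {ORG}/{MODEL_ID}_{REVISION} naming, assuming the revision string is used as passed (e.g. a branch or tag name). The function `model_dir_with_revision` is hypothetical, not part of lighteval. As an additional assumption, any "/" in the revision is replaced with "-" so that a ref such as "refs/pr/1" does not silently create extra directory levels, and a model name without an org is also accepted.

```python
from pathlib import PurePosixPath


def model_dir_with_revision(model_name: str, revision: str) -> PurePosixPath:
    """Hypothetical: map "ORG/MODEL_ID" plus a revision to "ORG/MODEL_ID_REVISION"."""
    # Assumption: "/" inside a revision would add directory levels, so flatten it.
    safe_revision = revision.replace("/", "-")
    # Everything before the last "/" is the org (possibly empty, e.g. "gpt2").
    *org, model_id = model_name.split("/")
    return PurePosixPath(*org, f"{model_id}_{safe_revision}")


print(model_dir_with_revision("Qwen/Qwen1.5-0.5B-Chat", "v1.0"))
print(model_dir_with_revision("gpt2", "refs/pr/1"))
```

This keeps a single directory per (model, revision) pair, at the cost of mixing the revision into the folder name; splitting it back out is ambiguous if model IDs also contain underscores.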

lewtun commented 8 months ago

That would be perfect! Note that using the revision name (not necessarily the SHA) would be ideal, so that one can distinguish e.g. branches or tags like vX.Y from each other. I realise this is quite a niche ask, so don't worry if it's too annoying to include.

lewtun commented 8 months ago

Actually, after thinking about this a bit more, a better approach IMO would be to let the user specify the filepath in --output_dir and then dump the details/results directly there. This way, people like me can nest the results as desired, and we don't need to hardcode more logic :)
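As a rough illustration of this alternative, the sketch below writes the results file directly into whatever folder the user passes, without adding any {ORG}/{MODEL_ID} subfolders. `save_results_flat` is a hypothetical helper, not lighteval's real writer, and the results dict is an empty placeholder rather than real scores.

```python
import json
import tempfile
from pathlib import Path


def save_results_flat(output_dir: str, results: dict, timestamp: str) -> Path:
    """Hypothetical: write results straight into output_dir so the caller
    fully controls the nesting (no model-name subfolders are added)."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    path = out / f"results_{timestamp}.json"
    path.write_text(json.dumps(results, indent=2))
    return path


# Usage: the caller chooses {ORG}/{MODEL_ID}/{REVISION} and nothing is repeated.
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp, "Qwen", "Qwen1.5-0.5B-Chat", "main")
    written = save_results_flat(str(target), {"truthfulqa:mc": {}}, "2024-02-26T15-36-31.681219")
    print(written.relative_to(tmp).as_posix())
```

The trade-off, raised in the next reply, is that any downstream step expecting a fixed layout (such as pushing to the Hub) can no longer infer the model name from the path.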

clefourrier commented 8 months ago

We expect the results to follow a specific path when pushed to the Hub, which is one of the reasons we have this nested architecture. At which level would you want to have control over the path?