ray-project / ray-llm

RayLLM - LLMs on Ray
https://aviary.anyscale.com
Apache License 2.0

Weight caching being based on model-id creates confusion #53

Open ArturNiederfahrenhorst opened 11 months ago

ArturNiederfahrenhorst commented 11 months ago

Aviary caches weights based on the model id. As a result, if a model with a given model-id has been run before, changing the S3 path in its config has no effect: the previously cached weights are loaded instead.

So for Aviary to respect a changed S3 path in a model config, you have to go to the cache and delete the checkpoint. Every time you forget this, you get a functioning LLM, but with the wrong weights.

This behaviour is silent, so it is very hard to realize what the issue is. For someone who does not know the internals of Aviary, this can create serious problems. For example, you can spend half a day evaluating models, only to find that they are all of approximately the same quality. And even then, you might not realize that you have made a mistake.
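
To illustrate the problem, here is a minimal sketch of the two keying strategies. It is not Aviary's actual caching code: the function names, the example model id, and the S3 URIs are all hypothetical, and the second function is only one possible way to make the cache sensitive to the weight source.

```python
import hashlib


def cache_key_by_model_id(model_id: str, s3_uri: str) -> str:
    """Hypothetical key that depends only on the model id.

    The S3 URI is ignored, so two configs with the same model id
    but different weight locations map to the same cache entry.
    """
    return model_id.replace("/", "--")


def cache_key_with_source(model_id: str, s3_uri: str) -> str:
    """Hypothetical key that also depends on where the weights come from.

    A short digest of the S3 URI is appended, so changing the path
    yields a different cache entry instead of silently reusing the old one.
    """
    digest = hashlib.sha256(s3_uri.encode("utf-8")).hexdigest()[:12]
    return f"{model_id.replace('/', '--')}-{digest}"


if __name__ == "__main__":
    model_id = "my-org/my-model"        # hypothetical model id
    v1 = "s3://my-bucket/checkpoint-v1/"  # hypothetical S3 paths
    v2 = "s3://my-bucket/checkpoint-v2/"

    # Keyed by model id only: both paths collide on one cache entry.
    print(cache_key_by_model_id(model_id, v1) == cache_key_by_model_id(model_id, v2))

    # Keyed by model id and source: the changed path gets its own entry.
    print(cache_key_with_source(model_id, v1) == cache_key_with_source(model_id, v2))
```

With a key like the second one, a changed S3 path would trigger a fresh download rather than a silent cache hit. A cheaper alternative would be to store the source path next to the cached checkpoint and warn loudly on a mismatch.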