Closed sfriedowitz closed 7 months ago
How much of this will we be able to port to the platform if we need to?
I think it will help with the platform a lot! For instance, being able to specify hf:// vs. file:// vs. s3:// will likely be the difference between raw HF models and internal models that we have trained.
It's good groundwork for some of those goals IMO.
What's changing
This PR overhauls the path specification mechanics in the library. Instead of using config data classes, I implement a validation type called AssetPath that checks that a string starts with one of the following prefixes:
- hf:// for Hugging Face repos
- file:// for direct file paths
- wandb:// for fully qualified W&B paths

This approach is analogous to how many other libraries specify paths, and I believe it is a more direct way of doing so. It also lets me rename the config field from load_from to path, which makes much more semantic sense in my head.

Follow-ups
You'll also notice that I changed the job result types a bit so that they include the W&B artifacts directly. In a follow-up, I plan to centralize the artifact logging within the LMBuddy class, which should simplify much of the conditional logic currently found in the job entrypoints.
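
For readers skimming the description, here is a minimal sketch of the kind of prefix validation described under "What's changing". It is not the code in this PR: the names PathPrefix, validate_asset_path, and strip_path_prefix are hypothetical, and it uses only the standard library rather than whatever validation machinery the library's AssetPath type is actually built on. The only thing taken from the description is the set of three prefixes.

```python
from enum import Enum


class PathPrefix(str, Enum):
    """The three prefixes listed in the PR description (hypothetical enum name)."""

    FILE = "file://"
    HUGGINGFACE = "hf://"
    WANDB = "wandb://"


def validate_asset_path(value: str) -> str:
    """Return the string unchanged if it starts with a known prefix, else raise.

    Hypothetical helper: illustrates the check only, not the PR's implementation.
    """
    if not isinstance(value, str):
        raise TypeError(f"Expected a string path, got {type(value).__name__}")
    if not any(value.startswith(p.value) for p in PathPrefix):
        allowed = ", ".join(p.value for p in PathPrefix)
        raise ValueError(f"Path {value!r} must start with one of: {allowed}")
    return value


def strip_path_prefix(path: str) -> str:
    """Validate the path, then drop its prefix to get the raw identifier."""
    validated = validate_asset_path(path)
    prefix = next(p.value for p in PathPrefix if validated.startswith(p.value))
    return validated[len(prefix):]
```

Under these assumptions, a config would carry a single string such as hf://distilgpt2 in its path field, and a loader would branch on the prefix rather than on the type of a config data class. A string with any other prefix is rejected up front.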