Open vishaal27 opened 9 months ago
Hi @vishaal27, unfortunately some checkpoints are not online: they are on vasa and have not been migrated to the gcloud bucket yet. I'm not sure if or when they'll come online, as migrating them is not straightforward now that I have lost my Berkeley access :)
Hey @rtaori, thanks for your blazingly fast response! Is there anyone else with access who would be able to check this?
Potentially, let me check. But if you don't hear back within the next week, then probably there's no way to get these checkpoints :(
Sure thanks for checking, really appreciate this :)
Hey, I was running model evaluations on my own custom data-split for all models in the registry using:
where <model> comes from all the models in the registry (python db.py --list-models-registry). However, for many of the models, I see a pickling error due to the checkpoint not being loaded correctly. See stack-trace below:
I see that this error happens for all of the low-resource models like resnet18_100k_x_epochs, resnet18_50k_x_epochs, etc. To fully ensure this is not an artefact of my own custom data-split, I also tested this on the imagenet-val split with no success. Are the low-resource models not available as checkpoints from the server?
Also, another set of errors I get when running this is due to some checkpoints still being stored on the vasa endpoint, see:
Are some of the checkpoints not migrated fully yet?
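One rough way to narrow down a pickling error like this is to look at the first few bytes of the checkpoint file that ended up on disk. The sketch below is only a heuristic and is not part of the testbed code: the helper name sniff_checkpoint is hypothetical, and the path passed to it is assumed to be wherever the checkpoint was downloaded locally. It relies on two general facts: pickle streams written with protocol 2 or newer start with the byte 0x80, and zip archives (which recent versions of torch.save produce) start with "PK\x03\x04". A file that starts with "<" is more likely an HTML or XML error page saved in place of the checkpoint.

```python
import pickle
import tempfile
from pathlib import Path


def sniff_checkpoint(path):
    """Classify a file by its leading bytes (heuristic only, no full validation)."""
    with open(path, "rb") as f:
        head = f.read(16)
    if not head:
        return "empty"
    if head.startswith(b"PK\x03\x04"):
        return "zip"      # zip archive, e.g. the format recent torch.save writes
    if head[0:1] == b"\x80":
        return "pickle"   # pickle protocol >= 2 begins with the PROTO opcode
    if head.lstrip().startswith(b"<"):
        return "markup"   # probably an HTML/XML error page, not a checkpoint
    return "unknown"


if __name__ == "__main__":
    # Small self-contained demo with throwaway files.
    with tempfile.TemporaryDirectory() as tmp:
        good = Path(tmp) / "good.pkl"
        bad = Path(tmp) / "bad.pkl"
        good.write_bytes(pickle.dumps({"weights": [1, 2, 3]}, protocol=2))
        bad.write_bytes(b"<html><body>Not Found</body></html>")
        print(good.name, sniff_checkpoint(good))
        print(bad.name, sniff_checkpoint(bad))
```

If the failing low-resource checkpoints come back as "markup" or "empty", that would suggest the download itself failed rather than the data-split being at fault.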
Sorry for the long verbose issue, but hope we can get this resolved :)