vturrisi / solo-learn

solo-learn: a library of self-supervised methods for visual representation learning powered by Pytorch Lightning
MIT License
1.42k stars, 185 forks

Hosting models on HF to improve discoverability #391

Open: NielsRogge opened this issue 3 weeks ago

NielsRogge commented 3 weeks ago

Hi @haoqiwang,

Niels here from the open-source team at Hugging Face. I discovered your work through a paper that builds on your framework: https://huggingface.co/papers/2403.14973. I work together with AK on improving the visibility of researchers' work and libraries on the hub.

I see that the model zoo of "solo-learn" uses Google Drive for its hosting.

It'd be great to make the models available on the 🤗 hub; we can add tags so that people find them when filtering https://huggingface.co/models.

For instance, in this case "image-feature-extraction" seems useful: https://huggingface.co/models?pipeline_tag=image-feature-extraction, as does "image-classification": https://huggingface.co/models?pipeline_tag=image-classification.

Integrating as a library

We could integrate "solo-learn" as a proper library on the hub, as we've done with many others: https://huggingface.co/docs/hub/en/models-adding-libraries. This would ensure all checkpoints are properly tagged, there's a "How to use this model" button on each model repo which links to "solo-learn", and you get download stats (seeing how many times people actually download your models).
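As a rough sketch of what the tagging could look like, the helper below writes a model card whose YAML front matter sets `pipeline_tag`, `library_name` and `tags`, which are standard Hub model-card metadata fields. The function name `build_model_card`, the tag choices, and the MIT license for the weights are assumptions for illustration only; `library_name: solo-learn` would only produce the "How to use this model" link once the library is actually registered on the Hub.

```python
def build_model_card(method: str, backbone: str, dataset: str,
                     pipeline_tag: str = "image-feature-extraction") -> str:
    """Return README.md text whose YAML front matter carries Hub metadata."""
    # Hypothetical tag choices: method and backbone names plus a generic topic tag.
    tags = [method.lower(), backbone.lower(), "self-supervised-learning"]
    front_matter = [
        "---",
        "library_name: solo-learn",  # only linked once the library is registered
        "license: mit",              # assumption: weights share the repo's license
        f"pipeline_tag: {pipeline_tag}",
        "tags:",
        *(f"- {tag}" for tag in tags),
        "---",
    ]
    body = [
        "",
        f"# {method} {backbone} pretrained on {dataset}",
        "",
        "Self-supervised checkpoint trained with https://github.com/vturrisi/solo-learn.",
    ]
    return "\n".join(front_matter + body) + "\n"


# Example arguments are illustrative, not a claim about a specific released model.
print(build_model_card("BYOL", "resnet18", "ImageNet-100"))
```

Switching `pipeline_tag` to `"image-classification"` would suit checkpoints that ship with a trained linear head.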

Uploading models

See here for a guide: https://huggingface.co/docs/hub/models-uploading. If the models are custom PyTorch models, we could probably leverage the PyTorchModelHubMixin class, which adds from_pretrained and push_to_hub to each model. Alternatively, one can leverage the hf_hub_download one-liner to download a checkpoint from the hub.

We encourage researchers to push each model checkpoint to a separate model repository, so that things like download stats also work.
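The one-repository-per-checkpoint layout could be scripted roughly as follows. This is a sketch under several assumptions: the checkpoints are `.ckpt` files in a single local folder, the `solo-learn-<name>` naming scheme and both function names are hypothetical, and you are logged in (e.g. via `huggingface-cli login`) with write access to `namespace`. Only `HfApi.create_repo` and `HfApi.upload_file` come from `huggingface_hub`; it is imported lazily so the naming helper works without it.

```python
import re
from pathlib import Path


def repo_id_for(ckpt: Path, namespace: str) -> str:
    # Hypothetical naming scheme: "<namespace>/solo-learn-<sanitized checkpoint stem>".
    # Runs of characters other than lowercase letters and digits become one "-".
    stem = re.sub(r"[^a-z0-9]+", "-", ckpt.stem.lower()).strip("-")
    return f"{namespace}/solo-learn-{stem}"


def push_checkpoints(ckpt_dir: str, namespace: str) -> list[str]:
    from huggingface_hub import HfApi  # imported lazily: only needed for uploads

    api = HfApi()
    pushed = []
    for ckpt in sorted(Path(ckpt_dir).glob("*.ckpt")):
        repo_id = repo_id_for(ckpt, namespace)
        # One repository per checkpoint, so each model gets its own download stats.
        api.create_repo(repo_id, repo_type="model", exist_ok=True)
        api.upload_file(path_or_fileobj=str(ckpt), path_in_repo=ckpt.name, repo_id=repo_id)
        pushed.append(repo_id)
    return pushed


print(repo_id_for(Path("SimCLR_ResNet18.ckpt"), "my-org"))
```

Each repo could then receive its own README.md with the metadata described earlier in the thread, uploaded the same way.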

Let me know if you need any help regarding this!

Cheers,

Niels ML Engineer @ HF 🤗