pytorch / serve

Serve, optimize and scale PyTorch models in production
https://pytorch.org/serve/
Apache License 2.0

Model Zoo Revamp #1470

Closed msaroufim closed 2 years ago

msaroufim commented 2 years ago

Problem

As of now, our model zoo doesn't make it clear which models are available. docs/model_zoo.md is not well maintained; for example, it doesn't include BERT or MMF. It also doesn't show how users can submit a new model to the zoo. (Making a PR is not enough, since the S3 bucket is not publicly available and we don't give users instructions on how to upload things; we assume they know how to use the AWS API.)

This creates problems since

  1. Submitted examples can't have good test cases, because we can't check in mar files
  2. It limits code reuse between teams in open source: they can't share mar files with each other, and it's not clear what has already been done

Solution

A better solution needs to make sure the zoo is public, searchable and automatically updated, allows user submissions, and prevents spam, such as users uploading unwanted or harmful objects to an S3 bucket we maintain.

Should we use something like PyTorch Hub? HF Hub? A homegrown basic S3 hub?

Current Experience

The current experience is that the torchserve team maintains an S3 bucket of common models users care about, and only the team has write access to it

Pros

  1. Curated models that work

Cons

  1. Doesn't allow community contributions, which prevents a rich set of examples, higher-quality unit tests and growth overall

PyTorch Hub

Pros

  1. PyTorch brand, curated

Cons

  1. May require some work to support a mar file format
  2. Cannot host weights without code review, does not allow arbitrary files to be stored

HuggingFace Hub

Pros

  1. Can upload arbitrary files including mar files from either a web UI or CLI
  2. Model Hub discovery is good
  3. No code review process

Cons

  1. Anyone can submit (not sure how they deal with spam and harmful content)

Homegrown Hub

Create our own model hub, or maybe standardize mar format more and revamp torch hub?

Pros

  1. Most flexible, can support any data format we like

Cons

  1. Need to host a service so community members can submit and inspect available models
  2. Need to deal with security, spam and harmful content: if users can submit anything, it's a security risk to just unzip a random file from the internet

soumith commented 2 years ago

If hosting weights is a bottleneck, you should point users to services / ways to host weights for free. GitHub Releases and HuggingFace Hub are good options.

About curation of models, PyTorch Hub or HuggingFace Hub makes sense -- Homegrown Hub won't be better than either of those two in my opinion.

msaroufim commented 2 years ago

That's good feedback.

I'm thinking I'll just update the model_zoo docs with a tutorial on how to use pre-trained weights from the HuggingFace Hub or PyTorch, together with a default base handler that does nothing except run model.forward. Maybe make this the getting started guide too.
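A minimal sketch of what such a "does nothing except run model.forward" handler could look like. It only mirrors the initialize/handle shape that TorchServe handlers follow. The class name `ForwardOnlyHandler` and the `load_model` hook are hypothetical, and the sketch deliberately avoids importing torch so the control flow stays visible; a real handler would more likely subclass TorchServe's own base handler.

```python
class ForwardOnlyHandler:
    """Hypothetical pass-through handler: load a model once, then just call it."""

    def __init__(self):
        self.model = None

    def load_model(self, model_dir):
        # Assumption: subclasses decide how weights are loaded from model_dir
        # (for example with torch.load or torch.jit.load).
        raise NotImplementedError

    def initialize(self, context):
        # TorchServe passes a context whose system_properties include "model_dir".
        model_dir = context.system_properties.get("model_dir")
        self.model = self.load_model(model_dir)

    def handle(self, data, context):
        # Each request in the batch carries its payload under "data" or "body".
        inputs = [row["data"] if "data" in row else row.get("body") for row in data]
        # No pre- or post-processing: one forward call per request, and the
        # returned list has the same length as the incoming batch.
        return [self.model(item) for item in inputs]
```

With something like this as the default, a getting started guide would only need to explain where the weights come from, not how packaging works.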

And then I'd deprecate the existing model zoo, since we aren't properly maintaining it and don't have a clear path to doing so either. There's no reason to have a hub just for mar files. I'd go as far as removing the term "mar file" from everywhere in our docs except the internals guide, so users don't have to think about packaging at all once we have #1460 in.

We can keep the existing model zoo for our tests and make models available only to IPs coming in from CI machines and not the public internet - cc @lxning

lxning commented 2 years ago

@msaroufim the current model zoo page is maintained manually, so it easily gets out of sync with the links when new mar files are added to the S3 bucket.

In my opinion, the model zoo page should ideally be generated automatically. The process could be:

  1. Upload the mar file to S3:

    • add metadata such as model_name, model_type, dataset, size, sample_input, model_mode
  2. Host the model zoo page separately (similar to the nightly build page) for public read-only access.

  3. Write a script to update the above model zoo page. This script goes through all mar/war files, fetches each file's metadata, and then creates the model zoo table.

  4. Update the model zoo page automatically every night.
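The script in step 3 could be sketched roughly as follows. This assumes the mar files have already been synced to a local directory, and that each archive is a zip containing a `MAR-INF/MANIFEST.json` with a `model` section; that layout is an assumption here, not something this thread specifies. It also reads the manifest rather than the S3 object metadata proposed in step 1. The function names are hypothetical.

```python
import json
import zipfile
from pathlib import Path

# Assumed location of the manifest inside a .mar archive.
MANIFEST_PATH = "MAR-INF/MANIFEST.json"


def read_manifest(mar_path):
    """Return the parsed manifest of one .mar file, or {} if it has none."""
    with zipfile.ZipFile(mar_path) as archive:
        if MANIFEST_PATH not in archive.namelist():
            return {}
        with archive.open(MANIFEST_PATH) as fh:
            return json.load(fh)


def build_zoo_table(model_store):
    """Render a Markdown table with one row per .mar file in model_store."""
    lines = [
        "| Model | Version | Handler | File |",
        "| --- | --- | --- | --- |",
    ]
    for mar_path in sorted(Path(model_store).glob("*.mar")):
        model = read_manifest(mar_path).get("model", {})
        lines.append("| {} | {} | {} | {} |".format(
            model.get("modelName", mar_path.stem),  # fall back to the file name
            model.get("modelVersion", "?"),
            model.get("handler", "?"),
            mar_path.name,
        ))
    return "\n".join(lines)
```

A nightly job could write the returned string into the hosted page, so the table never drifts from what is actually in the bucket.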

msaroufim commented 2 years ago

@lxning I would actually argue for us to kill the model zoo completely at this point. If the main utility is tests, then a model zoo actually does more harm than good, since people can't update examples/ and know for sure whether the code will break

So a better solution is for every unit test to create a mar file from scratch and test it, instead of having a model zoo where we're not sure who created the model, how, and whether it still works
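A rough sketch of building the mar file inside the test itself, assuming `torch-model-archiver` is installed on the test machine. The helper names `archiver_command` and `build_mar` are hypothetical; the flags are the usual torch-model-archiver options.

```python
import subprocess
from pathlib import Path


def archiver_command(model_name, serialized_file, handler, export_path, version="1.0"):
    """Return the torch-model-archiver argv for one model (nothing is run yet)."""
    return [
        "torch-model-archiver",
        "--model-name", model_name,
        "--version", version,
        "--serialized-file", str(serialized_file),
        "--handler", handler,
        "--export-path", str(export_path),
        "--force",  # overwrite an archive left over from a previous test run
    ]


def build_mar(model_name, serialized_file, handler, export_path, version="1.0"):
    """Run the archiver and return the path of the archive it should produce."""
    cmd = archiver_command(model_name, serialized_file, handler, export_path, version)
    subprocess.run(cmd, check=True)  # raises CalledProcessError if archiving fails
    return Path(export_path) / (model_name + ".mar")
```

A test fixture could call `build_mar` with a temporary export path, so every run exercises the current code in examples/ instead of a stale archive.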

osanseviero commented 2 years ago

Hey all, Omar from HF here :hugs:

We'd love to support your use case on the Hugging Face Hub if it makes sense! Just for clarification, the Hub is not constrained to :hugs: transformers models (or models created with Trainer). The Hub uses git-based repositories that anyone can create and upload models to. We actually have integrations with different libraries, many of which are neither transformers nor NLP-focused.

One thing that you might find useful is that model cards have metadata that allow reporting things such as the dataset, metrics, tags, etc. This can help with discoverability and even comparison of evaluation results.

There is also the community Inference API that enables widgets to try out the models directly in the browser (or through HTTP requests), or Spaces for fancier demos such as the ones at https://huggingface.co/pytorch.

Let us know if we can help :smile: :llama:

cc @LysandreJik @julien-c

msaroufim commented 2 years ago

Hi @osanseviero I think this makes sense, I think at least for the hosting and model card part your hub is a good experience. I'm embarrassed to admit I couldn't find instructions to upload directories or files and populate a simple model card to the hub directly so if you can link me one I can whip out a POC very quickly

For everything else let's talk more. My email is my first name and last name at fb.com

osanseviero commented 2 years ago

Hey there! This doc has instructions for uploading your files to a repo, either using the web UI or through the terminal: https://huggingface.co/docs/hub/adding-a-model. Happy to help if you have any questions!

msaroufim commented 2 years ago

Quick update here - it was relatively easy to add a new model, tags and documentation. It's probably quickest to just manually update everything for now: https://huggingface.co/torchserve

So now users can actually see what's inside an archived file before downloading it, and navigate to a root index that shows which models are available. Every model would have a card with some extra instructions and metadata

Notes to self

Somewhere in the torchserve docs I also need to make it clearer how I expect people to upload their own examples, so something like

torch-model-archiver ... and then upload the *.mar file by using the instructions from https://huggingface.co/docs/hub/adding-a-model

This way community members have an easy way of hosting examples, adding unit tests to our repo and collaborating with each other

Instructions

Instructions for one repo at a time

git lfs install
git lfs track "*.mar"  # a .mar file is just a zip file
torch-model-archiver ...
git add .
git commit -m "push"
git push

Make sure to push both the mar file and the unzipped files so people can easily navigate them
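One way to produce the unzipped copy next to the archive, as a minimal sketch using only the standard library. It relies on the point above that a .mar file is a zip file; `unpack_mar` is a hypothetical helper name, and it should only be run on archives you built yourself.

```python
import zipfile
from pathlib import Path


def unpack_mar(mar_path):
    """Extract model_store/demo.mar into model_store/demo/ and list its files."""
    mar_path = Path(mar_path)
    target = mar_path.with_suffix("")  # same location, without the .mar suffix
    with zipfile.ZipFile(mar_path) as archive:
        archive.extractall(target)
    # Return relative file paths so the result can be logged or checked.
    return sorted(
        p.relative_to(target).as_posix() for p in target.rglob("*") if p.is_file()
    )
```

Running this before `git add .` means the repo contains both the archive and a browsable copy of its contents.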

If pushing multiple models

# Install the client first: pip install huggingface_hub
from huggingface_hub import HfApi, Repository

api = HfApi()
api.create_repo(
    repo_id="model_name",  # The name of our repository, by default under your user
    private=False,  # Whether the repo should be public or private
    repo_type="model",  # The type of repository, such as "model", "space", "dataset"
)

# Clone the new repo into model_store, then add the mar files there before pushing.
# Note: newer huggingface_hub releases deprecate Repository in favor of HfApi uploads.
repo = Repository(
    local_dir="model_store",
    clone_from="my_username/model_name",
)

repo.push_to_hub(
    commit_message="Our first big model!",
)

For developers on the core team my_username would just be torchserve

osanseviero commented 2 years ago

This is great @msaroufim! As a note, I think it makes sense to have an organization instead of a user for this. Then each individual from the core team can have access + we could add a nice organization card such as this one

julien-c commented 2 years ago

very cool @msaroufim! Yes, we can migrate this user to an org and add you to it if you'd like.