vwxyzjn / cleanrl

High-quality single file implementation of Deep Reinforcement Learning algorithms with research-friendly features (PPO, DQN, C51, DDPG, TD3, SAC, PPG)
http://docs.cleanrl.dev

Huggingface Integration #292

Closed vwxyzjn closed 1 year ago

vwxyzjn commented 1 year ago

Description

This PR closes #110. https://huggingface.co/cleanrl/CartPole-v1-dqn-seed1 is an example model page.


vwxyzjn commented 1 year ago

The integration also makes it easier to just run models, such as

https://github.com/vwxyzjn/cleanrl/blob/4074eee3b785fe06b90e48b5abfd404e896d41b8/cleanrl_utils/evals/dqn_eval.py#L43-L57
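Without following the link, fetching a saved model could look roughly like the sketch below. It rests on assumptions: the repo id pattern is inferred from the example model page above, the q_network.pth filename is the one quoted later in this thread, and model_repo_id and download_q_network are hypothetical helper names rather than functions from cleanrl_utils.

```python
def model_repo_id(hf_entity: str, env_id: str, exp_name: str, seed: int) -> str:
    """Build a Hub repo id such as 'cleanrl/CartPole-v1-dqn-seed1'.

    The '<entity>/<env_id>-<exp_name>-seed<seed>' pattern is inferred from the
    example model page in this PR; it is an assumption, not a documented rule.
    """
    return f"{hf_entity}/{env_id}-{exp_name}-seed{seed}"


def download_q_network(repo_id: str, filename: str = "q_network.pth") -> str:
    """Download the weights file (or reuse the cached copy) and return its local path."""
    # Imported lazily so model_repo_id works without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    return hf_hub_download(repo_id=repo_id, filename=filename)


# Usage (needs network access and huggingface_hub):
#   path = download_q_network(model_repo_id("cleanrl", "CartPole-v1", "dqn", 1))
```

The returned path can then be handed to whatever loader matches the script that produced the file.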

vwxyzjn commented 1 year ago

CC @ThomasSimonini for review :) Thanks!

vwxyzjn commented 1 year ago

@kinalmehta @simoninithomas and @Wauplin, thanks for the review. The CommitOperation suggestion is really helpful. Regarding some further comments:

Having a downstream part (load_from_hub), I can help with that.

Appreciate the help! That said, we are only downloading a single file from the hub, so having a customized load_from_hub might be unnecessary, right?

https://github.com/vwxyzjn/cleanrl/blob/b4305405dbeabbbed6058b67b161b48325d094da/cleanrl_utils/evals/dqn_eval.py#L48

Generating a json/yaml file containing the hyperparameters for reproducibility?

Are you thinking of loading from the yaml file somehow to run the script like python dqn.py --load-yaml hyper.yaml?
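To make the question concrete, here is a minimal sketch of saving hyperparameters and loading them back. It uses JSON instead of YAML only to stay within the standard library. The --load-json flag, the three example arguments, and their default values are all illustrative assumptions, not part of dqn.py.

```python
import argparse
import json


def build_parser() -> argparse.ArgumentParser:
    # A few illustrative hyperparameters; the real scripts define many more.
    parser = argparse.ArgumentParser()
    parser.add_argument("--env-id", type=str, default="CartPole-v1")
    parser.add_argument("--seed", type=int, default=1)
    parser.add_argument("--learning-rate", type=float, default=2.5e-4)
    parser.add_argument("--load-json", type=str, default=None,
                        help="hypothetical flag: JSON file of saved hyperparameters")
    return parser


def save_hyperparameters(args: argparse.Namespace, path: str) -> None:
    """Write the parsed arguments to disk so the run can be repeated."""
    with open(path, "w") as f:
        json.dump(vars(args), f, indent=2, sort_keys=True)


def parse_args(argv=None) -> argparse.Namespace:
    parser = build_parser()
    args = parser.parse_args(argv)
    if args.load_json is not None:
        with open(args.load_json) as f:
            saved = json.load(f)
        saved.pop("load_json", None)  # the saved file should not point at another file
        # Saved values replace the defaults; flags given on the command line still win.
        parser.set_defaults(**saved)
        args = parser.parse_args(argv)
    return args
```

With this shape, python dqn.py --load-json hyper.json --seed 3 would reuse every saved value except the seed.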

Adding the library to our Hub list so that it creates a tag for people searching for cleanrl models.

That would be great! Thank you!

simoninithomas commented 1 year ago

Hi @vwxyzjn, yes, for the YAML file I was thinking of what you mentioned.

That said, we are only downloading a single file from the hub, so having a customized load_from_hub might be unnecessary, right?

Yes and no, because it has two advantages:

  1. We are able to count how many downloads the model gets each month.

  2. We can cache the model without using hf_hub_download directly.

For instance with SB3 integration here's the code for load_from_hub:

def load_from_hub(repo_id: str, filename: str) -> str:
    """
    Download a model from Hugging Face Hub.
    :param repo_id: id of the model repository from the Hugging Face Hub
    :param filename: name of the model zip file from the repository
    """
    try:
        from huggingface_hub import hf_hub_download
    except ImportError:
        raise ImportError(
            "You need to install huggingface_hub to use `load_from_hub`. "
            "See https://pypi.org/project/huggingface-hub/ for installation."
        )

    # Get the model from the Hub, download and cache the model on your local disk
    downloaded_model_file = hf_hub_download(
        repo_id=repo_id,
        filename=filename,
        library_name="huggingface-sb3",
        library_version="2.1",
    )

    return downloaded_model_file
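Transposed to CleanRL, the same wrapper might look like the sketch below. It only changes the library_name value. The library_version parameter is left optional because the thread does not say what CleanRL would report, and none of this is confirmed to be what cleanrl_utils ships.

```python
from typing import Optional


def load_from_hub(repo_id: str, filename: str, library_version: Optional[str] = None) -> str:
    """Download a file from a Hub model repo and return its local cached path.

    Sketch only: mirrors the SB3 helper above with library_name="cleanrl".
    """
    try:
        from huggingface_hub import hf_hub_download
    except ImportError as err:
        raise ImportError(
            "You need to install huggingface_hub to use `load_from_hub`. "
            "See https://pypi.org/project/huggingface-hub/ for installation."
        ) from err

    # library_name / library_version tell the Hub which library made the request.
    return hf_hub_download(
        repo_id=repo_id,
        filename=filename,
        library_name="cleanrl",
        library_version=library_version,
    )


# Usage (needs network access):
#   path = load_from_hub("cleanrl/CartPole-v1-dqn-seed1", "q_network.pth")
```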

simoninithomas commented 1 year ago

FYI, on our side we have started working on the frontend integration 🤗 https://github.com/huggingface/hub-docs/pull/447

vwxyzjn commented 1 year ago

Thank you @simoninithomas

We are able to count how many downloads the model gets each month.

Does this mean hf_hub_download(repo_id="cleanrl/CartPole-v1-dqn-seed1", filename="q_network.pth") would not trigger the download stats?

We can cache the model without using hf_hub_download directly.

Does hf_hub_download not cache models? I ran dqn_eval.py and noticed that the download progress bar appeared on the first run but not on the second, so I assumed hf_hub_download caches automatically.

FYI, on our side we have started working on the frontend integration 🤗 https://github.com/huggingface/hub-docs/pull/447

Awesome thanks! :)

Wauplin commented 1 year ago

Hi @vwxyzjn

Does this mean hf_hub_download(repo_id="cleanrl/CartPole-v1-dqn-seed1", filename="q_network.pth") would not trigger the download stats?

I'll let @simoninithomas answer that, as I am not 100% sure what is counted in # downloads / month. It is worth noting that the example from @simoninithomas uses two kwargs, library_name and library_version, to let the Hub know which library is downloading the model (e.g. a cleanrl user and not a random user).

Does hf_hub_download not cache models?

Yes, it does! Whether you use hf_hub_download or snapshot_download, your files will be downloaded only once.

vwxyzjn commented 1 year ago

Hey @simoninithomas, I have also added an enjoy.py and reproducibility information. Support for dqn_jax.py has been added as well (https://huggingface.co/vwxyzjn/CartPole-v1-dqn_jax-seed1). What do you think of the design? If the design looks good, the next step is to run more experiments.

simoninithomas commented 1 year ago

Sorry for the delay, @vwxyzjn. The model card design is super good 😍.

What do you think the next steps are? And how can I help?

Edit: The only thing is that, for now, the Hub does not recognize the CleanRL library (it does not generate a specific library tag), so what I can do is add it to our UI first, before your tests, if you prefer.

In order to do that we need to add: metadata["library_name"] = "cleanrl" during the model card generation (like this in SB3: https://github.com/huggingface/huggingface_sb3/blob/main/huggingface_sb3/push_to_hub.py#L184)
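As a rough illustration of where that key ends up: Hub model cards carry their metadata as a YAML block at the top of README.md. The sketch below renders such a block with the standard library only. The render_model_card helper, the example tags, and the placeholder body are assumptions for illustration, not the code used in this PR.

```python
def render_model_card(metadata: dict, body: str) -> str:
    """Render metadata as YAML front matter followed by the card body.

    Handles only flat string values and lists of strings, which is enough
    for this illustration.
    """
    lines = ["---"]
    for key, value in metadata.items():
        if isinstance(value, list):
            lines.append(f"{key}:")
            lines.extend(f"- {item}" for item in value)
        else:
            lines.append(f"{key}: {value}")
    lines.append("---")
    return "\n".join(lines) + "\n\n" + body


# Illustrative tags; the real card may use different ones.
metadata = {"tags": ["CartPole-v1", "reinforcement-learning"]}
metadata["library_name"] = "cleanrl"  # the key the Hub reads to show a library tag
card = render_model_card(metadata, "Model card body goes here.\n")
```

A real implementation would more likely use a YAML library, which also takes care of quoting and nested values.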

I think from my side:

vwxyzjn commented 1 year ago

@simoninithomas, thanks for your patience.

Edit: The only thing is that, for now, the Hub does not recognize the CleanRL library (it does not generate a specific library tag), so what I can do is add it to our UI first, before your tests, if you prefer.

Please give it a go :)

In order to do that we need to add: metadata["library_name"] = "cleanrl" during the model card generation (like this in SB3: https://github.com/huggingface/huggingface_sb3/blob/main/huggingface_sb3/push_to_hub.py#L184)

Done.

I think from my side:

All these sound good. I have just given you access to the repo in case you want to directly modify anything in this branch :) The current docs I have added are at https://github.com/vwxyzjn/cleanrl/blob/hf-integration/docs/get-started/zoo.md.

On my side, I will work on creating a PyPI release with this PR to enable a Colab tutorial for loading models. However, training in Colab is probably still tricky because the training code is in the if __name__ == "__main__" block.

Three todo-items on my side:

vwxyzjn commented 1 year ago

https://colab.research.google.com/drive/1vhbb4ak9smE7-LiHdaxwTACi1oOC5a4l?usp=sharing has a preliminary demo. To make the extra options appear during the PyPI installation, I had to do some hacks, as suggested in https://github.com/python-poetry/poetry/issues/4842#issuecomment-1340462066

simoninithomas commented 1 year ago

Awesome, thanks @vwxyzjn for your work! I'm going to test that this afternoon 🤗 and will keep you updated.

simoninithomas commented 1 year ago

First of all, thank you very much for your work, @vwxyzjn. I tried the Colab and have a few questions:

  1. You call the architectures exp_name. I was wondering whether that could be misleading 🤔.
  2. Is the idea to add some variants for now and, in the future, have all the algorithms work with model loading and saving?
  3. Thank you very much for the docs, they're good (I have some updates to make on this part, but I'll ask you first). I need to verify with the RL team and run some tests.

The model cards are very good. On my side, tomorrow I'm running some tests and doing the UI library integration (which allows us to display CleanRL as an official library on the Hub instead of a simple tag).

To give you a timeframe, I plan to focus on this integration next week, so don't hesitate to tell me how we can help beyond the tasks already mentioned 🤗.

vwxyzjn commented 1 year ago

Thank you @simoninithomas and @Wauplin! Regarding your questions,

You call the architectures exp_name. I was wondering whether that could be misleading 🤔.

Yeah, this was a legacy issue. In retrospect, I think algo would have sounded better, but it seems too costly to change at this time.

Is the idea to add some variants for now and, in the future, have all the algorithms work with model loading and saving?

Yes. Implementing model saving and loading is more challenging in some scripts such as ppo_continuous_action.py (see https://github.com/vwxyzjn/cleanrl/issues/310#issuecomment-1314603325).

Thank you very much for the docs, they're good (I have some updates to make on this part, but I'll ask you first). I need to verify with the RL team and run some tests.

Sounds good. Please keep me posted.

To give you a timeframe, I plan to focus on this integration next week, so don't hesitate to tell me how we can help beyond the tasks already mentioned 🤗.

Thank you! I will try to put more time into this PR over the next two weeks. The current list of tasks looks good.

vwxyzjn commented 1 year ago

Got an example notebook working for all the supported models https://colab.research.google.com/drive/1vhbb4ak9smE7-LiHdaxwTACi1oOC5a4l?usp=sharing#scrollTo=TrQae62Y70H0.

simoninithomas commented 1 year ago

Amazing news, thanks @vwxyzjn! 🔥

So if we recap:

From my side:

  1. To track the number of downloads of a model (displayed in the right-hand widget on model cards), we can track downloads of a single file. I was thinking of the pickled *.pth file.
  2. I'm adding CleanRL to the official libraries in the tags, with library_name: cleanrl. Given that it's the holiday season, this might only be merged next week.
  3. I'll update the Hub docs with the nice doc you've already written and do some cleanup.

🤗

vwxyzjn commented 1 year ago

Sounds good on my end. We should aim for a release after the new year. Happy holidays, all :)

vwxyzjn commented 1 year ago

@simoninithomas @kinalmehta @Wauplin thanks so much for helping with this PR. I think everything looks good at this point. We also have a good notebook ready to go https://colab.research.google.com/github/vwxyzjn/cleanrl/blob/hf-integration/docs/get-started/CleanRL_Huggingface_Integration_Demo.ipynb. Documentation can be previewed at https://cleanrl-git-hf-integration-vwxyzjn.vercel.app/get-started/zoo/ (the embed link is broken in it because it's pointing to the master branch).

vwxyzjn commented 1 year ago

Merging this as is, subject to follow-up in future PRs. We will probably also use https://github.com/huggingface/blog/pull/616 to make the announcement. Thanks for the great work, folks!

Wauplin commented 1 year ago

Congrats! That was a big piece of work 🎉🎉

simoninithomas commented 1 year ago

Congratulations 👏 I was off at the end of last week. I'm preparing the blog post for next week, and we're going to have a unit using CleanRL on PPO, with Edward and me using GodotRL. We will have the PR this week, and I'll mention you to keep you in the loop.