huggingface / transformers

πŸ€— Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

πŸ”₯[Community Event] Doc Tests Sprint - Configuration filesπŸ”₯ #19487

Closed ydshieh closed 1 year ago

ydshieh commented 2 years ago

This sprint is similar to #16292 - but for model configuration files, i.e. configuration_[model_name].py. For example, src/transformers/models/bert/configuration_bert.py

The expected changes

The changes we expect can be found in #19485:

  1. Change the import order of the model and configuration classes
  2. Add (with random weights) to the comment before the model initialization line
  3. Add configuration_[model_name].py to utils/documentation_tests.txt (respecting the order)

Please do step 3 only after running the doctests and making sure all tests pass (see "How to run doctests" below) πŸ™ A rough sketch of the target docstring is shown right after this note.
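
For reference, after changes 1 and 2 the docstring example in configuration_bert.py should look roughly like the sketch below (mirroring the pattern in #19485; the checkpoint name and exact comment wording are illustrative):

    >>> from transformers import BertConfig, BertModel

    >>> # Initializing a BERT bert-base-uncased style configuration
    >>> configuration = BertConfig()

    >>> # Initializing a model (with random weights) from the bert-base-uncased style configuration
    >>> model = BertModel(configuration)

    >>> # Accessing the model configuration
    >>> configuration = model.config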

How to run doctests

Suppose you are working on src/transformers/models/bert/configuration_bert.py. The steps to run the test are:

  1. Stage your changes

    git add src/transformers/models/bert/configuration_bert.py
  2. Prepare the files to be tested

    python utils/prepare_for_doc_test.py src

    or if you prefer to be more specific

    python utils/prepare_for_doc_test.py src/transformers/models/bert/configuration_bert.py

    This will change some files (doc-testing needs to add additional lines that we don't include in the doc source files).

  3. Launch the test:
    python -m pytest --doctest-modules src/transformers/models/bert/configuration_bert.py -sv --doctest-continue-on-failure
  4. Cleanup git status
    git checkout -- .

    to clean up the changes made by step 2 (the files modified by prepare_for_doc_test.py). An optional Python-only way to run step 3 is sketched right after this list.
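
The sketch below runs the prepared file through the standard-library doctest module instead of pytest. Treat it as a rough, optional alternative rather than the recommended workflow: it bypasses the repository's pytest setup (so any repo-specific doctest options won't be recognized) and it still assumes step 2 has been run first.

    import doctest

    # Import the prepared configuration module (after utils/prepare_for_doc_test.py has run)
    import transformers.models.bert.configuration_bert as configuration_bert

    # Execute every docstring example in the module and report the outcome
    results = doctest.testmod(configuration_bert, verbose=True)
    print(f"{results.attempted} examples attempted, {results.failed} failed")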

Ready (or not)?

If all tests pass, you can commit, push and open a PR πŸ”₯ πŸš€ , otherwise iterate the above steps πŸ’― !

LysandreJik commented 2 years ago

@ydshieh that's because there was "Fixes https://github.com/huggingface/transformers/issues/19487" in the description of the PR :)

"Fixes", like "close" or "fix", will close the issue when the PR is merged.

ndrohith09 commented 2 years ago

I would take gpt_neo, gpt_neox_japanese and gpt_neox

daspartho commented 2 years ago

I'll take on

ndrohith09 commented 2 years ago

I would like to work on openai and opt

Revanth2002 commented 2 years ago

I will take mbart and mctct

Revanth2002 commented 2 years ago

I will work on layoutlm , layoutlmv2 , layoutlmv3

ayaka14732 commented 2 years ago

I will work on ELECTRA

ayaka14732 commented 2 years ago

I will work on PoolFormer

ayaka14732 commented 2 years ago

I will work on PLBART

ayaka14732 commented 2 years ago

I will work on Nezha

sha016 commented 2 years ago

I'll take maskformer

0xrushi commented 2 years ago

Hi, Can I have LayoutLMv2 and BERT

ydshieh commented 2 years ago

> Hi, Can I have LayoutLMv2 and BERT

Hi @rushic24, those have already been done. You can check this file to find other config files to work on πŸ€—

sha016 commented 2 years ago

I'll take fsmt next

Saad135 commented 2 years ago

While browsing the list of model configurations, I noticed that the DebertaConfig class does not have an example docstring section. I'm not sure whether that is intentional, but in case it's not, I will open a PR to add the example docstring and hopefully get some feedback there.

kushal-gopal commented 2 years ago

I'll work on dpt

ydshieh commented 2 years ago

> DebertaConfig

That would be very nice, @Saad135 ! Thank you

Saad135 commented 2 years ago

I will take DeBERTa-v2 next

Saad135 commented 2 years ago

I can take camembert next

Saad135 commented 2 years ago

I can take DPR next

Saad135 commented 1 year ago

I can take DeformableDetrConfig next

JuheonChu commented 1 year ago

Can I take timesformer next?

ydshieh commented 1 year ago

> Can I take timesformer next?

Sure! For context: we have decided not to use the tiny random model checkpoints anymore. If some downstream models lack a checkpoint, we simply don't provide the expected values.

elabongaatuo commented 1 year ago

Hello, I would like to take on gptj, longformer, and hubert

elabongaatuo commented 1 year ago

@ydshieh , may I share a list of models that are yet to be worked on?

ydshieh commented 1 year ago

@elabongaatuo GPT-J is large, and our CI won't be able to run doctest with its checkpoints.

I think gptj, longformer, and hubert are all covered in

https://github.com/huggingface/transformers/blob/5f3ea66bc0c27ad2a8761fdf8489cf7d72257b93/utils/documentation_tests.txt

Feel free to check the modeling files that are not in the above file πŸ€— if you want to work on them ❀️. Thank you!

elabongaatuo commented 1 year ago

@ydshieh, thank you. m2m_100, llama and mvp don't have their modeling files listed. May I go ahead and work on them?

ydshieh commented 1 year ago

llama has no publicly available checkpoints on the Hub - no need to work on it. For the other 2 files, you can run doctest against them. If they pass, you can simply add them to documentation_tests.txt. Otherwise, we can discuss how to deal with the errors :-).

PhalitJotwani commented 1 year ago

Hi @ydshieh, I am new to open source, so I just wanted to confirm whether I can take Falcon. Falcon's config file is not mentioned in the documentation_tests.txt file.

AVAniketh0905 commented 1 year ago

Hello @ydshieh, I am new to open source and would like to take barthez. If my contributions are successful, I'm eager to extend my involvement to other models as well. Looking forward to a productive and enduring journey of contributions!

Edit: I couldn't find configuration files for barthez. Any help is appreciated!

hegdeadithyak commented 1 year ago

I'll take roformer #26530