Rocketknight1 commented 2 years ago

This issue is part of our Great Code Cleanup 2022. If you're interested in helping out, take a look at this thread, or come join us on Discord and talk with other contributors!

🚀 Add missing type hints

Type hints are used inconsistently in the transformers repo across both TF and PT models, and it'd be nice to make them a complete, consistent thing for the core models, especially because we want to develop features that depend on them!

Guide to contributing:

Ensure you've read our contributing guidelines 📜
Claim your architecture(s) in this thread (ensure no one is working on it). It's 100% okay to only take the TensorFlow or PyTorch version of a model, if you're not familiar with both frameworks! It's also okay to claim multiple models and group those changes into a single PR! 🎯
Implement the changes as in https://github.com/huggingface/transformers/pull/16057 or https://github.com/huggingface/transformers/pull/16074 (see the diff on the model architectures for a few examples) 💪
Open the PR and tag me in it. You should run make fixup at the end to do a code quality check before your final commit!

Tips for making your PR

The files you need to edit will be in src/transformers/models/[model_name]/
For TensorFlow, you want the modeling_tf_[model_name].py file. For PyTorch, you want the modeling_[model_name].py file.
Remember, you do not have to cover every class in that file!. The main thing we want to cover is the call (for TF) or forward (for PT) method for user-facing classes like TFRobertaForMaskedLM or RobertaForSequenceClassification. It's not necessary to add type hints to layers or base classes like RobertaModel or TFRobertaPreTrainedModel - these are trickier to write, and generally people do not use those classes as standalone models.
If you're unfamiliar with how type hints work, you can read the Python library documentation on them, but it's probably even easier to just look at another PR that added them. Take a look at the list of changes in the pull requests linked above!
The types will usually be obvious - most inputs are Optional[Union[np.ndarray, tf.Tensor]] for TF models and Optional[torch.Tensor] for PyTorch models, and boolean inputs are Optional[bool]. Pay attention to the first input of TF models, though, which is usually TFModelInputType - this is because Keras handles that first input in a special way! Other inputs to pay attention to are past_key_values, which can vary between models, and also the model output type. For the base model classes like RobertaModel, you may have to look at the corresponding MainLayer to figure out the right output type! Also, note that the output type may be a tuple if return_dict is False, in which case you should specify Union[Tuple, ...]. Finally, note that in TF models, training is never None, so it should be training: bool and not training: Optional[bool].
Note that some code is copied across our codebase. If you see a line like # Copied from transformers.models.bert..., this means that the code is copied from that source, and our scripts will automatically keep that in sync. If you see that, you should not edit the copied method! Instead, edit the original method it's copied from, and run make fixup to synchronize that across all the copies. Be sure you installed the development dependencies with pip install -e ".[dev"], as described in the contributor guidelines above, to ensure that the code quality tools in make fixup can run.

How can I find models that need type hints?

I used to maintain a list here, but it got out of date, I'm sorry. Instead, you can use this Colab notebook. If you run this, it will show you models in PyTorch or TF that are still missing type hints. Unlike my manually curated lists, it's guaranteed to be up to date - but do double-check that someone else in the thread hasn't claimed a model before you start, because the Colab code will only register type hints after the PR containing them is merged!

divyanshugit commented 2 years ago

I would love to work on PyTorch Albert🚀

johnnv1 commented 2 years ago

Hi, I would like to work on PyTorch ImageGPT

chainyo commented 2 years ago

Hi, I would like to work on CamemBERT for PT & TF.

I will take a look at LayoutLMv2 after the first one :smiley:

Edit: Because CamemBert depends on Roberta I will take PyTorch Roberta :+1:

Vaibhavs10 commented 2 years ago

Hello!

I'd like to take Hubert & Wav2Vec2 for Pytorch.

Cheers!

johnryan465 commented 2 years ago

I'll try PyTorch BERT to start!

Rocketknight1 commented 2 years ago

@johnryan465 I just did it as an example, I'm sorry! I'm marking off the completed models now.

johnryan465 commented 2 years ago

@Rocketknight1 no worries, will try and do DistillBert instead

cakiki commented 2 years ago

I'd like to work on GPT2 (TF).

chainyo commented 2 years ago

@Rocketknight1 I switch to Roberta PyTorch because CamemBERT depends on Roberta modeling

johnnygreco commented 2 years ago

Awesome! Hey @Rocketknight1 – I'd like to work on Longformer for both PyTorch & TF!

tanmoyio commented 2 years ago

I'd like to work on BigBird

jacobdineen commented 2 years ago

I would like to work on Clip for pytorch.

johnnv1 commented 2 years ago

Also, will work on BeiT, Deit and ViT (Pytorch)

bhavika commented 2 years ago

I can work on ImageGPT.

omer-dor commented 2 years ago

I can work on Swin (Pytorch)

elusenji commented 2 years ago

I'd like to work on XLM (Tensorflow)

Dahlbomii commented 2 years ago

I'll take T5 (Tensorflow)!

KristijanArmeni commented 2 years ago

I'd like to claim GPT-2 (PyTorch).

robotjellyzone commented 2 years ago

Hi @Rocketknight1,

I would like to work on BART of both TF and PyTorch

kamalkraj commented 2 years ago

ELECTRA TF - https://github.com/huggingface/transformers/pull/16104 ELECTRA PT - https://github.com/huggingface/transformers/pull/16103 DeBERTA PT - https://github.com/huggingface/transformers/pull/16105

manandey commented 2 years ago

XLMRobertaXL (PyTorch)

p-mishra1 commented 2 years ago

segformer pytorch

TristanBilot commented 2 years ago

I'll take OpenAIGPT!

robotjellyzone commented 2 years ago

Hi @Rocketknight1,

I would like to work on BART of both TF and PyTorch

can you please confirm with emoji whether i am eligible to take these or not? @Rocketknight1

jbrry commented 2 years ago

I will work on XLM (PyTorch)

Rocketknight1 commented 2 years ago

@robotjellyzone You can! Please note that we accepted a PR yesterday to add the TF decorator to BART, so make sure you're working on the most recent version of the library before you start your PR!

PepijnBoers commented 2 years ago

I'll take Distilbert (TensorFlow)

frgfm commented 2 years ago

Happy to take T5 (PyTorch)

@Rocketknight1 isn't the list missing ConvNext? If so, I'm happy to take care of that one too :ok_hand:

tmastrom commented 2 years ago

I'll work on GPTJ

robotjellyzone commented 2 years ago

@robotjellyzone You can! Please note that we accepted a PR yesterday to add the TF decorator to BART, so make sure you're working on the most recent version of the library before you start your PR!

OK sure! I will keep this in mind 😊👍...

jacobdineen commented 2 years ago

I'll take Splinter and ~~Segformer~~ Rembert for torch Edit: @p-mishra1 has Segformer. Taking Rembert instead

bhavika commented 2 years ago

Looks like ImageGPT was done. I can take Luke in PyTorch.