huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

The new abstractions in /master are counterproductive #4620

Closed shenkev closed 4 years ago

shenkev commented 4 years ago

I understand the desire to use abstraction to make the library easier to use out-of-the-box, especially for newer users who just want to call run_model.py with some hyperparameters.

However, as someone who considers himself a novice-intermediate user and frequently needs to read or modify the source code to achieve what I want, I've found it a huge pain to adapt to the new changes.

I'll try to be as concrete as possible; here are my biggest pain points:

(I will say, these pain points may come from inexperience with the library rather than from something actually being hard or impossible to achieve, but either way, as a novice-intermediate user, the end result is the same: I have a hard time navigating the library.)

For example, it's very hard to tell what any of the XXX.from_pretrained class methods are actually doing.
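
For context, a `from_pretrained` call conceptually boils down to three steps. The following is a simplified sketch, not the library's actual implementation, which additionally handles download resolution, caching, and cross-framework checkpoint conversion:

```python
import torch
from transformers import BertConfig, BertModel

# 1. Parse the configuration (the config.json shipped with the checkpoint).
config = BertConfig.from_json_file("config.json")

# 2. Instantiate the architecture with freshly initialized weights.
model = BertModel(config)

# 3. Load the checkpoint's state dict over those weights. The real
#    from_pretrained also collects and reports missing/unexpected keys
#    instead of failing hard.
state_dict = torch.load("pytorch_model.bin", map_location="cpu")
model.load_state_dict(state_dict, strict=False)
```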

These are just the pain points I can think of off the top of my head. Overall, using the new example files like run_language_modeling.py and the rest of the API has felt rough.

My main suggestion is to rethink the trade-off between simplicity and abstraction on one hand, and flexibility and the ability to hack or modify the code on the other. Generally I think abstraction is good, but it seems excessive in certain places, and I wonder if you can achieve the same goal while doing a bit less.

A tangential observation: the example scripts become far too cumbersome when they try to support everything within the same file (e.g. run_language_modeling.py): apex, TPU, TensorFlow, distributed training. There are flags everywhere.

julien-c commented 4 years ago

Thanks for your feedback, it's interesting.

We've also heard positive feedback on the refactoring you mention (for instance, opening an example script that was 700 lines of code long could be daunting), so it's good to hear a diversity of feedback.

I think the crux of it is that it's not realistic to maintain dozens of self-contained example scripts covering every combination – e.g. {apex, TPU, TF, distributed} × {all possible tasks}.

Unless you use code generation (which comes with its own set of problems), things will get out of sync and break really fast. CI would be really hard too.

To give a concrete example, adding TPU support to each individual script without refactoring would have been an overwhelming task. With the Trainer, we've been able to do it fairly easily, and we'll have robust CI in place in the coming weeks.
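
To sketch what that looks like on the user side (a minimal example; the model name is illustrative, and `train_dataset` is assumed to be any torch Dataset of tokenized examples):

```python
from transformers import AutoModelForSequenceClassification, Trainer, TrainingArguments

model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")

# Device placement, distributed setup, and TPU support live inside Trainer
# rather than being re-implemented in every example script.
training_args = TrainingArguments(output_dir="./output", num_train_epochs=3)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,  # assumed to exist; see lead-in above
)
trainer.train()
```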

I do agree that "hackability" of this library is very important, and we're trying to find a good trade-off for this. I feel like we don't have many layers of abstraction (models are mostly self-contained, etc.).

We're very open to suggestions for improving things, so let us know of any ideas you have that would make this kind of hackability easier.

patil-suraj commented 4 years ago

I have a different perspective here. Transformers is pretty hackable as it is now. It's very easy to take any model and add task-specific heads to it in all sorts of exotic ways. The recently introduced Trainer is also pretty impressive: it removes a lot of the boilerplate from the previous examples, and I didn't find that it limits hackability. The way the Trainer code is structured, it's very easy to modify.
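
For example, attaching a custom head is a few lines of plain PyTorch. This is a sketch rather than a library recipe; the [CLS] pooling and single linear layer are illustrative choices:

```python
import torch.nn as nn
from transformers import AutoModel

class BertWithCustomHead(nn.Module):
    """Illustrative: a pretrained encoder with a hand-rolled classification head."""

    def __init__(self, num_labels):
        super().__init__()
        self.encoder = AutoModel.from_pretrained("bert-base-uncased")
        self.head = nn.Linear(self.encoder.config.hidden_size, num_labels)

    def forward(self, input_ids, attention_mask=None):
        outputs = self.encoder(input_ids, attention_mask=attention_mask)
        last_hidden = outputs[0]        # (batch, seq_len, hidden_size)
        cls_vector = last_hidden[:, 0]  # pool with the [CLS] token
        return self.head(cls_vector)
```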

The models are also pretty hackable; here are two really great examples of this:

  1. This notebook shows how you can swap the existing self-attention modules in BERT-like models for LongformerSelfAttention to convert them into "long" versions; it also trains using the new Trainer (a rough sketch of the swap follows this list)
  2. This second notebook shows how you can train HF models with fastai.
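
A rough sketch of the swap from the first notebook, with caveats: the import path and the forward-signature compatibility of `LongformerSelfAttention` depend on the transformers version, and the actual notebook also copies the query/key/value weights into the global-attention projections and extends the position embeddings:

```python
from transformers import BertModel
# Import path varies across versions; this is the current location.
from transformers.models.longformer.modeling_longformer import LongformerSelfAttention

model = BertModel.from_pretrained("bert-base-uncased")
config = model.config
# LongformerSelfAttention expects one attention window size per layer.
config.attention_window = [512] * config.num_hidden_layers

for i, layer in enumerate(model.encoder.layer):
    # Swap the self-attention module in place; the rest of the layer is untouched.
    layer.attention.self = LongformerSelfAttention(config, layer_id=i)
```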

Also, I've recently started contributing, and navigating through the codebase and making changes was a breeze.

HF models can also be trained with all sorts of trainers. I've personally trained HF models with my own training loop, HF Trainer, pytorch-lightning, ignite, and fastai, and they play nicely with all of these.
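
As an illustration, a bare PyTorch loop needs nothing beyond the model itself. A minimal sketch, assuming `train_loader` is a DataLoader yielding dicts with `input_ids`, `attention_mask`, and `labels`:

```python
import torch
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

model.train()
for batch in train_loader:  # assumed to exist; see lead-in above
    outputs = model(**batch)
    loss = outputs[0]  # the loss comes first when labels are provided
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```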

I think the goal of the examples is to provide standard templates for certain tasks; they don't limit or discourage you from modifying them in any way.

So, considering all this, I would say that Transformers is pretty hackable and provides nice, lightweight abstractions where needed. I really appreciate this!

stale[bot] commented 4 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.