Closed shenkev closed 4 years ago
Thanks for your feedback, it's interesting.
We've also heard good feedback on the refactoring you mention (for instance, opening an example script that was 700 lines long could be daunting), so it's good to hear a diversity of feedback.
I think the crux of it is that it's not realistic to maintain dozens of combinations – e.g. {apex, TPU, TF, distributed} x {all possible tasks}
– of self-contained example scripts.
Unless you use code generation (which comes with its own set of problems), things will get out-of-sync and break really fast. CI would be really hard too.
To give a concrete example, adding TPU support to each individual script without refactoring would have been an overwhelming task. With the Trainer, we've been able to do it in a decently easy way and we'll have robust CI in place in the coming weeks.
I do agree that "hackability" of this library is very important and we're trying to find a good trade-off for this. I feel like we don't have many layers of abstraction (models are mostly self-contained, etc.).
We're very open to any suggestions for improving things, so let us know of any ideas you have to make hacking easier.
I have a different perspective here. Transformers is pretty hackable as it is now. It's very easy to take any model and add task-specific heads to it in all sorts of exotic ways. Also, the recently introduced Trainer
is pretty impressive: it removes a lot of the boilerplate from previous examples, and I didn't find that it limits hackability. The way the Trainer code is structured, it's very easy to modify.
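To illustrate the "easy to modify" point, here is a minimal, purely illustrative mock of the subclass-and-override pattern the comment is describing. This is not the real `transformers.Trainer` API; the names (`SimpleTrainer`, `compute_loss`) are hypothetical stand-ins for the idea that the loop stays fixed while individual steps can be overridden.

```python
# Illustrative sketch only: a toy trainer showing the subclass-and-override
# pattern. The real transformers.Trainer has a different, richer API.

class SimpleTrainer:
    """Minimal trainer skeleton: the loop is fixed, the hooks are overridable."""

    def __init__(self, model, data):
        self.model = model
        self.data = data

    def compute_loss(self, batch):
        # Default behaviour: pretend the model returns a loss directly.
        return self.model(batch)

    def train(self):
        losses = []
        for batch in self.data:
            losses.append(self.compute_loss(batch))
        return losses


class WeightedTrainer(SimpleTrainer):
    """Customisation point: override one method, keep the rest of the loop."""

    def compute_loss(self, batch):
        return 2.0 * super().compute_loss(batch)


model = lambda batch: float(batch)  # stand-in "model"
print(WeightedTrainer(model, [1, 2, 3]).train())  # [2.0, 4.0, 6.0]
```

The same shape applies to the real Trainer: you subclass it and override only the step you care about, instead of rewriting the whole loop.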
The models are also pretty hackable. A really great example of this is using LongformerSelfAttention to convert existing models to long versions, which also train using the new Trainer.
Also, I've recently started contributing, and navigating through the codebase and making changes was a breeze.
Also, HF models can be trained with all sorts of trainers. I've personally trained HF models with my own training loop, the HF Trainer, pytorch-lightning, ignite, and fastai, and they play nicely with all of these.
And I think the goal of the examples is to give standard templates for doing certain tasks, without limiting or discouraging you from modifying them in any way.
So considering all this I would say that Transformers is pretty hackable and also provides nice light-weight abstractions wherever needed. I really appreciate this!
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
I understand the desire to use abstraction to make the library easier to use out-of-the-box, especially for newer users who just want to call run_model.py with some hyperparameters.
However, as someone who considers himself a novice-intermediate and frequently needs to look at or modify the source code to achieve what I want, it's been a huge pain adapting to the new changes.
I'll try to be as concrete as possible, but here are some big pain points.
(These pain points may come from inexperience with the library rather than something being genuinely hard/impossible to achieve, but either way, as a novice-intermediate, the end result is the same: I have a hard time navigating the library.)
arguments are hidden, so for example in run_language_modeling.py I'm not able to see what all the available parameters are. I feel using abstraction over argument types like training_args, model_args, etc. is overkill and just programming fluff
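For context on what that grouping does: transformers provides a parser (`HfArgumentParser`) that turns dataclasses like `training_args` and `model_args` into CLI flags. The sketch below is a stdlib-only approximation of that mechanic, not the library's implementation; the field names and the helper `parse_into_dataclasses` are illustrative.

```python
# Stdlib-only sketch of dataclass-based argument grouping, approximating
# what HfArgumentParser does in the examples. Names are illustrative.
import argparse
from dataclasses import dataclass, fields

@dataclass
class ModelArguments:
    model_name_or_path: str = "gpt2"

@dataclass
class TrainingArguments:
    learning_rate: float = 5e-5
    num_train_epochs: int = 3

def parse_into_dataclasses(dataclass_types, args=None):
    """Build one argparse parser from several dataclasses, then split the
    parsed namespace back into dataclass instances."""
    parser = argparse.ArgumentParser()
    for dc in dataclass_types:
        for f in fields(dc):
            parser.add_argument(f"--{f.name}", type=f.type, default=f.default)
    ns = parser.parse_args(args)
    return tuple(
        dc(**{f.name: getattr(ns, f.name) for f in fields(dc)})
        for dc in dataclass_types
    )

model_args, training_args = parse_into_dataclasses(
    [ModelArguments, TrainingArguments],
    ["--learning_rate", "0.001"],
)
print(model_args.model_name_or_path, training_args.learning_rate)  # gpt2 0.001
```

One practical upside of this scheme: because every dataclass field becomes an argparse flag, running the script with `--help` lists all available parameters, which partly answers the discoverability complaint.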
model source code is hidden under many layers of abstraction
support for basic functionality is missing after code is refactored. One example: the new run_language_modeling.py doesn't support resuming training from a checkpoint
For example, it's very hard to tell what any of the XXX.from_pretrained classes are actually doing.
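For readers hitting the same wall: conceptually, a `from_pretrained`-style classmethod resolves a directory (or a cached download), reads a config describing the architecture, instantiates the model from that config, and then loads the saved weights over it. The toy below sketches only those steps; it is not the transformers implementation, and `ToyModel`, `save_pretrained`'s file names, etc. are made up for illustration.

```python
# Toy sketch of the save/load round-trip behind a from_pretrained-style
# classmethod. Not the transformers implementation.
import json
import os
import tempfile

class ToyModel:
    def __init__(self, hidden_size):
        self.hidden_size = hidden_size
        self.weights = None

    def save_pretrained(self, save_dir):
        os.makedirs(save_dir, exist_ok=True)
        with open(os.path.join(save_dir, "config.json"), "w") as f:
            json.dump({"hidden_size": self.hidden_size}, f)
        with open(os.path.join(save_dir, "weights.json"), "w") as f:
            json.dump(self.weights, f)

    @classmethod
    def from_pretrained(cls, save_dir):
        # 1. Read the config that describes the architecture.
        with open(os.path.join(save_dir, "config.json")) as f:
            config = json.load(f)
        # 2. Instantiate a fresh model from the config.
        model = cls(**config)
        # 3. Overwrite its parameters with the saved ones.
        with open(os.path.join(save_dir, "weights.json")) as f:
            model.weights = json.load(f)
        return model

with tempfile.TemporaryDirectory() as d:
    m = ToyModel(hidden_size=8)
    m.weights = [0.1, 0.2]
    m.save_pretrained(d)
    restored = ToyModel.from_pretrained(d)
    print(restored.hidden_size, restored.weights)  # 8 [0.1, 0.2]
```

In the real library the same three steps exist, plus caching, hub downloads, and dtype/device handling, which is where most of the perceived opacity comes from.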
These are just the pain points I can think of off the top of my head. Overall, it's been a rough experience using the new example files like run_language_modeling.py and the rest of the API.
My main suggestion is to rethink the trade-off between simplicity and abstraction on one side, and flexibility and the ability to hack/modify the code on the other. Generally I think abstraction is good, but it seems excessive in certain places, and I wonder if you can achieve the same goal while doing a bit less.
A tangential observation is the example scripts get way too cumbersome when you try to support all these things within the same file (e.g. run_language_modeling.py): apex, tpu, tensorflow, distributed. There are flags everywhere.