Closed: laibamehnaz closed this issue 3 years ago.
This looks like an issue with the tokenizer in evaluation. If you don't need generative metrics while training, set predict_from_generate to False and don't pass a compute_metrics function to Trainer.
pinging @patrickvonplaten
Hi @patil-suraj, I did exactly that and the issue didn't show up. But now all generations on the test set are identical regardless of the input.
Hey @laibamehnaz,
Could you copy paste a working code example, so that we can reproduce the error? :-) Note that you have to use a different tokenizer than the one that is used in bert2bert-cnn_dailymail-fp16
Sure, you can see the code here: https://colab.research.google.com/drive/1xxBQcPe05bFBQvJQLx6Mw2Qd9YVWPNS9?usp=sharing
Hmm, I don't get protobuf error when running locally...could you maybe adapt the google colab so that I can get your error above by just clicking "run" :-) ? (Setting the correct transformer pip installs and the correct trainer params, etc...)
Oh I am so sorry, this script won't give you the error because I have set predict_from_generate to False and prediction_loss_only to True. Sure, I will share the script.
Here you go: https://colab.research.google.com/drive/1xxBQcPe05bFBQvJQLx6Mw2Qd9YVWPNS9?usp=sharing
I'm sorry the colab does not work for me...I get install errors when running the second cell. Let me merge the "more_general_trainer_metric" PR into master next week and then we can work directly on master.
Hi @patrickvonplaten , I have fixed the issue in the second cell. https://colab.research.google.com/drive/1xxBQcPe05bFBQvJQLx6Mw2Qd9YVWPNS9?usp=sharing
Sure, that will be great!!
@patrickvonplaten These types of metrics are included in #6769, and it's almost ready to merge.
How about we support EncoderDecoder models in examples/seq2seq?
@sshleifer does that make sense ?
Depends how much complexity it adds, but on a high level I like that idea a lot!
There is a more pressing issue of getting incremental decoding/use cache working for Roberta that I would probably prioritize higher.
@patil-suraj @sshleifer - I like the idea of adding EncoderDecoder to your Seq2SeqTrainer a lot. This way I won't have to continue my hacky PR here: https://github.com/huggingface/transformers/pull/5840. I would actually rank this as more important, since people need to be able to quickly fine-tune any EncoderDecoderModel.
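For context on what a Seq2SeqTrainer would need to handle, here is a minimal sketch of constructing an EncoderDecoderModel. It uses tiny, randomly initialized BERT configs (hypothetical sizes, chosen only so the example needs no pretrained downloads); a real fine-tuning run would instead use EncoderDecoderModel.from_encoder_decoder_pretrained with actual checkpoints.

```python
# Sketch: build a tiny randomly initialized EncoderDecoderModel from small
# BERT configs. The config sizes are arbitrary illustration values.
from transformers import BertConfig, EncoderDecoderConfig, EncoderDecoderModel

encoder_config = BertConfig(
    vocab_size=100, hidden_size=32, num_hidden_layers=2,
    num_attention_heads=2, intermediate_size=64,
)
decoder_config = BertConfig(
    vocab_size=100, hidden_size=32, num_hidden_layers=2,
    num_attention_heads=2, intermediate_size=64,
    is_decoder=True,            # the decoder attends causally
    add_cross_attention=True,   # and cross-attends to the encoder
)
config = EncoderDecoderConfig.from_encoder_decoder_configs(
    encoder_config, decoder_config
)
model = EncoderDecoderModel(config=config)
```

In practice, supporting this class in examples/seq2seq mostly means the trainer must route labels and decoder inputs correctly and call model.generate() for evaluation, which is exactly what the thread is asking for.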
@patil-suraj - After merging your PR, I'd be happy to work on adding EncoderDecoder to the Seq2Seq Trainer.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Evaluation: 100% 30/30 [00:45<00:00, 1.53s/it]
[libprotobuf FATAL /sentencepiece/src/../third_party/protobuf-lite/google/protobuf/repeated_field.h:1505] CHECK failed: (index) >= (0):
terminate called after throwing an instance of 'google::protobuf::FatalException'
what(): CHECK failed: (index) >= (0):
I am using the following model: https://huggingface.co/patrickvonplaten/bert2bert-cnn_dailymail-fp16. I'd appreciate any help. Thank you.