Closed by changyeli 2 years ago
Hi @changyeli, thanks for reporting.
According to the TypeError message you get, self.pad_token_id is None, and the code line
File "/home/lixx3013/anaconda3/envs/toolkit/lib/python3.8/site-packages/transformers/tokenization_utils_base.py", line 2312, in _get_padding_truncation_strategies
if padding_strategy != PaddingStrategy.DO_NOT_PAD and (not self.pad_token or self.pad_token_id < 0):
is trying to compare it with the integer 0.
I think this is an issue with transformers rather than datasets. I'm transferring this issue to the transformers team.
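For anyone who wants to see the failure in isolation, here is a minimal sketch that reproduces the comparison without installing transformers. TokenizerStub and both helper functions are hypothetical names made up for illustration; only the shape of the condition is taken from the traceback line above. The guarded variant is one possible way to avoid the error, not necessarily how transformers handles it.

```python
from typing import Optional


class TokenizerStub:
    """Hypothetical stand-in for a tokenizer; NOT the transformers class."""

    def __init__(self, pad_token: Optional[str], pad_token_id: Optional[int]):
        self.pad_token = pad_token
        self.pad_token_id = pad_token_id


def needs_pad_token_unsafe(tok: TokenizerStub) -> bool:
    # Same shape as the condition in the traceback: if pad_token is set
    # but pad_token_id is None, Python 3 evaluates `None < 0` and raises
    # TypeError.
    return not tok.pad_token or tok.pad_token_id < 0


def needs_pad_token_safe(tok: TokenizerStub) -> bool:
    # Illustrative guard (an assumption, not the library's fix): check
    # for None before doing the integer comparison.
    return (
        not tok.pad_token
        or tok.pad_token_id is None
        or tok.pad_token_id < 0
    )


if __name__ == "__main__":
    tok = TokenizerStub(pad_token="[PAD]", pad_token_id=None)
    try:
        needs_pad_token_unsafe(tok)
    except TypeError as err:
        print("unsafe check failed:", err)
    print("safe check says a pad token is needed:", needs_pad_token_safe(tok))
```

Note that the unguarded check only fails when pad_token itself is set while pad_token_id is None. If pad_token is also empty, the `or` short-circuits and the comparison is never evaluated.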
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
Describe the bug
Hello everyone, this is a follow-up to this post and the official blog post on fine-tuning the Wav2Vec model. It turns out that padding does not work correctly w.r.t. the input features.
Steps to reproduce the bug
I tried the following code to investigate the padding:
Expected results
The code above is a line-by-line breakdown of the DataCollatorCTCWithPadding provided in the official blog. It should start to fine-tune the Wav2Vec model.
Actual results
I got a very similar error when I used Trainer, which returns:
Environment info
datasets version: 1.18.3
Any suggestions? Thanks in advance.