arachny opened this issue 4 years ago
Any update on this issue? It would be helpful if version information could be added to requirements.txt.
I'm having the same issue:

```
06/24/2020 13:31:35 - INFO - transformers.tokenization_utils - loading file None
06/24/2020 13:31:35 - INFO - root - Number of GPUs: 4
06/24/2020 13:31:35 - INFO - root - label columns: ['toxic', 'severe_toxic', 'obscene', 'threat', 'insult', 'identity_hate']
Exception during training: 'tuple' object has no attribute 'cls_token'
Traceback (most recent call last):
  File "/opt/ml/code/train", line 157, in train
    logger=logger,
  File "/opt/conda/lib/python3.7/site-packages/fast_bert/data_cls.py", line 424, in __init__
    train_examples, "train", no_cache=self.no_cache
  File "/opt/conda/lib/python3.7/site-packages/fast_bert/data_cls.py", line 550, in get_dataset_from_examples
    cls_token=self.tokenizer.cls_token,
AttributeError: 'tuple' object has no attribute 'cls_token'
```
```
UnexpectedStatusException                 Traceback (most recent call last)
```
Pull request created to fix this: #239
I have been trying to get this to work for several days now and keep getting errors every time. I tried building the container image on my Mac and on an AWS p3.8xlarge instance, but failed each time.
So this is the error I am getting if I just build the Docker image as is (latest version) and then run it on the toxic-comments dataset, exactly as described in the blog.
It appears that this error relates to AutoTokenizer, so I even tried replacing it with what the old version used (RobertaTokenizer, etc.), but I was then getting a different error: "Caught StopIteration in replica 0 on device 0."
I tried installing fast-bert and apex directly in a notebook (with a GPU) and then running the training and prediction code locally (i.e., not in the container via SageMaker), and it all runs fine!
Any ideas on what could be going wrong with that container?