Closed aminatadjer closed 3 years ago
Hi, I can't reproduce the error. This error message indicates that a None value appeared during tokenizer processing. Could you please check that you are running the newest version of the code with pytorch >= 1.4.0 and transformers >= 2.5.0 and <= 4.0.0?
I have the same error
I have solved this bug. After updating transformers from 2.5.0 to 4.0.0, it works.
I reproduced this bug in the text-to-code task. In my case, updating transformers does not help.
[Quick Solution]
1. Check the folder save/concode
2. Remove all files whose names begin with "train_blocksize_"
3. Rerun the training script
[Explanation]
I dug into the details of the procedure and found that the problem lies in the preprocessing step, specifically in reusing the preprocessing cache files in "save/concode".
If an error occurs while the preprocessed files are first being generated, for example if the preprocessing step is interrupted by accident, you end up with 2 (with the default parameters) corrupt files in "save/concode". If you do not remove them, you will keep getting errors when "data = [self.dataset[idx] for idx in possibly_batched_index]" is called from __getitem__ at line 116 of "/home/at5262/CodeXGLUE/Text-Code/text-to-code/code/dataset.py".
If you remove these corrupt files and let the preprocessing step rerun to completion, the correct preprocessed files will be generated and the whole script runs normally.
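The removal step above can be scripted. Here is a minimal sketch; the cache directory and the "train_blocksize_" prefix are taken from the comment above, but the helper function itself is hypothetical, not part of the CodeXGLUE code:

```python
import glob
import os

def clear_stale_cache(cache_dir="../save/concode", prefix="train_blocksize_"):
    """Delete partially written preprocessing caches so training rebuilds them.

    Adjust cache_dir if your --output_dir differs from ../save/concode.
    Returns the list of removed paths.
    """
    removed = []
    for path in glob.glob(os.path.join(cache_dir, prefix + "*")):
        os.remove(path)
        removed.append(path)
    return removed
```

After clearing the cache, rerun the training script so the preprocessing step regenerates the files from scratch.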
Hi, I am trying to run CodeGPT (for the text-to-code task). I followed exactly the same steps, but I am getting this error: Traceback (most recent call last): File "run.py", line 653, in <module>
main()
File "run.py", line 640, in main
global_step, tr_loss = train(args, train_dataset, model, tokenizer, fh, pool)
File "run.py", line 165, in train
for step, (batch, token_labels) in enumerate(train_dataloader):
File "/home/at5262/anaconda3/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 345, in __next__
data = self._next_data()
File "/home/at5262/anaconda3/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 385, in _next_data
data = self._dataset_fetcher.fetch(index) # may raise StopIteration
File "/home/at5262/anaconda3/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/home/at5262/anaconda3/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/home/at5262/CodeXGLUE/Text-Code/text-to-code/code/dataset.py", line 116, in __getitem__
return torch.tensor(self.inputs[item]), torch.tensor(self.token_labels[item])
RuntimeError: Could not infer dtype of NoneType
Traceback (most recent call last):
File "/home/at5262/anaconda3/lib/python3.7/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/home/at5262/anaconda3/lib/python3.7/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/home/at5262/anaconda3/lib/python3.7/site-packages/torch/distributed/launch.py", line 263, in <module>
main()
File "/home/at5262/anaconda3/lib/python3.7/site-packages/torch/distributed/launch.py", line 259, in main
cmd=cmd)
subprocess.CalledProcessError: Command '['/home/at5262/anaconda3/bin/python', '-u', 'run.py', '--local_rank=0', '--data_dir=../dataset/concode', '--langs=java', '--output_dir=../save/concode', '--pretrain_dir=microsoft/CodeGPT-small-java-adaptedGPT2', '--log_file=text2code_concode.log', '--model_type=gpt2', '--block_size=512', '--do_train', '--node_index', '0', '--gpu_per_node', '1', '--learning_rate=5e-5', '--weight_decay=0.01', '--evaluate_during_training', '--per_gpu_train_batch_size=6', '--per_gpu_eval_batch_size=12', '--gradient_accumulation_steps=2', '--num_train_epochs=30', '--logging_steps=100', '--save_steps=5000', '--overwrite_output_dir', '--seed=42']' returned non-zero exit status 1.
I tried to run it on Colab as well, and I am getting the same error.
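For what it's worth, the `RuntimeError: Could not infer dtype of NoneType` in the traceback means `torch.tensor()` was handed a None, i.e. some cached example was never filled in. A sanity check like the one below (a hypothetical helper, not part of run.py) can be run on the loaded cache before training to fail fast with a clearer message:

```python
def check_cached_features(inputs, token_labels):
    """Raise a descriptive error if any cached example is None.

    torch.tensor(None) fails with "Could not infer dtype of NoneType",
    so catching bad entries here points directly at the corrupt cache.
    """
    for i, (inp, lab) in enumerate(zip(inputs, token_labels)):
        if inp is None or lab is None:
            raise ValueError(
                f"Cached example {i} is None; the preprocessing cache is "
                "likely corrupt. Delete the train_blocksize_* files under "
                "your output dir and rerun preprocessing."
            )
```

If this check fires, the fix is the cache-clearing procedure described earlier in the thread.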