microsoft / CodeXGLUE

RuntimeError: Could not infer dtype of NoneType #72

Closed aminatadjer closed 3 years ago

aminatadjer commented 3 years ago

Hi, I am trying to run CodeGPT (for the text-to-code task). I followed exactly the same steps, but I am getting this error:

```
Traceback (most recent call last):
  File "run.py", line 653, in <module>
    main()
  File "run.py", line 640, in main
    global_step, tr_loss = train(args, train_dataset, model, tokenizer, fh, pool)
  File "run.py", line 165, in train
    for step, (batch, token_labels) in enumerate(train_dataloader):
  File "/home/at5262/anaconda3/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 345, in __next__
    data = self._next_data()
  File "/home/at5262/anaconda3/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 385, in _next_data
    data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
  File "/home/at5262/anaconda3/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/at5262/anaconda3/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/at5262/CodeXGLUE/Text-Code/text-to-code/code/dataset.py", line 116, in __getitem__
    return torch.tensor(self.inputs[item]), torch.tensor(self.token_labels[item])
RuntimeError: Could not infer dtype of NoneType

Traceback (most recent call last):
  File "/home/at5262/anaconda3/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/at5262/anaconda3/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/at5262/anaconda3/lib/python3.7/site-packages/torch/distributed/launch.py", line 263, in <module>
    main()
  File "/home/at5262/anaconda3/lib/python3.7/site-packages/torch/distributed/launch.py", line 259, in main
    cmd=cmd)
subprocess.CalledProcessError: Command '['/home/at5262/anaconda3/bin/python', '-u', 'run.py', '--local_rank=0', '--data_dir=../dataset/concode', '--langs=java', '--output_dir=../save/concode', '--pretrain_dir=microsoft/CodeGPT-small-java-adaptedGPT2', '--log_file=text2code_concode.log', '--model_type=gpt2', '--block_size=512', '--do_train', '--node_index', '0', '--gpu_per_node', '1', '--learning_rate=5e-5', '--weight_decay=0.01', '--evaluate_during_training', '--per_gpu_train_batch_size=6', '--per_gpu_eval_batch_size=12', '--gradient_accumulation_steps=2', '--num_train_epochs=30', '--logging_steps=100', '--save_steps=5000', '--overwrite_output_dir', '--seed=42']' returned non-zero exit status 1.
```

I tried running it on Colab as well, and I get the same error.
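For reference, the failing call at `dataset.py`, line 116 can be reproduced in isolation; `torch.tensor` raises exactly this error when handed `None`:

```python
import torch

# A None entry in self.inputs or self.token_labels makes torch.tensor
# fail, since no dtype can be inferred from None.
torch.tensor(None)  # RuntimeError: Could not infer dtype of NoneType
```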

celbree commented 3 years ago

Hi, I can't reproduce the error. This error message indicates that a `None` value is produced during tokenizer processing. Would you please check that you are running the newest version of the code, with pytorch >= 1.4.0 and transformers >= 2.5.0 and <= 4.0.0?
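One quick way to verify the installed versions (plain version checks, nothing repo-specific):

```python
import torch
import transformers

# expected here: pytorch >= 1.4.0, transformers >= 2.5.0 and <= 4.0.0
print("torch:", torch.__version__)
print("transformers:", transformers.__version__)
```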

BitcoinNLPer commented 3 years ago

I have the same error

BitcoinNLPer commented 3 years ago

I have solved this bug. After updating transformers from 2.5.0 to 4.0.0, it works fine.
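For the record, the upgrade is a single command (adjust for your own environment, e.g. conda):

```bash
pip install transformers==4.0.0
```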

2003pro commented 2 years ago

I reproduced this bug in the text-to-code task. In my case, updating transformers did not help.

[Quick Solution]

1. Go to the folder save/concode
2. Remove all files whose names begin with "train_blocksize_"
3. Rerun the training script

[Explanation]

I dug into the details of the procedure and found that the problem is in the preprocessing step, specifically in how cached preprocessing files in "save/concode" are reused. If generating the preprocessed files fails the first time, e.g. because preprocessing is interrupted by accident, you end up with 2 corrupt cache files (with the default parameters) in "save/concode". As long as you do not remove them, you will keep hitting the error at `data = [self.dataset[idx] for idx in possibly_batched_index]`, raised from "/home/at5262/CodeXGLUE/Text-Code/text-to-code/code/dataset.py", line 116, in `__getitem__`. Once you remove these corrupt files and let preprocessing rerun to completion, the correct preprocessed files are generated and the whole script runs normally.
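A minimal sketch of steps 1–2 above, assuming the default `--output_dir=../save/concode` from the training command in this thread (the `train_blocksize_` prefix is taken from the steps above):

```python
import glob
import os

# Delete stale preprocessing caches so the training script
# regenerates them from scratch on the next run.
for path in glob.glob("../save/concode/train_blocksize_*"):
    os.remove(path)
    print("removed", path)
```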