Default training command - Issues when encountering documents longer than 512

Running the command under the training section of the README, the program fails in the first optimization step with the following message:

/pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [535,0,0], thread: [96,0,0] Assertion `srcIndex < srcSelectDimSize` failed.

Which is thrown from the following:

  ...                                                                                                                                        
  File "/task_runtime/src/transformers-4.2.1/src/transformers/models/bert/modeling_bert.py", line 956, in forward                                                                           
    past_key_values_length=past_key_values_length,   
  ...
  File "/miniconda/lib/python3.7/site-packages/torch/nn/functional.py", line 2043, in embedding                                                                                                 
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)                                                                                            
       RuntimeError: CUDA error: device-side assert triggered

In other words, the base model (bert-base-cased) encounters an input with a larger sequence length than what it can handle (535 > 512).

Given the above, how do you get around it, and apply your method to entire documents? (i.e., the MS MARCO Document Ranking table)

luyug / COIL

Default training command - Issues when encountering documents longer than 512 #7