hsiehjackson / Mr.Right

Mr. Right: Multimodal Retrieval on Representation of ImaGe witH Text
Creative Commons Attribution Share Alike 4.0 International

Total number of batches being processed per dataloader in validation_step #2

Open arvind-27 opened 1 year ago

arvind-27 commented 1 year ago

Hello, due to resource limitations, I am trying to run the code with a batch size of 8. The problem arises when pytorch_lightning calls validation_step: it does not process all the batches of either val_queries or multimodal_documents (only 2 batches are called for each). As a result, when validation_epoch_end runs, the total number of documents is 16, and when the code builds the one-hot encoding at line 249 of pltrainer.py, it throws a CUDA error because there are only 16 documents while the queries' doc ids range from 20,000 to 100,000. I have tried different batch sizes for the queries dataloader and the documents dataloader, but nothing works. Can you tell me whether I am doing something wrong, or whether something needs to be done differently?
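
The arithmetic behind the failure can be sketched in plain Python. This is only an illustration, not the code in pltrainer.py: the function `build_relevance_matrix` and the example ids are hypothetical, and the sketch assumes the one-hot step needs a column for every document a query points to. With 2 batches of 8 there are only 16 columns, so a doc id such as 20,000 has nowhere to go. On a GPU an out-of-range index like this typically shows up as an opaque CUDA error, while an explicit check gives a readable message.

```python
def build_relevance_matrix(query_doc_ids, collected_doc_ids):
    """Return one one-hot row per query over the documents actually collected.

    Hypothetical helper for illustration; not taken from pltrainer.py.
    """
    # Map each collected document id to its column position.
    column_of = {doc_id: col for col, doc_id in enumerate(collected_doc_ids)}
    matrix = []
    for doc_id in query_doc_ids:
        if doc_id not in column_of:
            # Fail early with a readable message instead of indexing out of range.
            raise KeyError(
                f"query refers to doc id {doc_id}, but only "
                f"{len(collected_doc_ids)} documents were collected"
            )
        row = [0] * len(collected_doc_ids)
        row[column_of[doc_id]] = 1
        matrix.append(row)
    return matrix


batch_size = 8
batches_seen = 2  # only 2 validation batches were processed per dataloader
num_docs = batch_size * batches_seen  # 16 documents reach validation_epoch_end

# Assumed ids 0..15 for the collected documents; the real ids may differ.
collected = list(range(num_docs))

try:
    build_relevance_matrix([20000], collected)
    error_message = None
except KeyError as exc:
    error_message = str(exc)
```

One possible explanation for seeing exactly 2 batches, not confirmed anywhere in this thread, is the sanity check that pytorch_lightning runs before training, which by default processes 2 validation batches per dataloader (`num_sanity_val_steps=2` on the `Trainer`). If that is the cause, the batch size itself would not matter.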