The original implementation is very slow; the problem lies in the following code:

`self.labels == pair_ids[0]`

This performs a linear search for a randomly selected label ID over a huge array, and it is done for every data item in each batch.
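A minimal sketch of the kind of fix this implies (names like `label_to_indices` are my own, not from the original code): precompute a mapping from each label ID to the indices where it occurs once, up front, so each per-item lookup becomes a constant-time dictionary access instead of a full scan of `self.labels`.

```python
import numpy as np

# Stand-in for self.labels: a large array of integer label IDs.
rng = np.random.default_rng(0)
labels = rng.integers(0, 100, size=1_000_000)

# One-time preprocessing (e.g. in the dataset's __init__):
# map each label ID to the array of indices carrying that label.
label_to_indices = {
    lbl: np.flatnonzero(labels == lbl) for lbl in np.unique(labels)
}

# Per-item lookup (e.g. in __getitem__): an O(1) dict access replaces
# the per-item linear scan `labels == target_label`.
target_label = 42
candidates = label_to_indices[target_label]
assert np.all(labels[candidates] == target_label)
```

The trade-off is a modest amount of extra memory for the index arrays in exchange for removing an O(n) scan from every item fetch.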
My machine is a 24-core i7-6800K with 32 GB RAM and an Nvidia 1070Ti with 8 GB of GPU RAM. With the old code, the training script spends most of its time fetching data: GPU usage is nearly 0% even with 8 parallel workers in the DataLoader, and the processing speed is only 4.5 batches/sec.

After the fix, with only 1 worker in the DataLoader, it processes 27 batches/sec, and GPU usage stays steady at 50%.