The original implementation is very slow; the problem lies in the following code:

`self.labels == pair_ids[0]`

This performs a linear search for a randomly selected label ID over a huge array, and it is done for every data item in each batch.
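A minimal sketch of the kind of fix this implies (names like `label_to_indices` are my own, not from the original code): precompute a mapping from each label ID to the indices where it occurs once, up front, so each per-item lookup becomes a constant-time dictionary access instead of a full scan of `self.labels`.

```python
import numpy as np

# Stand-in for self.labels: a large array of integer label IDs.
rng = np.random.default_rng(0)
labels = rng.integers(0, 100, size=1_000_000)

# One-time preprocessing (e.g. in the dataset's __init__):
# map each label ID to the array of indices carrying that label.
label_to_indices = {
    lbl: np.flatnonzero(labels == lbl) for lbl in np.unique(labels)
}

# Per-item lookup (e.g. in __getitem__): an O(1) dict access replaces
# the per-item linear scan `labels == target_label`.
target_label = 42
candidates = label_to_indices[target_label]
assert np.all(labels[candidates] == target_label)
```

The trade-off is a modest amount of extra memory for the index arrays in exchange for removing an O(n) scan from every item fetch.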
My machine is a 24-core i7-6800K with 32 GB RAM and an Nvidia 1070Ti with 8 GB of GPU RAM. With the old code, the training script spends most of its time fetching data: GPU usage is nearly 0% even with 8 parallel workers in the DataLoader, and the processing speed is only 4.5 batches/sec.

After the fix, with only 1 worker in the DataLoader, it processes 27 batches/sec, and GPU usage stays steady at 50%.