Redundant mesh grid creation for VSI

I tested the local approach of caching the mesh grid. It halves the number of mesh grid generations, but the performance improvement is marginal.
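For readers who have not looked at the branch, the caching idea can be sketched roughly as follows. This is a dependency-free illustration, not the code from the PR: `cached_meshgrid` and `generation_count` are hypothetical names, and the real implementation would presumably build tensors rather than nested tuples.

```python
from functools import lru_cache

generation_count = 0  # hypothetical counter: how often a grid is actually built


@lru_cache(maxsize=8)
def cached_meshgrid(height, width):
    """Build centred (rows, cols) coordinate grids once per (height, width)."""
    global generation_count
    generation_count += 1
    ys = tuple(i / height - 0.5 for i in range(height))
    xs = tuple(j / width - 0.5 for j in range(width))
    # rows[i][j] holds the y coordinate, cols[i][j] holds the x coordinate
    rows = tuple(tuple(y for _ in xs) for y in ys)
    cols = tuple(tuple(x for x in xs) for _ in ys)
    return rows, cols


# Two requests for the same size trigger only one generation;
# the second call returns the cached object.
first = cached_meshgrid(4, 6)
second = cached_meshgrid(4, 6)
```

Keying the cache on the grid size means the saving only applies when consecutive calls share the same image dimensions, which may be one reason the measured gain is small.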

The updated version is available in #342, but I'm not sure it is worth merging. @snk4tr @zakajd, what do you think?

Outcome

4 streams, no GPU

python tests/results_benchmark.py --dataset tid2013 --metrics VSI --path ../iqa_datasets/datasets/tid2013 --device cpu

# master
100%|█████████| 3000/3000 [08:01<00:00,  6.23it/s]
VSI: SRCC 0.895 KRCC 0.716

# fix/optimise_meshgrid
100%|█████████| 3000/3000 [05:53<00:00,  8.50it/s]
VSI: SRCC 0.895 KRCC 0.716

python tests/results_benchmark.py --dataset pipal --metrics VSI --path ../iqa_datasets/datasets/pipal --device cpu
# master
100%|███████| 23200/23200 [32:04<00:00, 12.06it/s]
VSI: SRCC 0.539 KRCC 0.375

# fix/optimise_meshgrid
100%|███████| 23200/23200 [33:45<00:00, 11.45it/s]
VSI: SRCC 0.539 KRCC 0.375

40 streams, Tesla V100

python tests/results_benchmark.py --dataset pipal --metrics VSI --path ./data/pipal --batch_size 16 --device cuda
# Master CUDA
100%|█████████| 1450/1450 [00:42<00:00, 34.02it/s]
VSI: SRCC 0.539 KRCC 0.375

# Updated CUDA
100%|█████████| 1450/1450 [00:38<00:00, 37.72it/s]
VSI: SRCC 0.539 KRCC 0.375

python tests/results_benchmark.py --dataset pipal --metrics VSI --path ./data/pipal --batch_size 16 --device cpu
# Master CPU
100%|█████████| 1450/1450 [06:15<00:00,  3.87it/s]
VSI: SRCC 0.539 KRCC 0.375

# Updated CPU
100%|█████████| 1450/1450 [06:16<00:00,  3.86it/s]
VSI: SRCC 0.539 KRCC 0.375
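
To put the runs above side by side, the relative throughput can be computed from the reported it/s values. This is only a quick sketch: the dictionary keys are labels I made up for the four runs, and the numbers are copied from the progress bars above.

```python
# (master it/s, updated it/s) taken from the tqdm output of each run
throughput = {
    "tid2013, 4 streams, cpu": (6.23, 8.50),
    "pipal, 4 streams, cpu": (12.06, 11.45),
    "pipal, V100, cuda, batch 16": (34.02, 37.72),
    "pipal, 40 streams, cpu, batch 16": (3.87, 3.86),
}

# Ratio above 1.0 means the updated branch is faster than master
speedups = {name: new / old for name, (old, new) in throughput.items()}

for name, ratio in speedups.items():
    print(f"{name}: {ratio:.2f}x")
```

The ratio is above 1.0 for two runs and at or below 1.0 for the other two, and these are single runs, so some of the difference may be noise.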

photosynthesis-team / piq

Redundant mesh grid creation for VSI #335
