It seems the overhead of the OpenMP locks and (dynamic,1) scheduling can become significant on machines with large numbers of cores. For decompression, I could see some improvements using static scheduling:
Perhaps there is a better way to overcome this problem? Anyway, I will try to send you a pull request for discussion.
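To make the comparison concrete, here is a minimal sketch of the two scheduling variants. It is not the project's actual code: `decompress_block`, `decompress_dynamic`, and `decompress_static` are hypothetical names, the per-block work is replaced by a plain copy, and blocks are assumed to be independent and of equal size.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the real per-block decompression: it only
 * copies one block, so the sketch stays self-contained. */
static void decompress_block(const unsigned char *src, unsigned char *dst,
                             size_t block_size)
{
    memcpy(dst, src, block_size);
}

/* schedule(dynamic, 1): iterations are handed out one at a time at run
 * time, so every block costs a trip to the shared work queue. This
 * balances uneven blocks well, but the hand-out cost grows with the
 * number of threads competing for it. */
static void decompress_dynamic(const unsigned char *src, unsigned char *dst,
                               long nblocks, size_t block_size)
{
    long i;
    #pragma omp parallel for schedule(dynamic, 1)
    for (i = 0; i < nblocks; i++)
        decompress_block(src + (size_t)i * block_size,
                         dst + (size_t)i * block_size, block_size);
}

/* schedule(static): the iteration range is divided among the threads
 * once, before the loop runs, so no per-block coordination is needed.
 * The trade-off is poorer balance when block costs differ a lot. */
static void decompress_static(const unsigned char *src, unsigned char *dst,
                              long nblocks, size_t block_size)
{
    long i;
    #pragma omp parallel for schedule(static)
    for (i = 0; i < nblocks; i++)
        decompress_block(src + (size_t)i * block_size,
                         dst + (size_t)i * block_size, block_size);
}
```

Both functions produce the same output; only the way iterations are assigned to threads differs. Without OpenMP enabled at compile time (for example `-fopenmp` with GCC), the pragmas are ignored and the loops run serially. A chunked dynamic schedule such as `schedule(dynamic, 16)` might be a middle ground, though whether it helps here would need measuring.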