Example:
At the beginning: the CPU has a complete copy of the array, and GPU 0 has a dirty subarray.
Then: GPU 0 reads the whole array.
What happens: GPU 0 writes its dirty subarray back to the CPU, then GPU 0 copies the whole array back from the CPU.
The previous logic had a deadlock: the CPU waited for GPU 0 to be ready, while GPU 0 waited for the CPU to be ready.
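A minimal Python sketch of the ordering described in this example. It is a toy model, not the actual implementation: `Replica` and `read_whole_array` are hypothetical names, and the readiness handshake between devices is not modeled. The sketch only shows the sequence: flush GPU 0's dirty subarray to the CPU first, so the CPU copy is complete and current, and only then copy the whole array back to GPU 0.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Hypothetical per-device copy of the array."""
    data: list
    # Indices written on this device but not yet propagated to the CPU copy.
    dirty: set = field(default_factory=set)


def read_whole_array(cpu: Replica, gpu: Replica) -> list:
    """Hypothetical read of the whole array by one GPU."""
    # Step 1: write the GPU's dirty subarray back to the CPU copy.
    for i in sorted(gpu.dirty):
        cpu.data[i] = gpu.data[i]
    gpu.dirty.clear()
    # Step 2: the CPU copy is now complete and current, so the GPU
    # copies the whole array back from the CPU.
    gpu.data = list(cpu.data)
    return gpu.data


if __name__ == "__main__":
    cpu = Replica([0, 0, 0, 0])                  # complete copy on the CPU
    gpu0 = Replica([0, 7, 8, 0], dirty={1, 2})   # GPU 0 has a dirty subarray
    print(read_whole_array(cpu, gpu0))
```

Because each step has a single direction of data movement (GPU to CPU, then CPU to GPU), neither side has to wait on the other at the same time.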