When the counts differ, a reduction followed by a send may not be in sync, especially with the LL protocol. This commit ensures there is a __syncthreads() between instructions within the same threadblock, regardless of the hasdep field.
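
The hazard can be illustrated with a minimal CUDA sketch. It is not the code touched by this commit: the kernel `reduceThenSend`, the helper `chunkBounds`, the host wrapper `runReduceThenSend`, and the contiguous per-thread work split are all hypothetical and chosen only for illustration. Under that assumed split, a different count in each step means the thread that sends an element is generally not the thread that reduced it, so the send step needs a block-wide barrier before it reads the reduced data.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Assumed work split (not from the commit): each thread owns one contiguous
// chunk of [0, count). A different count gives different chunk boundaries.
__device__ void chunkBounds(int count, int* begin, int* end) {
  int tid = static_cast<int>(threadIdx.x);
  int nthreads = static_cast<int>(blockDim.x);
  int per = (count + nthreads - 1) / nthreads;
  *begin = min(tid * per, count);
  *end = min(*begin + per, count);
}

// Hypothetical two-instruction program executed by a single threadblock.
__global__ void reduceThenSend(const float* src, float* acc, float* dst,
                               int reduceCount, int sendCount) {
  int begin, end;

  // Instruction 0: reduce src into acc over reduceCount elements.
  chunkBounds(reduceCount, &begin, &end);
  for (int i = begin; i < end; ++i) acc[i] += src[i];

  // Unconditional barrier between instructions. Every thread in the block
  // reaches it, and writes made before it are visible to the block after it.
  __syncthreads();

  // Instruction 1: "send" (modeled here as a copy) sendCount elements.
  chunkBounds(sendCount, &begin, &end);
  for (int i = begin; i < end; ++i) dst[i] = acc[i];
}

// Host wrapper: returns the number of mismatching elements, or -1 on a CUDA
// error. Assumes 0 < sendCount <= reduceCount.
int runReduceThenSend(int reduceCount, int sendCount) {
  float *src = nullptr, *acc = nullptr, *dst = nullptr;
  if (cudaMallocManaged(&src, reduceCount * sizeof(float)) != cudaSuccess ||
      cudaMallocManaged(&acc, reduceCount * sizeof(float)) != cudaSuccess ||
      cudaMallocManaged(&dst, sendCount * sizeof(float)) != cudaSuccess) {
    cudaFree(src);
    cudaFree(acc);
    cudaFree(dst);
    return -1;
  }
  for (int i = 0; i < reduceCount; ++i) {
    src[i] = 2.0f;
    acc[i] = 1.0f;
  }
  for (int i = 0; i < sendCount; ++i) dst[i] = 0.0f;

  reduceThenSend<<<1, 128>>>(src, acc, dst, reduceCount, sendCount);

  int mismatches = 0;
  if (cudaDeviceSynchronize() != cudaSuccess) {
    mismatches = -1;
  } else {
    for (int i = 0; i < sendCount; ++i) {
      if (dst[i] != 3.0f) ++mismatches;
    }
  }
  cudaFree(src);
  cudaFree(acc);
  cudaFree(dst);
  return mismatches;
}
```

Calling `runReduceThenSend(1000, 600)` from a host `main` exercises the case where the two counts differ. Placing the barrier outside any condition mirrors the intent of the commit: the synchronization does not depend on a per-instruction flag.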