Closed vinx13 closed 2 years ago
This PR adds software pipelining for CUDA (without async memcpy). This is a working version, but the code needs polishing and more test cases.
@junrushao1994 @jinhongyii @spectrometerHBH this is ready for review