oahzxl closed this 2 years ago
Use chunking, code optimization, and heterogeneous computing to reduce the memory usage of long-sequence inference.

Can now infer sequences of length 5000 on an 80 GB A100.
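A minimal sketch of the chunking idea, for illustration only (this is not the PR's actual implementation, and the function names are hypothetical): instead of materializing the full quadratic score matrix for an attention-like operation, the query dimension is processed in fixed-size chunks, so peak memory scales with `chunk_size * seq_len` rather than `seq_len * seq_len`.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention_chunked(q, k, v, chunk_size=128):
    """Attention computed over the query dimension in chunks, so the
    full (seq x seq) score matrix is never held in memory at once."""
    seq_len, dim = q.shape
    out = np.empty_like(q)
    for start in range(0, seq_len, chunk_size):
        end = min(start + chunk_size, seq_len)
        # Only a (chunk, seq) slice of the score matrix exists at a time.
        scores = q[start:end] @ k.T / np.sqrt(dim)
        out[start:end] = softmax(scores, axis=-1) @ v
    return out
```

The chunked result is numerically identical to the unchunked computation; only the peak memory footprint changes.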
LGTM. Multimer model support will come in a new PR.