uci-cbcl / UFold


Preprocessing in GPU #30

Closed Arun-42 closed 9 months ago

Arun-42 commented 9 months ago

When running inference on a GPU, the runtime is dominated by the preprocessing step, specifically the `creatmat` function. The current implementation is not vectorized and runs on the CPU. Using multiple dataloader workers helps, but preprocessing is still the main bottleneck.
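For context, the original `creatmat` has roughly the shape below (a paraphrase of the UFold/E2Efold preprocessing code as I read it, not a verbatim copy; the pair weights and the 30-offset window match what I see in the repo). The cost comes from the triple Python-level loop over every (i, j) pair and its diagonal neighborhood:

```python
import numpy as np

# Watson-Crick / wobble pair weights used by creatmat
# (AU/UA = 2, GC/CG = 3, GU/UG = 0.8, anything else 0).
PAIR = {'AU': 2, 'UA': 2, 'GC': 3, 'CG': 3, 'GU': 0.8, 'UG': 0.8}

def creatmat_scalar(seq, window=30):
    n = len(seq)
    mat = np.zeros((n, n))
    for i in range(n):          # O(n^2 * window) pure-Python iterations
        for j in range(n):
            coeff = 0.0
            # Walk one diagonal direction, stopping at the first
            # non-pairing position or the sequence boundary.
            for add in range(window):
                if i - add < 0 or j + add >= n:
                    break
                score = PAIR.get(seq[i - add] + seq[j + add], 0)
                if score == 0:
                    break
                coeff += score * np.exp(-0.5 * add * add)
            # Only walk the opposite direction if the first one paired.
            if coeff > 0:
                for add in range(1, window):
                    if i + add >= n or j - add < 0:
                        break
                    score = PAIR.get(seq[i + add] + seq[j - add], 0)
                    if score == 0:
                        break
                    coeff += score * np.exp(-0.5 * add * add)
            mat[i, j] = coeff
    return mat
```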

I have therefore implemented a vectorized `creatmat` that can be moved to the GPU. This makes inference much faster (a 10x speedup in some tests I ran).

Speed

I ran inference on Colab with a T4 GPU and a 2-core Xeon CPU on 20 sequences, each of length 600. The numbers below are roughly what I observed across multiple runs; they fluctuate by ~10%.

Current implementation: 120s
Vectorized GPU: 12s

A single call to `creatmat` now takes about 0.2s, compared to roughly 4s before.
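For anyone who wants to reproduce the per-call numbers, a harness along these lines should do it (`time_creatmat` is a hypothetical helper, not part of the PR; the synchronize calls matter because CUDA kernels launch asynchronously):

```python
import time
import torch

def time_creatmat(creatmat_fn, seq, repeats=5):
    # Warm up once so one-time CUDA context setup is excluded.
    creatmat_fn(seq)
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    t0 = time.perf_counter()
    for _ in range(repeats):
        creatmat_fn(seq)
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    return (time.perf_counter() - t0) / repeats

# Example: time_creatmat(creatmat, 'AUCG' * 150)  # a length-600 sequence
```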

Details

The implementation performs exactly the same operations as before, but the output is not bitwise identical: vectorizing the offset sum reorders the floating-point additions, so small rounding differences appear.
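The PR contains the authoritative code; as a sketch of the idea, one way to vectorize these operations in PyTorch is to build the full pairwise score matrix once, stack its diagonal shifts, and replace the scalar loop's `break` with a cumulative-product mask. The names and structure below are illustrative, not the exact patch:

```python
import torch
import torch.nn.functional as F

PAIR_SCORES = {'AU': 2.0, 'UA': 2.0, 'GC': 3.0, 'CG': 3.0, 'GU': 0.8, 'UG': 0.8}

def creatmat_vectorized(seq, device='cuda', window=30):
    # Assumes a clean A/U/C/G alphabet; use device='cpu' without a GPU.
    n = len(seq)
    idx = torch.tensor(['AUCG'.index(c) for c in seq], device=device)
    table = torch.zeros(4, 4, device=device)
    for pair, s in PAIR_SCORES.items():
        table['AUCG'.index(pair[0]), 'AUCG'.index(pair[1])] = s
    P = table[idx[:, None], idx[None, :]]            # (n, n) pairwise scores

    ks = torch.arange(window, device=device)
    decay = torch.exp(-0.5 * ks.float() ** 2)        # Gaussian offset weights
    padded = F.pad(P, (window, window, window, window))  # zero border

    def diag_stack(sign):
        # Q[k, i, j] = P[i - sign*k, j + sign*k], zero outside the matrix.
        return torch.stack([
            padded[window - sign * k : window - sign * k + n,
                   window + sign * k : window + sign * k + n]
            for k in range(window)
        ])

    def gated_sum(Q, start):
        # cumprod along the offset axis keeps an offset only while all
        # earlier offsets paired, reproducing the scalar loop's break.
        alive = torch.cumprod((Q[start:] > 0).float(), dim=0)
        return (alive * Q[start:] * decay[start:, None, None]).sum(dim=0)

    c1 = gated_sum(diag_stack(+1), 0)    # offsets (i-k, j+k), k = 0..window-1
    c2 = gated_sum(diag_stack(-1), 1)    # offsets (i+k, j-k), k = 1..window-1
    return c1 + (c1 > 0).float() * c2    # second direction only if first paired
```

Collapsing the offset loop into a single tensor reduction is also what changes the order of the floating-point additions relative to the scalar version, which is consistent with the results being close but not bitwise identical.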

Comments

sperfu commented 9 months ago

Hi there,

We appreciate your contribution to the UFold project. Upon review, we have found your code to be a valuable addition and have subsequently integrated it into our main branch. Thank you once again for your valuable input.

Thanks.