latitudegames / AIDungeon

Infinite adventures await!
http://www.aidungeon.io/
MIT License

Adopt CUDA UVM support from recent tensorflow versions to alleviate memory constraints #174

Open subatomicBMAN opened 4 years ago

subatomicBMAN commented 4 years ago

As per these two commits to tensorflow:

https://github.com/tensorflow/tensorflow/commit/cd4f5840 https://github.com/tensorflow/tensorflow/commit/b1139814

Support has now been extended for CUDA Unified Virtual Memory. If the intent behind this repository is to allow consumer-grade hardware to run this in a closer-to-deployed state, running on the GPU would be IDEAL. Unified Virtual Memory allows system RAM to be consumed alongside GPU VRAM in order to accommodate larger in-memory constructs. This of course comes at a cost (system RAM is typically MUCH slower than VRAM), but a small benchmark could weigh that cost against the benefits of running with CUDA parallelization.
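For reference, here is a minimal sketch of how unified memory can be requested through TensorFlow's GPU options, assuming a build that includes the linked UVM commits and the oversubscription path described in the GPUOptions docs (a memory fraction greater than 1.0, Pascal-or-newer GPU). The session/model-loading part is purely illustrative, not this repo's actual loader:

```python
# Sketch only: requesting CUDA unified memory via TensorFlow 1.x GPUOptions.
# Assumes a TF build with UVM support; the session usage below is illustrative.
import tensorflow as tf

gpu_options = tf.GPUOptions(
    # A fraction > 1.0 lets the allocator oversubscribe VRAM by backing the
    # excess with CUDA unified (managed) memory spilled to system RAM.
    # Requires a Pascal-or-newer GPU and a TF build that supports it.
    per_process_gpu_memory_fraction=2.0,
)
config = tf.ConfigProto(gpu_options=gpu_options)

with tf.Session(config=config) as sess:
    # ... restore the GPT-2 graph and run generation as usual ...
    pass
```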

I am of the opinion that this would allow a much larger portion of the target demographic to run this repository at more reasonable processing speeds.

dyc3 commented 4 years ago

Interesting... I think it would be worth a shot to implement this.

kik4444 commented 4 years ago

If the resulting processing speed is faster than running it entirely in system RAM, then I think this is definitely worth it. This looks to be exactly what I was asking about in #121.

ben-bay commented 4 years ago

This looks like it would require an upgrade to TensorFlow 2.

subatomicBMAN commented 4 years ago

As far as I can tell it certainly would need an upgrade. I'm sure it would take at least some time to do it right, but the TensorFlow team themselves have published tooling to ease migration from 1.14+ to 2.0.

See the following: https://www.tensorflow.org/guide/upgrade. Applying the auto-upgrade script might be enough to at least test the merit of this option before digging in and fully implementing it.
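For anyone who wants to try it, the upgrade guide's tf_upgrade_v2 script can be pointed at the whole tree; the directory and report names below are placeholders, not this repo's actual layout:

```sh
# Illustrative invocation of the tf_upgrade_v2 script that ships with TF 2.x.
tf_upgrade_v2 \
  --intree AIDungeon/ \
  --outtree AIDungeon_v2/ \
  --reportfile upgrade_report.txt
```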

It still might be worthwhile to build a standalone benchmark workload in 2.0 to test the capabilities and performance tax of UVM, of course.
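Something along these lines could serve as a starting skeleton for that benchmark; sizes and iteration counts are arbitrary placeholders, and to actually exercise UVM you'd grow the tensors past VRAM with oversubscription enabled (e.g. the memory-fraction setting above):

```python
# Rough standalone benchmark sketch (TF 2.x, eager mode) for comparing GPU vs
# CPU throughput on a large matmul. Placeholder sizes; not a definitive test.
import time
import tensorflow as tf

def time_matmul(device, n=8192, iters=10):
    """Average seconds per matmul on the given device string."""
    with tf.device(device):
        a = tf.random.normal((n, n))
        b = tf.random.normal((n, n))
        _ = tf.matmul(a, b)  # warm-up so allocation isn't timed
        start = time.perf_counter()
        for _ in range(iters):
            c = tf.matmul(a, b)
        _ = c.numpy()  # force execution to finish before stopping the clock
        return (time.perf_counter() - start) / iters

if __name__ == "__main__":
    for dev in ("/GPU:0", "/CPU:0"):
        print(dev, f"{time_matmul(dev):.3f}s per matmul")
```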

subatomicBMAN commented 4 years ago

I haven't had much of a chance to delve into the codebase at large yet, but if lower-level TensorFlow functionality has been used to write custom training code, there could be considerably more work involved. The deeper differences and suggested updates are discussed here: https://www.tensorflow.org/guide/migrate
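To make the concern concrete, this is the kind of low-level TF 1.x idiom the migration guide flags (a generic illustration, not this repo's actual code): tf_upgrade_v2 only rewrites such code to tf.compat.v1 shims, so converting it to idiomatic TF 2 (eager execution, tf.function, Keras layers) remains manual work.

```python
# Generic TF 1.x-style example: graph construction, placeholders, and an
# explicit Session/run loop. The auto-upgrade keeps this working via
# tf.compat.v1 but does not modernize it.
import tensorflow.compat.v1 as tf

tf.disable_eager_execution()

x = tf.placeholder(tf.float32, shape=[None, 10])   # feed-dict input
w = tf.get_variable("w", shape=[10, 1])            # graph-scoped variable
y = tf.matmul(x, w)

with tf.Session() as sess:                          # explicit session/run loop
    sess.run(tf.global_variables_initializer())
    out = sess.run(y, feed_dict={x: [[0.0] * 10]})
    print(out.shape)
```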

kylemiller3 commented 4 years ago

This would be amazing. Thanks for looking into this.