qurator-spk / eynollah

Document Layout Analysis
Apache License 2.0

What is the known working GPU config? #84

Closed by ghost 1 year ago

ghost commented 2 years ago

I am using an Amazon-provided Ubuntu 16 Deep Learning AMI, which contains CUDA 10, 10.1, 10.2, and 11.

I am using Mambaforge with Python 3.6 or 3.7

TensorFlow 2 is used automatically. I plan to try TensorFlow 1.x next.

The process is loaded into GPU memory, but the GPU is never used.
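A quick sanity check for this kind of symptom is to confirm that TensorFlow was built with CUDA support and can actually see the device (a generic snippet, not specific to eynollah):

```python
import tensorflow as tf

# Both should be truthy on a working CUDA install; an empty device
# list means TF silently falls back to the CPU, even though the
# process may still map some GPU memory.
print(tf.test.is_built_with_cuda())
print(tf.config.list_physical_devices('GPU'))
```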

Is there a known working full-stack config for eynollah on the GPU (OS+version, CUDA+version, Python+version, TensorFlow+version, etc.) that you don't mind sharing?

Thanks,

cneud commented 2 years ago

Hi @mach881040, for me it works well with an NVIDIA 2070S GPU on Ubuntu 18.04, Python 3.7, TensorFlow 2.4.1 and CUDA 10.1. Note that there is still a lot of room for improvement with regard to GPU utilization; we hope to optimize this, but for our use case, quality of results is much more important than throughput speed.

bertsky commented 1 year ago

> The process is loaded into GPU memory, but the GPU is never used.

I can confirm this with Ubuntu 22.04, Python 3.8, TF 2.10. It's not about low utilisation: the OP says no utilisation, and that is what I see too. Memory consumption stays at only 107 MB (and never increases), and GPU utilisation never moves off 0%.
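One way to tell whether any op actually executes on the GPU (rather than the process merely holding a CUDA context) is TensorFlow's device placement logging; a minimal, generic sketch:

```python
import tensorflow as tf

# Log the device every op is placed on (printed to stderr).
tf.debugging.set_log_device_placement(True)

# On a working CUDA/TF installation this matmul is logged on
# /device:GPU:0; on a broken install it runs on the CPU even though
# nvidia-smi shows the process holding ~100 MB of GPU memory.
a = tf.random.uniform((2048, 2048))
b = tf.random.uniform((2048, 2048))
tf.matmul(a, b)
```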

bertsky commented 1 year ago

Sorry, error on my part. The cause was an insufficient CUDA/TF installation. I probably ran into #72 as well.

(I am on CUDA 11.7 though, and now it does work, so the note in the README might not be correct.)

bertsky commented 1 year ago

BTW, is there a particular reason for keeping the TF1-style session management? I found that if I remove it completely (including the explicit GC calls) and avoid repeated load_model calls by storing the model references on the Eynollah instance, it gets about 9% faster on average (while max RSS of course increases from 4 GB to 7 GB).
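For illustration, a minimal sketch of the caching idea (the class shape and method names here are hypothetical, not eynollah's actual API):

```python
import tensorflow as tf

class Eynollah:
    """Sketch: load each Keras model at most once and keep the
    reference on the instance, instead of re-loading it and tearing
    the session down (clear_session/gc) around every prediction."""

    def __init__(self, model_dir):
        self.model_dir = model_dir
        self._models = {}  # model name -> loaded tf.keras.Model

    def _get_model(self, name):
        # Lazy-load and cache: trades peak RSS for the time spent on
        # repeated load_model calls and TF1-style session teardowns.
        if name not in self._models:
            self._models[name] = tf.keras.models.load_model(
                f"{self.model_dir}/{name}", compile=False)
        return self._models[name]

    def predict(self, name, batch):
        # No tf.keras.backend.clear_session() and no explicit
        # gc.collect() between calls.
        return self._get_model(name).predict(batch)
```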

cneud commented 1 year ago

> BTW, is there a particular reason for keeping the TF1-style session management? I found that if I remove it completely (including the explicit GC calls) and avoid repeated load_model calls by storing the model references on the Eynollah instance, it gets about 9% faster on average (while max RSS of course increases from 4 GB to 7 GB).

This should already be fixed with https://github.com/qurator-spk/eynollah/commit/7345f6bf678f36cf3a51576b0fa94df0919925d7 (which has since been merged), right?

The working config for (limited) GPU use is now documented in the README.