Closed · Simon-Dirks closed this issue 11 months ago
It seems to me that loghi-htr is running much slower than expected; could you verify that it is running on the GPU? I believe we had issues with WSL and GPU support. Maybe you could try
docker pull loghi/docker.htr-wsl:1.2
and change the Docker image used in the pipeline to this experimental WSL version. We are currently developing on Ubuntu, so we haven't actively run on WSL.
I'm pretty sure it was running on the GPU! I logged GPU load, which reached 99% at times, and I saw some CUDA logging as well. Couldn't it just be the dated hardware? The GTX 970 was released in 2014, after all.
I'll take a closer look sometime next week. What ballpark performance should I expect with a GPU?
Hi Simon,
Besides the WSL issues, some extra info:
Performance depends a bit on the input: scans with many text lines take more time, and longer text lines are slower to process than short ones.
There is a parameter you can add to speed up the actual HTR part:
--greedy
Look in na-pipeline.sh for the line that contains
--beamwidth 10
and add --greedy there, so it looks like:
--greedy --beamwidth 10
Alternatively, you can lower the beamwidth. Lowering the beamwidth makes things slightly less accurate, but much faster. The --greedy parameter is effectively the same as --beamwidth 1, but with some extra speedups.
I can process about 50,000 scans of single-page 18th-century material per day on a high-end laptop (i9, 64 GB RAM, RTX 3080 Ti mobile GPU).
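The point that --greedy is effectively the same as --beamwidth 1 can be illustrated with a toy decoder. This is a deliberately simplified sketch (Loghi's real CTC decoder also handles blank tokens and merges repeated characters, which this does not); the probability matrix is made up for illustration:

```python
import numpy as np

# Toy per-timestep symbol probabilities (3 timesteps, 4 symbols).
# Illustrative numbers only -- not real model output.
probs = np.array([
    [0.1, 0.6, 0.2, 0.1],
    [0.5, 0.2, 0.2, 0.1],
    [0.2, 0.2, 0.5, 0.1],
])

def greedy_decode(p):
    # Take the single most likely symbol at every timestep.
    return [int(np.argmax(row)) for row in p]

def beam_decode(p, beamwidth):
    # Keep the `beamwidth` highest-scoring partial sequences at each step.
    beams = [([], 1.0)]
    for row in p:
        candidates = [(seq + [i], score * row[i])
                      for seq, score in beams
                      for i in range(len(row))]
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beamwidth]
    return beams[0][0]

# Beam width 1 extends only the single best hypothesis per step,
# which is exactly the greedy choice.
assert beam_decode(probs, beamwidth=1) == greedy_decode(probs)
```

A larger beamwidth tracks more candidate sequences per timestep, which is why it is slower but can be slightly more accurate, and why lowering it (or going fully greedy) speeds up inference.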
Thanks a lot for these additional pointers! I just did a run on Ubuntu on exactly the same machine/hardware. The performance differences do not seem to be major, especially considering that I only did a single run. The slower-than-expected performance is probably simply hardware-related. Overview in the table below:
| Phase | Ubuntu (s) | Windows (s) | Ubuntu (min) | Windows (min) |
|---|---|---|---|---|
| laypa_baseline_detection | 22.32 | 25.47 | 0.37 | 0.42 |
| loghi_htr | 421.72 | 457.46 | 7.03 | 7.62 |
| detecting_language | 3.01 | 4.01 | 0.05 | 0.07 |
| MinionSplitPageXMLTextLineIntoWords | 1.43 | 2.00 | 0.02 | 0.03 |
| Script total | 448.58 | 489.01 | 7.48 | 8.15 |
Also note that I receive the following message/warning, which may or may not be related:
```
Number of devices: 1
using mixed_float16
WARNING:tensorflow:Mixed precision compatibility check (mixed_float16): WARNING
Your GPU may run slowly with dtype policy mixed_float16 because it does not have compute capability of at least 7.0. Your GPU:
  NVIDIA GeForce GTX 970, compute capability 5.2
See https://developer.nvidia.com/cuda-gpus for a list of GPUs and their compute capabilities.
If you will use compatible GPU(s) not attached to this host, e.g. by running a multi-worker model, you can ignore this warning. This message will only be logged once
```
The mixed precision warning is definitely related. Please try the model whose name starts with float32; it should run faster on your hardware.
I tried running with the float32 model, but I get the same warning, unfortunately.
Hi there,
I'm running on old hardware to test and evaluate the pipeline. With my dated system (Windows 10 with WSL2, GTX 970, Ryzen 5 3600, 16 GB DDR4 RAM, M.2 Samsung 970 EVO Plus), I'm getting the following performance on a single run with 40 scans.
The entire script for 40 scans takes 8.15 minutes for me (489,014 ms).
I was wondering what performance increase I can (more or less) expect when upgrading to modern hardware (e.g., an RTX 2070 or better).
If anyone in the community (or the devs) would like to share their machine's performance metrics, that would be a great help! Even quick ballpark estimates would be valuable.
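For a rough ballpark, the numbers already in this thread allow a back-of-envelope extrapolation. This is a sketch only: it ignores startup overhead, batching effects, and variation between scans:

```python
# Back-of-envelope throughput from the numbers reported in this thread.
total_seconds = 489.01  # full pipeline, Windows 10 + WSL2, GTX 970
num_scans = 40

per_scan = total_seconds / num_scans      # seconds per scan
scans_per_day = 24 * 60 * 60 / per_scan   # sustained rate, no downtime

print(f"{per_scan:.1f} s/scan -> ~{scans_per_day:.0f} scans/day")
```

That works out to roughly 7,000 scans/day on the GTX 970 setup, versus the ~50,000 scans/day reported above for an i9 + RTX 3080 Ti mobile, i.e. very roughly a 7x gap. Actual speedups will depend on the model, decoding settings (beamwidth vs. greedy), and the material being processed.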