knaw-huc / loghi

MIT License
97 stars · 13 forks

Performance estimates - how much time per scan? #9

Closed Simon-Dirks closed 11 months ago

Simon-Dirks commented 11 months ago

Hi there,

I'm running on old hardware to test and evaluate the pipeline. With my dated system (Windows 10 with WSL2, GTX970, Ryzen 5 3600, 16GB DDR4 RAM, M.2 Samsung 970 EVO Plus), I'm getting the following performance on a single run with 40 scans.

| phase | time_in_ms | time_in_secs | time_in_mins |
|---|---|---|---|
| laypa_baseline_detection | 25469 | 25.47 | 0.42 |
| loghi_htr | 457460 | 457.46 | 7.62 |
| detecting_language | 4007 | 4.01 | 0.07 |
| MinionSplitPageXMLTextLineIntoWords | 1998 | 2.00 | 0.03 |

The entire script for 40 scans takes 8.15 minutes for me (489014ms).

I was wondering what performance increase I can (more or less) expect when upgrading to modern hardware (e.g., RTX2070 or better).

If anyone in the community (or the devs) would like to share their machine's performance metrics, that would be a great help! Even quick ballpark estimates would be valuable.

stefanklut commented 11 months ago

It seems to me that loghi-htr is running much slower than expected; could you verify that it is running on the GPU? I believe we had issues with WSL and GPU. Maybe you could try `docker pull loghi/docker.htr-wsl:1.2` and change the Docker image used in the pipeline to this experimental WSL version. We are currently developing on Ubuntu, so we haven't actively run on WSL.

Simon-Dirks commented 11 months ago

> It seems to me that loghi-htr is running much slower than expected; could you verify that it is running on the GPU? I believe we had issues with WSL and GPU. Maybe you could try `docker pull loghi/docker.htr-wsl:1.2` and change the Docker image used in the pipeline to this experimental WSL version. We are currently developing on Ubuntu, so we haven't actively run on WSL.

I'm pretty sure it was running on the GPU! I logged GPU load, which reached 99% at times, and I saw some CUDA logging as well. Couldn't it just be the dated hardware? The GTX 970 was released in 2014, after all...

I'll have a closer look sometime next week. What ballpark performance should I be expecting with the GPU?

rvankoert commented 11 months ago

Hi Simon,

Besides the WSL issues, some extra info:

Performance depends a bit on the input. Scans with lots of textlines take more time, and longer textlines are slower to process than short ones.

There is a parameter you can add to speed up the actual HTR part: `--greedy`. Look in na-pipeline.sh for the line that contains `--beamwidth 10` and add `--greedy` there, so it looks like: `--greedy --beamwidth 10`

Alternatively, you can lower the beamwidth. Lowering it makes things slightly less accurate but much faster. The `--greedy` parameter is effectively the same as `--beamwidth 1`, but with some extra speedups.
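To illustrate why greedy decoding is so much cheaper than beam search: a greedy CTC decoder just takes the argmax at every timestep, collapses repeats, and drops blanks in a single pass, while a beam search with `--beamwidth N` keeps the N most probable prefixes alive at every timestep. A minimal sketch (not Loghi's actual decoder, just the general CTC technique):

```python
# Minimal greedy CTC decoding sketch (illustrative, not Loghi's code).
# logits: one list of per-label scores per timestep; blank: CTC blank index.
def ctc_greedy_decode(logits, blank=0):
    # 1) argmax per timestep -- this single pass is all greedy decoding costs
    best = [max(range(len(step)), key=step.__getitem__) for step in logits]
    out, prev = [], None
    # 2) collapse repeated labels, 3) drop the blank symbol
    for label in best:
        if label != prev and label != blank:
            out.append(label)
        prev = label
    return out

# Beam search instead keeps the N best label prefixes per timestep, so its
# cost grows roughly linearly with the beam width -- hence the speedup from
# lowering --beamwidth or switching to --greedy.
```

For example, four timesteps whose argmaxes are `1, 1, 0, 2` (with blank = 0) decode to `[1, 2]`.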

I can process about 50,000 scans of single-page 18th-century material per day on a high-end laptop (i9, 64 GB RAM, 3080 Ti mobile GPU).

Simon-Dirks commented 11 months ago

> Hi Simon,
>
> Besides the WSL issues, some extra info:
>
> Performance depends a bit on the input. Scans with lots of textlines take more time, and longer textlines are slower to process than short ones.
>
> There is a parameter you can add to speed up the actual HTR part: `--greedy`. Look in na-pipeline.sh for the line that contains `--beamwidth 10` and add `--greedy` there, so it looks like: `--greedy --beamwidth 10`
>
> Alternatively, you can lower the beamwidth. Lowering it makes things slightly less accurate but much faster. The `--greedy` parameter is effectively the same as `--beamwidth 1`, but with some extra speedups.
>
> I can process about 50,000 scans of single-page 18th-century material per day on a high-end laptop (i9, 64 GB RAM, 3080 Ti mobile GPU).

Thanks a lot for these additional pointers! I just did a run on Ubuntu on the exact same machine/hardware. The performance differences do not seem to be major, especially considering that I only did a single run. The slower-than-expected performance is probably simply hardware-related. Overview in the table below:

| phase | ubuntu_time_in_secs | windows_time_in_secs | ubuntu_time_in_mins | windows_time_in_mins |
|---|---|---|---|---|
| laypa_baseline_detection | 22.32 | 25.47 | 0.37 | 0.42 |
| loghi_htr | 421.72 | 457.46 | 7.03 | 7.62 |
| detecting_language | 3.01 | 4.01 | 0.05 | 0.07 |
| MinionSplitPageXMLTextLineIntoWords | 1.43 | 2.00 | 0.02 | 0.03 |
| Script time total | 448.58 | 489.01 | 7.48 | 8.15 |

Also note that I receive the following warning, which may or may not be related:

```
Number of devices: 1
using mixed_float16
WARNING:tensorflow:Mixed precision compatibility check (mixed_float16): WARNING
Your GPU may run slowly with dtype policy mixed_float16 because it does not have compute capability of at least 7.0. Your GPU:
  NVIDIA GeForce GTX 970, compute capability 5.2
See https://developer.nvidia.com/cuda-gpus for a list of GPUs and their compute capabilities.
If you will use compatible GPU(s) not attached to this host, e.g. by running a multi-worker model, you can ignore this warning. This message will only be logged once
```

rvankoert commented 11 months ago

The mixed precision warning is definitely related. Please try to use the model that starts with float32. It should run faster on your hardware.
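For reference, in a standard TF 2.x Keras setup (which I assume the HTR code resembles; this is not Loghi's actual code) the dtype policy can also be forced to float32 before the model is built, which avoids the mixed-precision path that the warning says is slow on pre-7.0 compute capability GPUs:

```python
import tensorflow as tf

# Sketch under the assumption of a TF 2.x Keras model: on GPUs without
# Tensor Cores (compute capability < 7.0, e.g. the GTX 970), mixed_float16
# brings no speedup, so set the global policy to float32 before any layers
# are constructed.
tf.keras.mixed_precision.set_global_policy("float32")
print(tf.keras.mixed_precision.global_policy().name)
```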

Simon-Dirks commented 11 months ago

I tried running with the float32 model but get the same warning, unfortunately.