knaw-huc / loghi

MIT License

Problem with Laypa segmentation + GPU #20

Closed: fattynoparents closed this issue 7 months ago

fattynoparents commented 7 months ago

Hi, when I try to run the very first round of image recognition using the test models suggested in the README, it works fine (albeit slowly) on my CPU. Here's some of the output from the correctly working run:

/tmp/tmp.iTZpkcF0rX
starting Laypa baseline detection
docker run --rm -it -u 1000:1000 -m 32000m --shm-size 10240m -v /home/user/laypa/general/baseline:/home/user/laypa/general/baseline -v /home/user/images:/home/user/images -v /home/user/images:/home/user/images loghi/docker.laypa:1.3.10 python run.py -c /home/user/laypa/general/baseline/config.yaml -i /home/user/images -o /home/user/images --opts MODEL.WEIGHTS  TEST.WEIGHTS /home/user/laypa/general/baseline/model_best_mIoU.pth
DeprecationWarning PREPROCESS.RESIZE.USE is losing support; please switch to PREPROCESS.RESIZE.RESIZE_MODE
INPUT.SCALING_TEST is not set, inferring from INPUT.SCALING_TRAIN and PREPROCESS.RESIZE.SCALING to be 0.5
[02/14 09:49:25 detectron2.checkpoint.detection_checkpoint]: [DetectionCheckpointer] Loading from /home/user/laypa/general/baseline/model_best_mIoU.pth ...
[02/14 09:49:25 fvcore.common.checkpoint]: [Checkpointer] Loading from /home/user/laypa/general/baseline/model_best_mIoU.pth ...
/opt/conda/envs/laypa/lib/python3.12/site-packages/torch/utils/data/dataloader.py:558: UserWarning: This DataLoader will create 16 worker processes in total. Our suggested max number of worker in current system is 8, which is smaller than what this DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary.
  warnings.warn(_create_warning_msg(
Predicting PageXML: 100% 2/2 [30:57<00:00, 928.92s/it]
72 [pool-1-thread-2] INFO nl.knaw.huc.di.images.minions.MinionExtractBaselines - /home/user/images/page/i2.png
72 [pool-1-thread-1] INFO nl.knaw.huc.di.images.minions.MinionExtractBaselines - /home/user/images/page/i1.png
130 [pool-1-thread-2] INFO nl.knaw.huc.di.images.minions.MinionExtractBaselines - FOUND LABELS:42
139 [pool-1-thread-1] INFO nl.knaw.huc.di.images.minions.MinionExtractBaselines - FOUND LABELS:26
294 [pool-1-thread-1] INFO nl.knaw.huc.di.images.minions.MinionExtractBaselines - mergedLineDetected: /home/user/images/page/i1.png
419 [pool-1-thread-1] INFO nl.knaw.huc.di.images.minions.MinionExtractBaselines - mergedLineDetected: /home/user/images/page/i1.png
442 [pool-1-thread-2] INFO nl.knaw.huc.di.images.minions.MinionExtractBaselines - mergedLineDetected: /home/user/images/page/i2.png
464 [pool-1-thread-1] INFO nl.knaw.huc.di.images.minions.BaselinesMapper - Mapping lines took: 5.640 ms
471 [pool-1-thread-1] INFO nl.knaw.huc.di.images.minions.MinionExtractBaselines - textlines to match: 25 /home/user/images/page/i1.png
475 [pool-1-thread-2] INFO nl.knaw.huc.di.images.minions.BaselinesMapper - Mapping lines took: 1.187 ms
475 [pool-1-thread-2] INFO nl.knaw.huc.di.images.minions.MinionExtractBaselines - textlines to match: 41 /home/user/images/page/i2.png

The textlines to match are greater than 0, which leads to correct segmentation and, consequently, to loghi-htr working correctly.

However, when I try to run it using my GPU (Nvidia GeForce GTX 1650 Ti), the Laypa part fails to recognize lines for the segmentation. Here's some of the output:

/tmp/tmp.LmlQotvuKh
using GPU 0
starting Laypa baseline detection
docker run --gpus device=0 --rm -it -u 1000:1000 -m 32000m --shm-size 10240m -v /home/user/laypa/general/baseline:/home/user/laypa/general/baseline -v /home/user/images:/home/user/images -v /home/user/images:/home/user/images loghi/docker.laypa:1.3.10 python run.py -c /home/user/laypa/general/baseline/config.yaml -i /home/user/images -o /home/user/images --opts MODEL.WEIGHTS  TEST.WEIGHTS /home/user/laypa/general/baseline/model_best_mIoU.pth MODEL.AMP_TEST.PRECISION float16
DeprecationWarning PREPROCESS.RESIZE.USE is losing support; please switch to PREPROCESS.RESIZE.RESIZE_MODE
INPUT.SCALING_TEST is not set, inferring from INPUT.SCALING_TRAIN and PREPROCESS.RESIZE.SCALING to be 0.5
[02/14 10:42:53 detectron2.checkpoint.detection_checkpoint]: [DetectionCheckpointer] Loading from /home/user/laypa/general/baseline/model_best_mIoU.pth ...
[02/14 10:42:53 fvcore.common.checkpoint]: [Checkpointer] Loading from /home/user/laypa/general/baseline/model_best_mIoU.pth ...
/opt/conda/envs/laypa/lib/python3.12/site-packages/torch/utils/data/dataloader.py:558: UserWarning: This DataLoader will create 16 worker processes in total. Our suggested max number of worker in current system is 8, which is smaller than what this DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary.
  warnings.warn(_create_warning_msg(
Predicting PageXML: 100% 2/2 [00:02<00:00,  1.25s/it]
80 [pool-1-thread-1] INFO nl.knaw.huc.di.images.minions.MinionExtractBaselines - /home/user/images/page/i1.png
81 [pool-1-thread-2] INFO nl.knaw.huc.di.images.minions.MinionExtractBaselines - /home/user/images/page/i2.png
140 [pool-1-thread-2] INFO nl.knaw.huc.di.images.minions.MinionExtractBaselines - FOUND LABELS:1
147 [pool-1-thread-1] INFO nl.knaw.huc.di.images.minions.MinionExtractBaselines - FOUND LABELS:1
333 [pool-1-thread-1] INFO nl.knaw.huc.di.images.minions.BaselinesMapper - Mapping lines took: 2.653 ms
333 [pool-1-thread-2] INFO nl.knaw.huc.di.images.minions.BaselinesMapper - Mapping lines took: 2.638 ms
341 [pool-1-thread-1] INFO nl.knaw.huc.di.images.minions.MinionExtractBaselines - textlines to match: 0 /home/user/images/page/i1.png
341 [pool-1-thread-2] INFO nl.knaw.huc.di.images.minions.MinionExtractBaselines - textlines to match: 0 /home/user/images/page/i2.png
errors: 0

The textlines to match are always 0. The only thing I had to change in the na-pipeline.sh file for the Laypa step is adding MODEL.AMP_TEST.PRECISION float16, because otherwise I got an error that my system doesn't support bfloat16, along with a suggestion to switch to float16.
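
For reference, whether a GPU can use bfloat16 at all can be checked with a standard PyTorch call. The snippet below is only an illustration, not part of the Loghi scripts, and assumes a working PyTorch install with CUDA support:

import torch

# Illustrative check, not part of na-pipeline.sh: bfloat16 requires a
# sufficiently recent GPU architecture, which is why older cards produce
# the "doesn't support bfloat16" error and the suggestion to use float16.
print(torch.cuda.get_device_name(0))
print("bfloat16 supported:", torch.cuda.is_bf16_supported())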

Could someone please help me with that? Thanks a lot in advance.

stefanklut commented 7 months ago

Hi, thanks for your interest. The reason you are seeing all zeros is likely that the output contains NaN values when using float16. We chose bfloat16 because it has the same value range as float32, and the current public model was trained using float32. Where the NaN values come from exactly is not quite clear to me yet, but to work around the problem, simply disable automatic mixed precision (AMP) by setting either MODEL.AMP_TEST.PRECISION float32 or MODEL.AMP_TEST.ENABLED False.
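
To make the range argument concrete, here is a minimal PyTorch sketch (illustration only with made-up values, not Laypa code): float16 overflows to inf for magnitudes above roughly 65504, while bfloat16 keeps the float32 exponent range and only loses precision, and an inf quickly turns into NaN under further arithmetic.

import torch

# Minimal illustration with made-up values, not taken from the Laypa model.
x = torch.tensor([70000.0])      # comfortably representable in float32
print(x.to(torch.float16))       # overflows: tensor([inf], dtype=torch.float16)
print(x.to(torch.bfloat16))      # stays finite: tensor([70144.], dtype=torch.bfloat16)

# Once an inf appears, ordinary arithmetic such as inf - inf produces NaN,
# which can render an entire prediction map unusable.
y = x.to(torch.float16)
print(y - y)                     # tensor([nan], dtype=torch.float16)

A NaN-filled prediction map would be consistent with the FOUND LABELS:1 and textlines to match: 0 lines in the GPU log above.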

This should give you the expected output on the GPU, with a minor hit in speed and memory usage compared to float16, but almost certainly still a massive speedup over running on the CPU.

stefanklut commented 7 months ago

Let me know if this issue persists with float32; otherwise, please change the title so others with the same problem can find this issue more easily. I will also update the docs when I have time.

fattynoparents commented 7 months ago

Thanks a lot for the quick reply. Setting it to float32 has solved the issue.