stefanCCS closed this issue 1 year ago.
That's odd. It's probably caused by a faulty installation of Tensorflow or its (implicit) dependencies (esp. libcudnn). I cannot reproduce it, though.
Could you please show the results of (in your active venv):
pip show torch
pip show tensorflow
pip show nvidia-cudnn-cu11
ldconfig -p | grep cudnn
(If you are on ocrd_all in a native installation, perhaps you need to run `make fix-cuda`, as is currently done in the Docker build. See the respective comments in the Makefile for an explanation.)
Here are the results:
(ocrd-3.8) gputest@linuxgputest2:~/ocrd_all$ pip show torch
Name: torch
Version: 1.13.1
Summary: Tensors and Dynamic neural networks in Python with strong GPU acceleration
Home-page: https://pytorch.org/
Author: PyTorch Team
Author-email: packages@pytorch.org
License: BSD-3
Location: /home/gputest/ocrd-3.8/lib/python3.8/site-packages
Requires: nvidia-cublas-cu11, nvidia-cuda-nvrtc-cu11, nvidia-cuda-runtime-cu11, nvidia-cudnn-cu11, typing-extensions
Required-by: kraken, ocrd-anybaseocr, ocrd-detectron2, ocrd-typegroups-classifier, pix2pixhd, pytorch-lightning, torchmetrics, torchvision
(ocrd-3.8) gputest@linuxgputest2:~/ocrd_all$ pip show tensorflow
Name: tensorflow
Version: 2.12.0
Summary: TensorFlow is an open source machine learning framework for everyone.
Home-page: https://www.tensorflow.org/
Author: Google Inc.
Author-email: packages@tensorflow.org
License: Apache 2.0
Location: /home/gputest/ocrd-3.8/lib/python3.8/site-packages
Requires: absl-py, astunparse, flatbuffers, gast, google-pasta, grpcio, h5py, jax, keras, libclang, numpy, opt-einsum, packaging, protobuf, setuptools, six, tensorboard, tensorflow-estimator, tensorflow-io-gcs-filesystem, termcolor, typing-extensions, wrapt
Required-by: calamari-ocr, eynollah, ocrd-anybaseocr, ocrd-calamari, sbb-binarization
(ocrd-3.8) gputest@linuxgputest2:~/ocrd_all$ pip show nvidia-cudnn-cu11
Name: nvidia-cudnn-cu11
Version: 8.5.0.96
Summary: cuDNN runtime libraries
Home-page: https://developer.nvidia.com/cuda-zone
Author: Nvidia CUDA Installer Team
Author-email: cuda_installer@nvidia.com
License: NVIDIA Proprietary Software
Location: /home/gputest/ocrd-3.8/lib/python3.8/site-packages
Requires: nvidia-cublas-cu11
Required-by: torch
(ocrd-3.8) gputest@linuxgputest2:~/ocrd_all$ ldconfig -p | grep cudnn
libcudnn_ops_train.so.8 (libc6,x86-64) => /conda/lib/libcudnn_ops_train.so.8
libcudnn_ops_train.so.8 (libc6,x86-64) => /lib/x86_64-linux-gnu/libcudnn_ops_train.so.8
libcudnn_ops_train.so (libc6,x86-64) => /conda/lib/libcudnn_ops_train.so
libcudnn_ops_train.so (libc6,x86-64) => /lib/x86_64-linux-gnu/libcudnn_ops_train.so
libcudnn_ops_infer.so.8 (libc6,x86-64) => /conda/lib/libcudnn_ops_infer.so.8
libcudnn_ops_infer.so.8 (libc6,x86-64) => /lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8
libcudnn_ops_infer.so (libc6,x86-64) => /conda/lib/libcudnn_ops_infer.so
libcudnn_ops_infer.so (libc6,x86-64) => /lib/x86_64-linux-gnu/libcudnn_ops_infer.so
libcudnn_cnn_train.so.8 (libc6,x86-64) => /conda/lib/libcudnn_cnn_train.so.8
libcudnn_cnn_train.so.8 (libc6,x86-64) => /lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8
libcudnn_cnn_train.so (libc6,x86-64) => /conda/lib/libcudnn_cnn_train.so
libcudnn_cnn_train.so (libc6,x86-64) => /lib/x86_64-linux-gnu/libcudnn_cnn_train.so
libcudnn_cnn_infer.so.8 (libc6,x86-64) => /conda/lib/libcudnn_cnn_infer.so.8
libcudnn_cnn_infer.so.8 (libc6,x86-64) => /lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8
libcudnn_cnn_infer.so (libc6,x86-64) => /conda/lib/libcudnn_cnn_infer.so
libcudnn_cnn_infer.so (libc6,x86-64) => /lib/x86_64-linux-gnu/libcudnn_cnn_infer.so
libcudnn_adv_train.so.8 (libc6,x86-64) => /conda/lib/libcudnn_adv_train.so.8
libcudnn_adv_train.so.8 (libc6,x86-64) => /lib/x86_64-linux-gnu/libcudnn_adv_train.so.8
libcudnn_adv_train.so (libc6,x86-64) => /conda/lib/libcudnn_adv_train.so
libcudnn_adv_train.so (libc6,x86-64) => /lib/x86_64-linux-gnu/libcudnn_adv_train.so
libcudnn_adv_infer.so.8 (libc6,x86-64) => /conda/lib/libcudnn_adv_infer.so.8
libcudnn_adv_infer.so.8 (libc6,x86-64) => /lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8
libcudnn_adv_infer.so (libc6,x86-64) => /conda/lib/libcudnn_adv_infer.so
libcudnn_adv_infer.so (libc6,x86-64) => /lib/x86_64-linux-gnu/libcudnn_adv_infer.so
libcudnn.so.8 (libc6,x86-64) => /conda/lib/libcudnn.so.8
libcudnn.so.8 (libc6,x86-64) => /lib/x86_64-linux-gnu/libcudnn.so.8
libcudnn.so (libc6,x86-64) => /conda/lib/libcudnn.so
libcudnn.so (libc6,x86-64) => /lib/x86_64-linux-gnu/libcudnn.so
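As an aside, each soname in the `ldconfig` listing above resolves to two locations, `/conda/lib` and the system path; at runtime the first match in the linker cache wins, so a stale copy can shadow the one pip installed. A rough illustration (not part of any OCR-D tooling, the helper name is made up) of flagging such duplicates in `ldconfig -p` output:

```python
# Hedged sketch: detect when the same cuDNN soname appears under multiple
# paths in the linker cache (the first match wins at runtime).
from collections import defaultdict

def find_shadowed(ldconfig_output: str) -> dict:
    """Map each library soname to every path ldconfig knows for it."""
    paths = defaultdict(list)
    for line in ldconfig_output.strip().splitlines():
        name, _, target = line.partition(" => ")
        soname = name.split()[0]          # e.g. "libcudnn.so.8"
        paths[soname].append(target.strip())
    # Keep only sonames that resolve to more than one location
    return {k: v for k, v in paths.items() if len(v) > 1}

# Two lines taken from the output above:
sample = """\
libcudnn.so.8 (libc6,x86-64) => /conda/lib/libcudnn.so.8
libcudnn.so.8 (libc6,x86-64) => /lib/x86_64-linux-gnu/libcudnn.so.8"""
print(find_shadowed(sample))
```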
I have NOT run `make fix-cuda` so far. Should I do this now? (Or do you have any other recommendation based on the output above?)
Yes, please do. (TF 2.12 needs cuDNN 8.6, but Torch 1.13 via ocrd_kraken pulled in 8.5. We could also rerun the ocrd_detectron2 setup to get Torch 2.0.1 and cuDNN 8.6, but `make fix-cuda` is probably the easiest and safest option at the moment.)
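The conflict boils down to a minimum-version check: `pip show` reported cuDNN 8.5.0.96, while TF 2.12 wants at least 8.6. A minimal sketch of that comparison (the helper is hypothetical, the version strings come from the output above):

```python
def cudnn_satisfies(installed: str, required: str) -> bool:
    """Check whether an installed cuDNN version meets a required minimum.

    Compares only as many components as the requirement specifies,
    so "8.5.0.96" is checked as (8, 5) against a "8.6" requirement.
    """
    as_tuple = lambda v: tuple(int(p) for p in v.split("."))
    req = as_tuple(required)
    return as_tuple(installed)[: len(req)] >= req

# TF 2.12 expects at least cuDNN 8.6, but Torch 1.13 pinned 8.5.0.96:
print(cudnn_satisfies("8.5.0.96", "8.6"))  # False
print(cudnn_satisfies("8.6.0", "8.6"))     # True
```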
Looks good: `make test-workflow` has run through. I only see some `ERROR`s for `EvaluateLines`, like this:
2023-06-26 09:31:23.443 ERROR processor.KrakenSegment - Line 89 could not be assigned a region, creating a dummy region
2023-06-26 09:40:48.800 ERROR processor.EvaluateLines - Line 'region_0006_line_0001' contains too short word/glyph sequence (9<10)
2023-06-26 09:40:50.298 ERROR processor.EvaluateLines - line "region_0017_line_0002" in file "OCR-D-OCR4_PHYS_0001" is missing from input 2
--> I assume this is "just" something that may happen in `EvaluateLines` processing, depending on the content/image. Therefore, the software is OK. Correct?
Yes, that happens all the time. The error is a result of Kraken's internal architecture. It is hard to tell whether these are legitimate segmentation problems. But if `test-workflow` completes, you are fine.
I will close this issue now. Please tell me if I should re-open it to track a documentation issue, i.e. documenting somewhere that `make fix-cuda` may be needed.
(Original issue text:) After installing ocrd_all Rel. v2023-06-14, I called `make test-workflow`. This leads to the following error: --> please clarify ...