mindee / doctr

docTR (Document Text Recognition) - a seamless, high-performing & accessible library for OCR-related tasks powered by Deep Learning.
https://mindee.github.io/doctr/
Apache License 2.0

UnboundLocalError: local variable 'l1_loss' referenced before assignment in Doctr's compute_loss function #1661

Closed: ep0p closed this issue 3 months ago

ep0p commented 3 months ago

Bug description

While training a docTR model on my own dataset, I encountered an UnboundLocalError in the compute_loss function of the differentiable_binarization module.

Code snippet to reproduce the bug

python references/detection/train_pytorch.py ../dataset_detection/train ../dataset_detection/val db_resnet50 --epochs 5 --device 0

Error traceback

Train set loaded in 63.19s (11871 samples in 5935 batches)
Training loss: 1.21584:   2%|█▊                                                                                  | 131/5935 [01:24<1:02:43,  1.54it/s]
Traceback (most recent call last):
  File "/home/epop/dataset/doctr/references/detection/train_pytorch.py", line 481, in <module>
    main(args)
  File "/home/epop/dataset/doctr/references/detection/train_pytorch.py", line 388, in main
    fit_one_epoch(model, train_loader, batch_transforms, optimizer, scheduler, amp=args.amp)
  File "/home/epop/dataset/doctr/references/detection/train_pytorch.py", line 126, in fit_one_epoch
    train_loss = model(images, targets)["loss"]
  File "/home/epop/anaconda3/envs/ds3.9/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/epop/anaconda3/envs/ds3.9/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/epop/dataset/doctr/doctr/models/detection/differentiable_binarization/pytorch.py", line 216, in forward
    loss = self.compute_loss(logits, thresh_map, target)
  File "/home/epop/dataset/doctr/doctr/models/detection/differentiable_binarization/pytorch.py", line 286, in compute_loss
    return l1_loss + focal_scale * focal_loss + dice_loss
UnboundLocalError: local variable 'l1_loss' referenced before assignment
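
This kind of error arises in Python when a local variable is assigned only inside a conditional branch and that branch is skipped. The sketch below reproduces the pattern in isolation. It is not the docTR implementation: `compute_loss_sketch`, `compute_loss_guarded`, `values`, and `mask` are hypothetical names used only for illustration.

```python
def compute_loss_sketch(values, mask):
    """Hypothetical stand-in for a loss function; NOT docTR's compute_loss."""
    dice_loss = 1.0  # always assigned
    if any(mask):
        # l1_loss is only bound when at least one mask element is set
        selected = [v for v, m in zip(values, mask) if m]
        l1_loss = sum(abs(v) for v in selected) / len(selected)
    # If the mask was empty, l1_loss was never bound and this line raises
    return l1_loss + dice_loss


def compute_loss_guarded(values, mask):
    """Same hypothetical function with a default value for the conditional term."""
    dice_loss = 1.0
    l1_loss = 0.0  # default used when the mask is empty
    if any(mask):
        selected = [v for v, m in zip(values, mask) if m]
        l1_loss = sum(abs(v) for v in selected) / len(selected)
    return l1_loss + dice_loss


if __name__ == "__main__":
    try:
        compute_loss_sketch([1.0, -3.0], [False, False])
    except UnboundLocalError as exc:
        print(f"UnboundLocalError: {exc}")
    print(compute_loss_guarded([1.0, -3.0], [False, False]))
```

A default value hides the symptom but not the cause, so an empty mask is still worth tracing back to the sample that produced it.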

Environment

Collecting environment information...

DocTR version: 0.9.0a0
TensorFlow version: N/A
PyTorch version: 2.3.1+cu121 (torchvision 0.18.1+cu121)
OpenCV version: 4.10.0
OS: Ubuntu 22.04.4 LTS
Python version: 3.9.19
Is CUDA available (TensorFlow): N/A
Is CUDA available (PyTorch): Yes
CUDA runtime version: 11.5.119
GPU models and configuration:
GPU 0: NVIDIA GeForce RTX 3090
GPU 1: NVIDIA GeForce RTX 3090
Nvidia driver version: 555.42.02
cuDNN version: Could not collect

Deep Learning backend

>>> from doctr.file_utils import is_tf_available, is_torch_available
>>> print(f"is_tf_available: {is_tf_available()}")
is_tf_available: False
>>> print(f"is_torch_available: {is_torch_available()}")
is_torch_available: True

felixdittrich92 commented 3 months ago

Hey @ep0p :wave:,

This will only happen if we couldn't build the target mask(s). Reasons for this:

ep0p commented 3 months ago

Hi @felixdittrich92,

I am in the process of checking my dataset, but it will take a while. I'll come back to you as soon as I find the issue.
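
For anyone doing a similar dataset check, a rough sketch of a label scan might look like the following. It rests on several assumptions: that each dataset folder holds a `labels.json` mapping image names to entries with a flat `polygons` list of `[x, y]` points, and that empty or zero-area polygons are worth inspecting. These are candidate checks only, not a confirmed list of causes, and `polygon_area` and `find_suspect_samples` are hypothetical helpers rather than docTR functions.

```python
import json
from pathlib import Path


def polygon_area(points):
    """Absolute area of a polygon, computed with the shoelace formula."""
    area = 0.0
    n = len(points)
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]
        area += x1 * y2 - x2 * y1
    return abs(area) / 2.0


def find_suspect_samples(labels):
    """Return {image_name: reason} for entries that look suspicious.

    Assumes each entry has a flat "polygons" list of [x, y] point lists.
    """
    suspects = {}
    for name, entry in labels.items():
        polygons = entry.get("polygons", [])
        if not polygons:
            suspects[name] = "no polygons"
            continue
        if any(len(p) < 3 or polygon_area(p) == 0.0 for p in polygons):
            suspects[name] = "degenerate polygon"
    return suspects


if __name__ == "__main__":
    # Assumed location, based on the training command in this issue
    labels_path = Path("../dataset_detection/train/labels.json")
    if labels_path.exists():
        labels = json.loads(labels_path.read_text())
        for name, reason in find_suspect_samples(labels).items():
            print(f"{name}: {reason}")
```

If the labels use a different layout (for example, polygons grouped per class), the loop over `polygons` would need to be adapted.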