mindee / doctr

docTR (Document Text Recognition) - a seamless, high-performing & accessible library for OCR-related tasks powered by Deep Learning.
https://mindee.github.io/doctr/
Apache License 2.0
3.91k stars, 445 forks

UnboundLocalError: local variable 'l1_loss' referenced before assignment #1738

Closed johnlockejrr closed 1 month ago

johnlockejrr commented 1 month ago

Bug description

While training a docTR model on my own dataset, I encountered an UnboundLocalError in the compute_loss function of the differentiable_binarization module.

Code snippet to reproduce the bug

python references/detection/train_pytorch.py datasets/sam/train_out datasets/sam/val_out db_resnet50 --epochs 5 --device 0

Error traceback

Namespace(train_path='datasets/sam/train_out', val_path='datasets/sam/val_out', arch='db_resnet50', name=None, epochs=5, batch_size=2, device=0, save_interval_epoch=False, input_size=1024, lr=0.001, weight_decay=0, workers=None, resume=None, test_only=False, freeze_backbone=False, show_samples=False, wb=False, push_to_hub=False, pretrained=False, rotation=False, eval_straight=False, sched='poly', amp=False, find_lr=False, early_stop=False, early_stop_epochs=5, early_stop_delta=0.01)
Validation set loaded in 0.1627s (67 samples in 34 batches)
Train set loaded in 0.1016s (540 samples in 270 batches)
  0%|                                                                                                                                                                                                                                                           | 0/270 [00:03<?, ?it/s]
Traceback (most recent call last):
  File "/home/incognito/doctr/references/detection/train_pytorch.py", line 481, in <module>
    main(args)
  File "/home/incognito/doctr/references/detection/train_pytorch.py", line 388, in main
    fit_one_epoch(model, train_loader, batch_transforms, optimizer, scheduler, amp=args.amp)
  File "/home/incognito/doctr/references/detection/train_pytorch.py", line 126, in fit_one_epoch
    train_loss = model(images, targets)["loss"]
  File "/home/incognito/doctr/env-py3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/incognito/doctr/env-py3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/incognito/doctr/doctr/models/detection/differentiable_binarization/pytorch.py", line 216, in forward
    loss = self.compute_loss(logits, thresh_map, target)
  File "/home/incognito/doctr/doctr/models/detection/differentiable_binarization/pytorch.py", line 286, in compute_loss
    return l1_loss + focal_scale * focal_loss + dice_loss
UnboundLocalError: local variable 'l1_loss' referenced before assignment

Environment

DocTR version: 0.9.1a0
TensorFlow version: N/A
PyTorch version: 2.4.1+cu121 (torchvision 0.19.1+cu121)
OpenCV version: 4.10.0
OS: Ubuntu 22.04.5 LTS
Python version: 3.10.12
Is CUDA available (TensorFlow): N/A
Is CUDA available (PyTorch): Yes
CUDA runtime version: Could not collect
GPU models and configuration: GPU 0: NVIDIA GeForce RTX 3060
Nvidia driver version: 561.09
cuDNN version: Could not collect

Deep Learning backend

>>> from doctr.file_utils import is_tf_available, is_torch_available
>>> print(f"is_tf_available: {is_tf_available()}")
is_tf_available: False
>>> print(f"is_torch_available: {is_torch_available()}")
is_torch_available: True

johnlockejrr commented 1 month ago

If needed, I can upload my dataset.

johnlockejrr commented 1 month ago

I tested this on a different environment and I get the same error:

DocTR version: 0.9.1a0
TensorFlow version: N/A
PyTorch version: 2.4.1+cu121 (torchvision 0.19.1+cu121)
OpenCV version: 4.10.0
OS: Ubuntu 22.04.5 LTS
Python version: 3.10.12
Is CUDA available (TensorFlow): N/A
Is CUDA available (PyTorch): Yes
CUDA runtime version: Could not collect
GPU models and configuration: GPU 0: NVIDIA GeForce RTX 4070
Nvidia driver version: 560.94
cuDNN version: Could not collect

Python 3.10.12 (main, Sep 11 2024, 15:47:36) [GCC 11.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from doctr.file_utils import is_tf_available, is_torch_available
>>> print(f"is_tf_available: {is_tf_available()}")
is_tf_available: False
>>> print(f"is_torch_available: {is_torch_available()}")
is_torch_available: True
>>>
johnlockejrr commented 1 month ago

I think I made a mistake: I just realized I used the polygons from the original images, while the images in the dataset had been mogrified. Checking...

johnlockejrr commented 1 month ago

Working now, but much slower with big images. What height or width would be recommended for training?

(env-py3.10) incognito@DESKTOP-NHKR7QL:~/doctr$ python references/detection/train_pytorch.py datasets/sam/train_out datasets/sam/val_out db_resnet50 --epochs 5 --device 0
Namespace(train_path='datasets/sam/train_out', val_path='datasets/sam/val_out', arch='db_resnet50', name=None, epochs=5, batch_size=2, device=0, save_interval_epoch=False, input_size=1024, lr=0.001, weight_decay=0, workers=None, resume=None, test_only=False, freeze_backbone=False, show_samples=False, wb=False, push_to_hub=False, pretrained=False, rotation=False, eval_straight=False, sched='poly', amp=False, find_lr=False, early_stop=False, early_stop_epochs=5, early_stop_delta=0.01)
Validation set loaded in 0.4528s (67 samples in 34 batches)
Train set loaded in 0.07876s (540 samples in 270 batches)

Training loss: 0.658518:  78%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▉                                             | 211/270 [03:23<00:47,  1.24it/s]

EDIT: worked until killed:

(env-py3.10) incognito@DESKTOP-NHKR7QL:~/doctr$ python references/detection/train_pytorch.py datasets/sam/train_out datasets/sam/val_out db_resnet50 --epochs 5 --device 0
Namespace(train_path='datasets/sam/train_out', val_path='datasets/sam/val_out', arch='db_resnet50', name=None, epochs=5, batch_size=2, device=0, save_interval_epoch=False, input_size=1024, lr=0.001, weight_decay=0, workers=None, resume=None, test_only=False, freeze_backbone=False, show_samples=False, wb=False, push_to_hub=False, pretrained=False, rotation=False, eval_straight=False, sched='poly', amp=False, find_lr=False, early_stop=False, early_stop_epochs=5, early_stop_delta=0.01)
Validation set loaded in 0.4528s (67 samples in 34 batches)
Train set loaded in 0.07876s (540 samples in 270 batches)
Training loss: 0.643471: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 270/270 [03:52<00:00,  1.16it/s]100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 34/34 [00:33<00:00,  1.00it/s]
Validation loss decreased inf --> 2.29916: saving state...
Epoch 1/5 - Validation loss: 2.29916 (Recall: 1.67% | Precision: 5.47% | Mean IoU: 9.00%)
  0%|                                                                                                                                                                                                                                                 | 0/270 [00:20<?, ?it/s]
Traceback (most recent call last):
  File "/home/incognito/doctr/references/detection/train_pytorch.py", line 481, in <module>
    main(args)
  File "/home/incognito/doctr/references/detection/train_pytorch.py", line 388, in main
    fit_one_epoch(model, train_loader, batch_transforms, optimizer, scheduler, amp=args.amp)
  File "/home/incognito/doctr/references/detection/train_pytorch.py", line 109, in fit_one_epoch
    for images, targets in pbar:
  File "/home/incognito/doctr/env-py3.10/lib/python3.10/site-packages/tqdm/std.py", line 1181, in __iter__
    for obj in iterable:
  File "/home/incognito/doctr/env-py3.10/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 630, in __next__
    data = self._next_data()
  File "/home/incognito/doctr/env-py3.10/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1327, in _next_data
    idx, data = self._get_data()
  File "/home/incognito/doctr/env-py3.10/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1283, in _get_data
    success, data = self._try_get_data()
  File "/home/incognito/doctr/env-py3.10/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1131, in _try_get_data
    data = self._data_queue.get(timeout=timeout)
  File "/usr/lib/python3.10/queue.py", line 180, in get
    self.not_empty.wait(remaining)
  File "/usr/lib/python3.10/threading.py", line 324, in wait
    gotit = waiter.acquire(True, timeout)
  File "/home/incognito/doctr/env-py3.10/lib/python3.10/site-packages/torch/utils/data/_utils/signal_handling.py", line 67, in handler
    _error_if_any_worker_fails()
RuntimeError: DataLoader worker (pid 44881) is killed by signal: Killed.
felixdittrich92 commented 1 month ago

Images are resized internally :)

Try to reduce/set the number of workers with --workers=<INT_DEPENDING_ON_YOUR_MACHINE>

johnlockejrr commented 1 month ago

I just resized the images to x960 and recalculated the polygons, and everything runs smoothly. My dataset is at line level anyway; I'll give it a try :)
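
Recomputing polygons after a resize amounts to multiplying every coordinate by the same per-axis scale factor that was applied to the image. A minimal sketch, assuming absolute pixel coordinates; the function name and the example sizes are hypothetical and not part of docTR:

```python
def scale_polygons(polygons, old_size, new_size):
    """Scale absolute-pixel polygons from old_size to new_size.

    Both sizes are given as (width, height). This is an illustrative helper,
    not a docTR function.
    """
    old_w, old_h = old_size
    new_w, new_h = new_size
    sx, sy = new_w / old_w, new_h / old_h
    return [[[round(x * sx), round(y * sy)] for x, y in poly] for poly in polygons]


# Hypothetical example: a 1440x1920 scan downsized to 720x960
original = [[[100, 200], [900, 200], [900, 260], [100, 260]]]
scaled = scale_polygons(original, old_size=(1440, 1920), new_size=(720, 960))
print(scaled)
```

If the labels are not rescaled together with the images, the polygons can fall partly or fully outside the image, which is the kind of mismatch described earlier in this thread.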

(env-py3.10) incognito@DESKTOP-NHKR7QL:~/doctr$ python references/detection/train_pytorch.py datasets/sam/train_out datasets/sam/val_out db_resnet50 --epochs 5 --device 0
Namespace(train_path='datasets/sam/train_out', val_path='datasets/sam/val_out', arch='db_resnet50', name=None, epochs=5, batch_size=2, device=0, save_interval_epoch=False, input_size=1024, lr=0.001, weight_decay=0, workers=None, resume=None, test_only=False, freeze_backbone=False, show_samples=False, wb=False, push_to_hub=False, pretrained=False, rotation=False, eval_straight=False, sched='poly', amp=False, find_lr=False, early_stop=False, early_stop_epochs=5, early_stop_delta=0.01)
Validation set loaded in 0.1393s (67 samples in 34 batches)
Train set loaded in 0.07748s (540 samples in 270 batches)
Training loss: 1.3698: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 270/270 [01:34<00:00,  2.86it/s]
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 34/34 [00:13<00:00,  2.55it/s]
Validation loss decreased inf --> 0.674124: saving state...
Epoch 1/5 - Validation loss: 0.674124 (Recall: 4.78% | Precision: 3.38% | Mean IoU: 5.00%)
Training loss: 0.711258: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 270/270 [01:28<00:00,  3.05it/s]
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 34/34 [00:09<00:00,  3.48it/s]
Epoch 2/5 - Validation loss: 0.817873 (Recall: 5.47% | Precision: 2.30% | Mean IoU: 3.00%)
Training loss: 0.563128: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 270/270 [01:26<00:00,  3.11it/s]
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 34/34 [00:09<00:00,  3.52it/s]
Validation loss decreased 0.674124 --> 0.632917: saving state...
Epoch 3/5 - Validation loss: 0.632917 (Recall: 16.05% | Precision: 32.59% | Mean IoU: 29.00%)
Training loss: 0.610216: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 270/270 [01:27<00:00,  3.07it/s]
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 34/34 [00:09<00:00,  3.50it/s]
Epoch 4/5 - Validation loss: 0.642417 (Recall: 21.75% | Precision: 11.35% | Mean IoU: 9.00%)
Training loss: 0.604278: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 270/270 [01:27<00:00,  3.09it/s]
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 34/34 [00:09<00:00,  3.49it/s]
Validation loss decreased 0.632917 --> 0.565686: saving state...
Epoch 5/5 - Validation loss: 0.565686 (Recall: 43.27% | Precision: 46.25% | Mean IoU: 36.00%)
felixdittrich92 commented 1 month ago

You should train longer :D But for only 5 epochs the metrics don't look wrong :+1:

johnlockejrr commented 1 month ago

Yes! I just wanted to be sure it runs, was a first test, I'm happy with it anyway.

Just figuring out how to add my newly trained (*ish) model to the streamlit demo app :-|

EDIT: besides my datasets being line-level, I have another problem: they are mostly RTL. Should I do anything for that to work (like python-bidi, etc.)? Do, say, Arabic or Hebrew require other features?

felixdittrich92 commented 1 month ago

Curious to see how well this can work ^^

Currently we use anyascii (https://github.com/anyascii/anyascii); I think this should work!? :)

johnlockejrr commented 1 month ago

Never used it, yes, I think it should.

johnlockejrr commented 1 month ago

Seems I can't load it as per https://mindee.github.io/doctr/using_doctr/custom_models_training.html :)

image

felixdittrich92 commented 1 month ago

You can :) You have to change the vocab with --vocab=.. See here for the predefined vocabs we have: https://github.com/mindee/doctr/blob/main/doctr/datasets/vocabs.py

The vocab should contain all the chars you have in your dataset (or more)
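
One way to check this before recognition training is to compare the characters used in the labels against the vocab string. A minimal sketch in plain Python; the helper name, the vocab string, and the sample labels are all hypothetical (in practice the vocab could be one of the predefined strings from vocabs.py):

```python
def missing_characters(labels, vocab):
    """Return the sorted characters used in `labels` that are absent from `vocab`."""
    used = set("".join(labels))
    return sorted(used - set(vocab))


# Hypothetical vocab and labels, for illustration only
vocab = "abcdefghijklmnopqrstuvwxyz"
labels = ["hello", "wörld", "abc!"]
missing = missing_characters(labels, vocab)
print(missing)
```

An empty result means every character in the labels is covered by the vocab.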

johnlockejrr commented 1 month ago

Oh, sorry, I'm new to it. I have mostly trained kraken, YOLOv8, and DocUFCN models.

But it needs a vocab for a detection model? I didn't train a recognition model yet.

felixdittrich92 commented 1 month ago

If none of the predefined vocabs fits, you can simply change:

https://github.com/mindee/doctr/blob/df762ed90010db4df9f4cb5692b52c2a2e5dc819/references/recognition/train_pytorch.py#L189

to vocab="abc", for example. But to load the model later you need the same string that defines your model's vocab :)

felixdittrich92 commented 1 month ago

@johnlockejrr No, only for recognition model training

johnlockejrr commented 1 month ago

Couldn't I load only the detection model to see how it performs on a new test image?

johnlockejrr commented 1 month ago

I just took a look at vocabs.py: Hebrew has more characters than VOCABS["hebrew"] currently contains, so the file should be amended at some point.

felixdittrich92 commented 1 month ago

Feel free to open a PR to add the missing chars :+1:

felixdittrich92 commented 1 month ago

Can't load only the detection model to see how it performs on a new test image?

Sure :)

Load your custom trained model (in combination with the ocr_predictor):

import torch

from doctr.models import db_resnet50, ocr_predictor

# Load custom detection model
det_model = db_resnet50(pretrained=False, pretrained_backbone=False)
det_params = torch.load('<path_to_pt>', map_location="cpu")
det_model.load_state_dict(det_params)
predictor = ocr_predictor(det_arch=det_model, reco_arch="vitstr_small", pretrained=True)

or only with the detection_predictor:

import requests
import cv2
import numpy as np
import torch

from doctr.io import DocumentFile
from doctr.models import detection_predictor, db_resnet50
from doctr.utils.geometry import detach_scores

# Convert relative coordinates to absolute pixel values
def _to_absolute(geom, img_shape: tuple[int, int]) -> list[list[int]]:
    h, w = img_shape
    if len(geom) == 2:  # Assume straight pages = True -> [[xmin, ymin], [xmax, ymax]]
        (xmin, ymin), (xmax, ymax) = geom
        xmin, xmax = int(round(w * xmin)), int(round(w * xmax))
        ymin, ymax = int(round(h * ymin)), int(round(h * ymax))
        return [[xmin, ymin], [xmax, ymin], [xmax, ymax], [xmin, ymax]]
    else:  # For polygons, convert each point to absolute coordinates
        return [[int(point[0] * w), int(point[1] * h)] for point in geom]

url = "https://www.francetvinfo.fr/pictures/uGwaNE-aJq7zHLhZJdzdCd9nyjE/1200x900/2021/03/16/phpCDwGn0.jpg"

# Load custom detection model
det_model = db_resnet50(pretrained=False, pretrained_backbone=False)
det_params = torch.load('<path_to_pt>', map_location="cpu")
det_model.load_state_dict(det_params)

det_predictor = detection_predictor(
    arch=det_model,
    pretrained=False,
    assume_straight_pages=True,
    symmetric_pad=True,
    preserve_aspect_ratio=True,
) #.cuda().half()  # Uncomment this line if you have a GPU

det_predictor.model.postprocessor.bin_thresh = 0.3
det_predictor.model.postprocessor.box_thresh = 0.65

# Download the image once and reuse the bytes for both docTR and OpenCV
img_bytes = requests.get(url).content
docs = DocumentFile.from_images([img_bytes])
results = det_predictor(docs)

image = cv2.imdecode(np.frombuffer(img_bytes, np.uint8), cv2.IMREAD_COLOR)

for doc, res in zip(docs, results):
    img_shape = (doc.shape[0], doc.shape[1])
    # Detach the probability scores from the results
    detached_coords, prob_scores = detach_scores([res.get("words")])

    for i, coords in enumerate(detached_coords[0]):
        coords = coords.reshape(2, 2).tolist() if coords.shape == (4, ) else coords.tolist()

        # Convert relative to absolute pixel coordinates
        points = np.array(_to_absolute(coords, img_shape), dtype=np.int32).reshape((-1, 1, 2))

        # Draw the bounding box on the image
        cv2.polylines(image, [points], isClosed=True, color=(255, 0, 0), thickness=2)

    # Save the modified image with bounding boxes
    cv2.imwrite("output.jpg", image)
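
To make the coordinate conversion in the script above concrete, here is a small standalone check of _to_absolute on made-up relative coordinates. The function body is copied from the script; the box values and image shape are hypothetical and chosen so the arithmetic is exact:

```python
# Copy of the helper from the script above, so this snippet runs on its own
def _to_absolute(geom, img_shape: tuple[int, int]) -> list[list[int]]:
    h, w = img_shape
    if len(geom) == 2:  # straight box: [[xmin, ymin], [xmax, ymax]]
        (xmin, ymin), (xmax, ymax) = geom
        xmin, xmax = int(round(w * xmin)), int(round(w * xmax))
        ymin, ymax = int(round(h * ymin)), int(round(h * ymax))
        return [[xmin, ymin], [xmax, ymin], [xmax, ymax], [xmin, ymax]]
    else:  # polygon: scale each point
        return [[int(point[0] * w), int(point[1] * h)] for point in geom]


# Hypothetical image of height 100 and width 200
img_shape = (100, 200)
box = _to_absolute([[0.25, 0.5], [0.75, 1.0]], img_shape)
poly = _to_absolute([[0.25, 0.5], [0.75, 0.5], [0.75, 1.0], [0.25, 1.0]], img_shape)
print(box)   # four corner points in pixels
print(poly)  # same corners, computed through the polygon branch
```

Note that the straight-box branch rounds to the nearest pixel while the polygon branch truncates, so the two can differ by one pixel on inputs that are not exact.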
johnlockejrr commented 1 month ago

Perfect! Thank you for all your help! I'll open a PR later today for a new language and amend the Hebrew language.

felixdittrich92 commented 1 month ago

Reference PR showing what's required to update or add a vocab: https://github.com/mindee/doctr/pull/1700/files

johnlockejrr commented 1 month ago

Something very strange with my model. Executing your script above:

(env-py3.10) incognito@DESKTOP-NHKR7QL:~/doctr$ python load_det_model.py
/home/incognito/doctr/load_det_model.py:27: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
  det_params = torch.load('db_resnet50_20240930-142637.pt', map_location="cpu")
Traceback (most recent call last):
  File "/home/incognito/doctr/load_det_model.py", line 28, in <module>
    det_model.load_state_dict(det_params)
  File "/home/incognito/doctr/env-py3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 2215, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for DBNet:
        size mismatch for prob_head.6.weight: copying a param with shape torch.Size([64, 2, 2, 2]) from checkpoint, the shape in current model is torch.Size([64, 1, 2, 2]).
        size mismatch for prob_head.6.bias: copying a param with shape torch.Size([2]) from checkpoint, the shape in current model is torch.Size([1]).
        size mismatch for thresh_head.6.weight: copying a param with shape torch.Size([64, 2, 2, 2]) from checkpoint, the shape in current model is torch.Size([64, 1, 2, 2]).
        size mismatch for thresh_head.6.bias: copying a param with shape torch.Size([2]) from checkpoint, the shape in current model is torch.Size([1]).

Could this happen because I trained it on a line-level dataset?

felixdittrich92 commented 1 month ago

Can you share one entry from the labels.json you used for training?

johnlockejrr commented 1 month ago

Sure:

{"81_dc946_default.jpg": {"img_dimensions": [720, 960], "img_hash": "f04698acbbc7246475a8401dc031facf1d152c156cb1363217270cd7591e94d3", "polygons": {"textzone": [[[66, 153], [527, 153], [527, 709], [66, 709]]], "textline": [[[78, 161], [515, 161], [515, 188], [78, 188]], [[76, 180], [515, 180], [515, 207], [76, 207]], [[79, 201], [515, 201], [515, 229], [79, 229]], [[77, 221], [514, 221], [514, 250], [77, 250]], [[78, 242], [516, 242], [516, 273], [78, 273]], [[73, 264], [516, 264], [516, 292], [73, 292]], [[75, 287], [517, 287], [517, 313], [75, 313]], [[76, 307], [517, 307], [517, 335], [76, 335]], [[73, 327], [518, 327], [518, 356], [73, 356]], [[75, 350], [516, 350], [516, 377], [75, 377]], [[76, 388], [518, 388], [518, 417], [76, 417]], [[77, 412], [519, 412], [519, 437], [77, 437]], [[74, 434], [518, 434], [518, 457], [74, 457]], [[75, 452], [518, 452], [518, 478], [75, 478]], [[78, 472], [518, 472], [518, 499], [78, 499]], [[81, 493], [519, 493], [519, 519], [81, 519]], [[81, 514], [518, 514], [518, 540], [81, 540]], [[73, 535], [519, 535], [519, 560], [73, 560]], [[74, 556], [519, 556], [519, 581], [74, 581]], [[72, 576], [519, 576], [519, 602], [72, 602]], [[74, 596], [519, 596], [519, 624], [74, 624]], [[75, 618], [517, 618], [517, 647], [75, 647]], [[73, 637], [521, 637], [521, 666], [73, 666]], [[79, 658], [520, 658], [520, 686], [79, 686]], [[75, 680], [520, 680], [520, 714], [75, 714]]]}}, "136_7aab7_default.jpg": {"img_dimensions": [720, 960], "img_hash": "eac91c1193e188f4dd089705086e3e3dfd6bc5233d5ceb714c6082684a64ab06", "polygons": {"textzone": [[[183, 174], [621, 174], [621, 722], [183, 722]]], "textline": [[[188, 181], [615, 181], [615, 211], [188, 211]], [[187, 206], [614, 206], [614, 231], [187, 231]], [[184, 226], [613, 226], [613, 252], [184, 252]], [[188, 246], [614, 246], [614, 274], [188, 274]], [[188, 268], [615, 268], [615, 291], [188, 291]], [[189, 287], [615, 287], [615, 315], [189, 315]], [[188, 308], [614, 308], [614, 335], [188, 
335]], [[188, 329], [616, 329], [616, 355], [188, 355]], [[187, 349], [616, 349], [616, 375], [187, 375]], [[186, 372], [616, 372], [616, 397], [186, 397]], [[186, 390], [616, 390], [616, 417], [186, 417]], [[188, 429], [618, 429], [618, 455], [188, 455]], [[189, 450], [619, 450], [619, 477], [189, 477]], [[189, 471], [619, 471], [619, 498], [189, 498]], [[189, 491], [619, 491], [619, 517], [189, 517]], [[190, 512], [618, 512], [618, 538], [190, 538]], [[190, 533], [620, 533], [620, 558], [190, 558]], [[189, 553], [619, 553], [619, 577], [189, 577]], [[192, 574], [616, 574], [616, 599], [192, 599]], [[191, 594], [620, 594], [620, 620], [191, 620]], [[191, 613], [619, 613], [619, 638], [191, 638]], [[193, 633], [619, 633], [619, 660], [193, 660]], [[190, 655], [620, 655], [620, 680], [190, 680]], [[189, 673], [619, 673], [619, 700], [189, 700]], [[186, 694], [618, 694], [618, 729], [186, 729]]]}},
...
johnlockejrr commented 1 month ago

Better: I can upload the validation labels.json, since it is smaller than the training one.

labels.json

felixdittrich92 commented 1 month ago

Ah, I see you trained a KIE model :sweat_smile:

To train only a detection model, "polygons" shouldn't be a dict; its value should be just the list of polygons, like:

"polygons": [[[66, 153], [527, 153], [527, 709], [66, 709]], .....]
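
Converting existing KIE-style labels to this flat format can be scripted. A minimal sketch, assuming the labels.json layout shown earlier in this thread; the helper name is hypothetical, and keeping only one class (here "textline") is just one possible choice:

```python
import json


def kie_to_detection_labels(labels, keep_class):
    """Replace each {"polygons": {class: [...]}} dict with the polygon list of one class."""
    converted = {}
    for img_name, entry in labels.items():
        new_entry = dict(entry)  # shallow copy, leaves the input untouched
        polygons = entry["polygons"]
        if isinstance(polygons, dict):
            new_entry["polygons"] = list(polygons.get(keep_class, []))
        converted[img_name] = new_entry
    return converted


# Shortened, illustrative entry in the KIE-style layout
kie_labels = {
    "page.jpg": {
        "img_dimensions": [720, 960],
        "img_hash": "abc",
        "polygons": {
            "textzone": [[[66, 153], [527, 153], [527, 709], [66, 709]]],
            "textline": [[[78, 161], [515, 161], [515, 188], [78, 188]]],
        },
    }
}
det_labels = kie_to_detection_labels(kie_labels, keep_class="textline")
print(json.dumps(det_labels))
```

In practice the input would come from json.load on the existing labels.json, and the result would be written back with json.dump.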
johnlockejrr commented 1 month ago

OMG! :)

felixdittrich92 commented 1 month ago

I think this wasn't planned right ? ^^

johnlockejrr commented 1 month ago

For a detection model, can't I specify more class names? I have both textzones and textlines. Or should I just remove the textzone class and keep the textlines?

felixdittrich92 commented 1 month ago

You can also load this model with:

det_model = db_resnet50(pretrained=False, pretrained_backbone=False, class_names=['textzone', 'textline'])
det_params = torch.load('<path_to_pt>', map_location="cpu")
det_model.load_state_dict(det_params)
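
The size-mismatch error earlier in the thread reports a bias of shape [2] for prob_head.6.bias in the checkpoint versus [1] in the freshly built model, which suggests the checkpoint was trained with two classes. A minimal sketch for reading that count from a state dict before building the model; the helper name is hypothetical, the key is taken from the error message, and a stand-in object replaces the real tensor so the snippet runs without PyTorch:

```python
from types import SimpleNamespace


def count_output_classes(state_dict, key="prob_head.6.bias"):
    """Return the size of the first dimension of the given parameter.

    With a real checkpoint, `state_dict` would be the result of torch.load(...)
    and the value a tensor; only its `.shape` attribute is used here.
    """
    return int(state_dict[key].shape[0])


# Stand-in for a loaded checkpoint (assumption: same key as in the error above)
fake_checkpoint = {"prob_head.6.bias": SimpleNamespace(shape=(2,))}
num_classes = count_output_classes(fake_checkpoint)
print(num_classes)
```

If the count is greater than one, the list passed as class_names should have that many entries.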
johnlockejrr commented 1 month ago

Bad day :)

/home/incognito/doctr/load_det_model-kie.py:28: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
  det_params = torch.load('db_resnet50_20240930-142637.pt', map_location="cpu")
Traceback (most recent call last):
  File "/home/incognito/doctr/load_det_model-kie.py", line 50, in <module>
    detached_coords, prob_scores = detach_scores([res.get("words")])
  File "/home/incognito/doctr/doctr/utils/geometry.py", line 79, in detach_scores
    loc_preds, obj_scores = zip(*(_detach(box) for box in boxes))
  File "/home/incognito/doctr/doctr/utils/geometry.py", line 79, in <genexpr>
    loc_preds, obj_scores = zip(*(_detach(box) for box in boxes))
  File "/home/incognito/doctr/doctr/utils/geometry.py", line 75, in _detach
    if boxes.ndim == 2:
AttributeError: 'NoneType' object has no attribute 'ndim'
johnlockejrr commented 1 month ago

I think I should re-train it :)

The error is on the line detached_coords, prob_scores = detach_scores([res.get("words")])

If it is a KIE model, shouldn't I use from doctr.models import kie_predictor?

I changed the line to detached_coords, prob_scores = detach_scores([res.get("textline")])

But I get nothing: the script runs, but there are no detections.

detached_coords -> [array([], shape=(0, 4), dtype=float32)]

johnlockejrr commented 1 month ago

I reconverted my data to:

{"215_67426_default.jpg": {"img_dimensions": [720, 960], "img_hash": "f4da2a0dcdcd28dbc08609bac090f465ee5d7b471fa42024da0a11e79acade60", "polygons": [[[72, 162], [514, 162], [514, 194], [72, 194]], [[69, 188], [514, 188], [514, 216], [69, 216]], [[69, 209], [514, 209], [514, 238], [69, 238]], [[69, 231], [514, 231], [514, 259], [69, 259]], [[69, 251], [514, 251], [514, 283], [69, 283]], [[70, 274], [515, 274], [515, 299], [70, 299]], [[70, 293], [515, 293], [515, 322], [70, 322]], [[69, 314], [516, 314], [516, 340], [69, 340]], [[69, 335], [516, 335], [516, 364], [69, 364]], [[67, 355], [516, 355], [516, 386], [67, 386]], [[69, 392], [517, 392], [517, 427], [69, 427]], [[70, 420], [514, 420], [514, 447], [70, 447]], [[70, 441], [517, 441], [517, 468], [70, 468]], [[70, 462], [517, 462], [517, 493], [70, 493]], [[70, 483], [518, 483], [518, 511], [70, 511]], [[77, 504], [519, 504], [519, 534], [77, 534]], [[65, 526], [520, 526], [520, 555], [65, 555]], [[69, 547], [519, 547], [519, 578], [69, 578]], [[69, 570], [521, 570], [521, 598], [69, 598]], [[71, 590], [520, 590], [520, 619], [71, 619]], [[65, 612], [521, 612], [521, 642], [65, 642]], [[70, 635], [521, 635], [521, 663], [70, 663]], [[70, 660], [522, 660], [522, 684], [70, 684]], [[66, 677], [522, 677], [522, 703], [66, 703]], [[70, 698], [522, 698], [522, 727], [70, 727]], [[67, 716], [199, 716], [199, 741], [67, 741]]]}, "545_4408b_default.jpg": {"img_dimensions": [720, 960], "img_hash": "21c0f7326a7821b77b2a5e49e76017e60555dd40670005863a20a13d2803748d", "polygons": [[[107, 179], [507, 179], [507, 207], [107, 207]], [[107, 200], [510, 200], [510, 226], [107, 226]], [[105, 220], [509, 220], [509, 245], [105, 245]], [[109, 243], [510, 243], [510, 262], [109, 262]], [[106, 259], [510, 259], [510, 282], [106, 282]], [[106, 277], [510, 277], [510, 301], [106, 301]], [[106, 299], [510, 299], [510, 319], [106, 319]], [[103, 315], [510, 315], [510, 338], [103, 338]], [[103, 333], [510, 333], [510, 358], [103, 358]], 
[[101, 354], [510, 354], [510, 379], [101, 379]], [[104, 373], [509, 373], [509, 398], [104, 398]], [[101, 390], [510, 390], [510, 416], [101, 416]], [[103, 412], [511, 412], [511, 431], [103, 431]], [[104, 430], [511, 430], [511, 455], [104, 455]], [[101, 450], [510, 450], [510, 475], [101, 475]], [[104, 469], [510, 469], [510, 495], [104, 495]], [[104, 489], [509, 489], [509, 514], [104, 514]], [[104, 507], [510, 507], [510, 533], [104, 533]], [[104, 528], [510, 528], [510, 553], [104, 553]], [[103, 549], [511, 549], [511, 572], [103, 572]], [[103, 565], [509, 565], [509, 591], [103, 591]], [[103, 584], [511, 584], [511, 611], [103, 611]], [[101, 602], [511, 602], [511, 629], [101, 629]], [[99, 622], [512, 622], [512, 650], [99, 650]], [[105, 660], [512, 660], [512, 693], [105, 693]], [[103, 684], [202, 684], [202, 710], [103, 710]]]},

I'll retrain :)

johnlockejrr commented 1 month ago
(env-py3.10) incognito@DESKTOP-NHKR7QL:~/doctr$ python references/detection/train_pytorch.py datasets/sam/train_out datasets/sam/val_out db_resnet50 --epochs 10 --device 0
Namespace(train_path='datasets/sam/train_out', val_path='datasets/sam/val_out', arch='db_resnet50', name=None, epochs=10, batch_size=2, device=0, save_interval_epoch=False, input_size=1024, lr=0.001, weight_decay=0, workers=None, resume=None, test_only=False, freeze_backbone=False, show_samples=False, wb=False, push_to_hub=False, pretrained=False, rotation=False, eval_straight=False, sched='poly', amp=False, find_lr=False, early_stop=False, early_stop_epochs=5, early_stop_delta=0.01)
Validation set loaded in 0.1427s (67 samples in 34 batches)
Train set loaded in 0.0208s (540 samples in 270 batches)
Training loss: 0.29681: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 270/270 [01:06<00:00,  4.07it/s]
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 34/34 [00:09<00:00,  3.59it/s]
Validation loss decreased inf --> 0.362736: saving state...
Epoch 1/10 - Validation loss: 0.362736 (Recall: 98.02% | Precision: 85.08% | Mean IoU: 65.00%)
Training loss: 0.321628: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 270/270 [01:02<00:00,  4.29it/s]
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 34/34 [00:06<00:00,  5.49it/s]
Epoch 2/10 - Validation loss: 0.372804 (Recall: 95.15% | Precision: 84.16% | Mean IoU: 63.00%)
Training loss: 0.406969: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 270/270 [01:03<00:00,  4.24it/s]
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 34/34 [00:06<00:00,  5.43it/s]
Validation loss decreased 0.362736 --> 0.33441: saving state...
Epoch 3/10 - Validation loss: 0.33441 (Recall: 92.34% | Precision: 75.74% | Mean IoU: 52.00%)
Training loss: 0.508775: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 270/270 [01:02<00:00,  4.29it/s]
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 34/34 [00:06<00:00,  5.54it/s]
Epoch 4/10 - Validation loss: 0.354248 (Recall: 98.68% | Precision: 80.43% | Mean IoU: 64.00%)
Training loss: 0.389871: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 270/270 [01:03<00:00,  4.28it/s]
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 34/34 [00:06<00:00,  5.54it/s]
Validation loss decreased 0.33441 --> 0.316777: saving state...
Epoch 5/10 - Validation loss: 0.316777 (Recall: 98.68% | Precision: 89.18% | Mean IoU: 70.00%)
Training loss: 0.36966: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 270/270 [01:02<00:00,  4.30it/s]
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 34/34 [00:06<00:00,  5.60it/s]
Validation loss decreased 0.316777 --> 0.308347: saving state...
Epoch 6/10 - Validation loss: 0.308347 (Recall: 97.19% | Precision: 81.19% | Mean IoU: 59.00%)
Training loss: 0.31847: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 270/270 [01:03<00:00,  4.25it/s]
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 34/34 [00:06<00:00,  5.49it/s]
Validation loss decreased 0.308347 --> 0.285198: saving state...
Epoch 7/10 - Validation loss: 0.285198 (Recall: 98.08% | Precision: 87.41% | Mean IoU: 67.00%)
Training loss: 0.202373:  11%|███████████████████████▊                                                                                                                                                                                       | 31/270 [00:08<01:05,  3.67it/s]
Traceback (most recent call last):
  File "/home/incognito/doctr/references/detection/train_pytorch.py", line 481, in <module>
    main(args)
  File "/home/incognito/doctr/references/detection/train_pytorch.py", line 388, in main
    fit_one_epoch(model, train_loader, batch_transforms, optimizer, scheduler, amp=args.amp)
  File "/home/incognito/doctr/references/detection/train_pytorch.py", line 109, in fit_one_epoch
    for images, targets in pbar:
  File "/home/incognito/doctr/env-py3.10/lib/python3.10/site-packages/tqdm/std.py", line 1181, in __iter__
    for obj in iterable:
  File "/home/incognito/doctr/env-py3.10/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 630, in __next__
    data = self._next_data()
  File "/home/incognito/doctr/env-py3.10/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1324, in _next_data
    return self._process_data(data)
  File "/home/incognito/doctr/env-py3.10/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1370, in _process_data
    data.reraise()
  File "/home/incognito/doctr/env-py3.10/lib/python3.10/site-packages/torch/_utils.py", line 706, in reraise
    raise exception
UnboundLocalError: Caught UnboundLocalError in DataLoader worker process 15.
Original Traceback (most recent call last):
  File "/home/incognito/doctr/env-py3.10/lib/python3.10/site-packages/torch/utils/data/_utils/worker.py", line 309, in _worker_loop
    data = fetcher.fetch(index)  # type: ignore[possibly-undefined]
  File "/home/incognito/doctr/env-py3.10/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 52, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/incognito/doctr/env-py3.10/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 52, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/incognito/doctr/doctr/datasets/datasets/base.py", line 67, in __getitem__
    img_transformed, target[class_name] = self.sample_transforms(img, bboxes)
  File "/home/incognito/doctr/doctr/transforms/modules/base.py", line 56, in __call__
    x, target = t(x, target)
  File "/home/incognito/doctr/env-py3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/incognito/doctr/env-py3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/incognito/doctr/doctr/transforms/modules/pytorch.py", line 87, in forward
    target[:, [0, 2]] = offset[0] + target[:, [0, 2]] * raw_shape[-1] / img.shape[-1]
UnboundLocalError: local variable 'offset' referenced before assignment

I resumed it and it finished:

(env-py3.10) incognito@DESKTOP-NHKR7QL:~/doctr$ python references/detection/train_pytorch.py datasets/sam/train_out datasets/sam/val_out db_resnet50 --epochs 5 --device 0 --resume ./db_resnet50_20240930-162432.pt --workers 2
Namespace(train_path='datasets/sam/train_out', val_path='datasets/sam/val_out', arch='db_resnet50', name=None, epochs=5, batch_size=2, device=0, save_interval_epoch=False, input_size=1024, lr=0.001, weight_decay=0, workers=2, resume='./db_resnet50_20240930-162432.pt', test_only=False, freeze_backbone=False, show_samples=False, wb=False, push_to_hub=False, pretrained=False, rotation=False, eval_straight=False, sched='poly', amp=False, find_lr=False, early_stop=False, early_stop_epochs=5, early_stop_delta=0.01)
Validation set loaded in 0.1605s (67 samples in 34 batches)
Resuming ./db_resnet50_20240930-162432.pt
/home/incognito/doctr/references/detection/train_pytorch.py:228: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
  checkpoint = torch.load(args.resume, map_location="cpu")
Train set loaded in 0.07673s (540 samples in 270 batches)
Training loss: 0.342384: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 270/270 [01:04<00:00,  4.20it/s]
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 34/34 [00:08<00:00,  4.06it/s]
Validation loss decreased inf --> 0.333333: saving state...
Epoch 1/5 - Validation loss: 0.333333 (Recall: 98.32% | Precision: 84.99% | Mean IoU: 64.00%)
Training loss: 0.285108: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 270/270 [01:02<00:00,  4.35it/s]
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 34/34 [00:05<00:00,  6.64it/s]
Validation loss decreased 0.333333 --> 0.298129: saving state...
Epoch 2/5 - Validation loss: 0.298129 (Recall: 97.84% | Precision: 90.08% | Mean IoU: 67.00%)
Training loss: 0.241384: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 270/270 [01:01<00:00,  4.40it/s]
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 34/34 [00:05<00:00,  6.66it/s]
Validation loss decreased 0.298129 --> 0.234458: saving state...
Epoch 3/5 - Validation loss: 0.234458 (Recall: 98.80% | Precision: 81.85% | Mean IoU: 71.00%)
Training loss: 0.238148: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 270/270 [01:01<00:00,  4.37it/s]
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 34/34 [00:05<00:00,  6.72it/s]
Epoch 4/5 - Validation loss: 0.238532 (Recall: 98.50% | Precision: 86.95% | Mean IoU: 75.00%)
Training loss: 0.237705: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 270/270 [01:02<00:00,  4.34it/s]
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 34/34 [00:05<00:00,  6.62it/s]
Validation loss decreased 0.234458 --> 0.20468: saving state...
Epoch 5/5 - Validation loss: 0.20468 (Recall: 98.98% | Precision: 89.64% | Mean IoU: 80.00%)
felixdittrich92 commented 1 month ago

That's a known issue; a PR to fix it is on the way :) https://github.com/mindee/doctr/pull/1715 CC @odulcy-mindee
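
For context, here is a minimal sketch of how this kind of `UnboundLocalError` typically arises in Python: a local variable is bound only inside a conditional branch but read unconditionally afterwards. The function and parameter names are hypothetical and this is not docTR's actual code; it only illustrates the general pattern and one common way to guard against it.

```python
def shift_broken(value, normalized):
    # `offset` is only bound when `normalized` is True...
    if normalized:
        offset = 0.5
    # ...but it is read on every call, so normalized=False raises UnboundLocalError.
    return offset + value


def shift_fixed(value, normalized):
    # Bind a default first so that every code path defines `offset`.
    offset = 0.0
    if normalized:
        offset = 0.5
    return offset + value
```

Because the failing branch depends on the input, an error of this shape can stay hidden for many batches and only appear when a particular sample takes the unguarded path.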

johnlockejrr commented 1 month ago

It performs well(-ish) with your script above, but any idea why it identifies only one line?

output

felixdittrich92 commented 1 month ago

What's the shape of the model output?

felixdittrich92 commented 1 month ago

Btw, in the script I provided, lower bin_thresh and box_thresh to 0.1.
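
To illustrate why lowering these two values can surface more lines, here is a toy sketch. It assumes that bin_thresh binarizes the predicted probability map and that box_thresh discards candidate regions whose mean score is too low. Each row below stands in for one candidate text line; the function name, the data, and the "strict" values 0.3 / 0.5 are made up for illustration, and this is a simplification rather than docTR's actual post-processing.

```python
def keep_regions(prob_rows, bin_thresh, box_thresh):
    """Return the indices of rows that survive both thresholds."""
    kept = []
    for idx, row in enumerate(prob_rows):
        # Step 1: binarize, keeping only pixels at or above bin_thresh.
        foreground = [p for p in row if p >= bin_thresh]
        if not foreground:
            continue  # nothing left of this region after binarization
        # Step 2: score the region by its mean foreground probability.
        score = sum(foreground) / len(foreground)
        if score >= box_thresh:
            kept.append(idx)
    return kept


# One confident line and two faint ones (made-up probabilities).
rows = [[0.9, 0.8, 0.85], [0.25, 0.2, 0.28], [0.15, 0.12, 0.2]]
strict = keep_regions(rows, bin_thresh=0.3, box_thresh=0.5)
relaxed = keep_regions(rows, bin_thresh=0.1, box_thresh=0.1)
```

With the stricter values only the confident row survives, while the relaxed values keep all three, which matches the symptom of an undertrained model detecting a single line.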

johnlockejrr commented 1 month ago

I trained the model on x960 images. When detecting, should I use the same resolution?

felixdittrich92 commented 1 month ago

If you resized the images yourself beforehand, then yes, that would make sense.
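
A minimal sketch of computing the inference size so that it matches the training resolution, assuming "x960" means a height of 960 px and that the aspect ratio should be preserved. The helper name is hypothetical and it only computes the target dimensions; the actual resize would be done with whatever image library you already use.

```python
def size_for_height(width, height, target_height=960):
    """Return (new_width, new_height) scaled so the height equals target_height."""
    scale = target_height / height
    # Round to whole pixels and never let the width collapse to zero.
    return max(1, round(width * scale)), target_height
```

For example, a 1440x1920 page would map to 720x960 under this assumption.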

johnlockejrr commented 1 month ago

I resized the image to x960. I think it needs more training.

output