mindee / doctr

docTR (Document Text Recognition) - a seamless, high-performing & accessible library for OCR-related tasks powered by Deep Learning.
https://mindee.github.io/doctr/
Apache License 2.0

Recognition training errors #1645

Closed: ArsalanYounus007 closed this issue 3 weeks ago

ArsalanYounus007 commented 3 weeks ago

Bug description

Hi 👋 I am working on a project to train a handwriting recognition model. My training data is a mix of IAM and a custom (in-house) dataset, and it contains both single words and full sentences (I think that is the issue).

I have tried parseq and crnn_vgg16_bn, and each fails with a different error. I updated Vocab.py and added a space character to the vocabulary string, but I suspect that is not the correct way to do it.

I am interested in trying the master, parseq, and vitstr_base architectures.
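Regarding the vocabulary change described above, it may help to list exactly which label characters are missing from the vocabulary before editing any library file. The following is a minimal sketch, assuming the vocabulary is a plain Python string of allowed characters; `find_missing_chars` and the example vocab are hypothetical illustrations, not doctr APIs:

```python
import string
from collections import Counter
from typing import Iterable


def find_missing_chars(labels: Iterable[str], vocab: str) -> Counter:
    """Count every character that appears in the labels but not in the vocab."""
    allowed = set(vocab)
    missing: Counter = Counter()
    for label in labels:
        missing.update(ch for ch in label if ch not in allowed)
    return missing


# Hypothetical vocab: digits, ASCII letters and ASCII punctuation, but no space.
vocab = string.digits + string.ascii_letters + string.punctuation
labels = ["cent.", "Grand Junction", "(303) 771-6858", "12-17-18"]

missing = find_missing_chars(labels, vocab)
print(missing)  # Counter({' ': 2})
```

An empty Counter means every character in the labels is covered; here the only gap is the space used in multi-word labels.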

Code snippet to reproduce the bug

For CRNN

python references/recognition/train_tensorflow.py crnn_vgg16_bn --train_path ./handwriting_dataset --val_path ./handwriting_dataset_val --epochs 5

For parseq

python references/recognition/train_tensorflow.py parseq --train_path ./handwriting_dataset --val_path ./handwriting_dataset_val --epochs 5

Error traceback

For parseq

Train set loaded in 20.9s (125801 samples in 1965 batches)
  0%|          | 0/1965 [00:06<?, ?it/s]
Traceback (most recent call last):
  File "references/recognition/train_tensorflow.py", line 448, in <module>
    main(args)
  File "references/recognition/train_tensorflow.py", line 346, in main
    fit_one_epoch(model, train_loader, batch_transforms, optimizer, args.amp)
  File "references/recognition/train_tensorflow.py", line 95, in fit_one_epoch
    train_loss = model(images, targets, training=True)["loss"]
  File "C:\Users\Arsalan\anaconda3\envs\doctr_recognition_training\lib\site-packages\keras\utils\traceback_utils.py", line 67, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "F:\GitHub\doctr\doctr\models\recognition\parseq\tensorflow.py", line 361, in call
    mask = tf.logical_and(padding_mask, tf.expand_dims(tf.expand_dims(target_mask, axis=0), axis=0))
tensorflow.python.framework.errors_impl.InvalidArgumentError: Exception encountered when calling layer "par_seq" (type PARSeq).

required broadcastable shapes [Op:LogicalAnd]

Call arguments received:
  • x=tf.Tensor(shape=(64, 32, 128, 3), dtype=float32)
  • target=["'cent.'", "'of'", "'6806065232088'", "'It'", "'Riverview Abbey F.H.'", "'frame'", "'751-East 75th St.'", "'you'", "'the'", "'12-17-18'", "'about'", "'3'", "'Sore her ankle, Monday the 10th tried to work and over extended - over used'", "'03/17/2005'", "'lifting weights in school class and injured back'", "'J'", "'Grand Junction'", "'2'", "'Wiscasset'", "'MD'", "'Pension'", "'130.00'", "'could'", '\'Laceration >2" Rt Hand with utility knife\'', "'294'", "'down'", "'Parties'", "'8'", "'by'", "'protest'", "'03-11-1983'", "'2778 Country Club Dr.'", "'2.2,5'", "'author'", "','", "'Jeffrey Datterer'", "'(303) 771-6858'", "'78'", "'OME FEE'", "'the'", "'JoAnn AniEmEKa'", "'Family Practice/Emergency Medicine'", "'Sconrad@eckman'", "'but'", "'17/08/19'", "'NE'", "'5,879'", "'of'", "'of R lumbar'", "'NC'", "'eerie'", "'and'", "'9340 59 st'", "'said'", "'I'", "'250,000'", "','", "'spencer GAllAGHer'", "'654-3442'", "'the'", "'PPS Enhanced Yield'", "'08-16-86'", "'12-20-18'", "'Sleep'"]
  • return_model_output=False
  • return_preds=False
  • kwargs={'training': 'True'}
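One plausible cause of the PARSeq broadcast error, not confirmed in this thread, is that some targets are longer than the maximum sequence length the model was built with, so the padding mask and the target mask no longer have matching lengths. Several targets in the batch above are full sentences. A quick check, assuming a limit of 32 characters (verify the actual value for your doctr version and architecture); `find_overlong_labels` is a hypothetical helper:

```python
from typing import Dict, List

MAX_LENGTH = 32  # assumption: replace with the max length your model was built with


def find_overlong_labels(labels: Dict[str, str], max_length: int = MAX_LENGTH) -> List[str]:
    """Return the image names whose label is longer than max_length characters."""
    return [name for name, text in labels.items() if len(text) > max_length]


# Hypothetical file names; the texts are taken from the failing batch above.
labels = {
    "img_1.jpg": "cent.",
    "img_2.jpg": "Sore her ankle, Monday the 10th tried to work and over extended - over used",
    "img_3.jpg": "Family Practice/Emergency Medicine",
    "img_4.jpg": "Riverview Abbey F.H.",
}

print(find_overlong_labels(labels))  # ['img_2.jpg', 'img_3.jpg']
```

If such samples are found, filtering or splitting them is one way to test whether they are responsible for the error.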

For CRNN

Train set loaded in 18.35s (125801 samples in 1965 batches)
  0%|          | 0/1965 [00:01<?, ?it/s]
  File "references/recognition/train_tensorflow.py", line 448, in <module>
    main(args)
  File "references/recognition/train_tensorflow.py", line 346, in main
    fit_one_epoch(model, train_loader, batch_transforms, optimizer, args.amp)
  File "references/recognition/train_tensorflow.py", line 91, in fit_one_epoch
    for images, targets in pbar:
  File "C:\Users\Arsalan\anaconda3\envs\doctr_recognition_training\lib\site-packages\tqdm\std.py", line 1182, in __iter__
    for obj in iterable:
  File "F:\GitHub\doctr\doctr\datasets\loader.py", line 95, in __next__
    samples = list(multithread_exec(self.dataset.__getitem__, indices, threads=self.num_workers))
  File "F:\GitHub\doctr\doctr\utils\multithreading.py", line 49, in multithread_exec
    results = map(lambda x: x, tp.map(func, seq))  # noqa: C417
  File "C:\Users\Arsalan\anaconda3\envs\doctr_recognition_training\lib\multiprocessing\pool.py", line 364, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "C:\Users\Arsalan\anaconda3\envs\doctr_recognition_training\lib\multiprocessing\pool.py", line 771, in get
    raise self._value
  File "C:\Users\Arsalan\anaconda3\envs\doctr_recognition_training\lib\multiprocessing\pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "C:\Users\Arsalan\anaconda3\envs\doctr_recognition_training\lib\multiprocessing\pool.py", line 48, in mapstar
    return list(map(*args))
  File "F:\GitHub\doctr\doctr\datasets\datasets\base.py", line 49, in __getitem__
    img, target = self._read_sample(index)
  File "F:\GitHub\doctr\doctr\datasets\datasets\tensorflow.py", line 37, in _read_sample
    assert isinstance(target, str) or isinstance(
AssertionError: Target should be a string or a numpy array
WARNING:tensorflow:Detecting that an object or model or tf.train.Checkpoint is being deleted with unrestored values. See the following logs for the specific values in question. To silence these warnings, use `status.expect_partial()`. See https://www.tensorflow.org/api_docs/python/tf/train/Checkpoint#restorefor details about the status object returned by the restore function.
WARNING:tensorflow:Value in checkpoint could not be found in the restored object: (root).layer_with_weights-26.kernel
WARNING:tensorflow:Value in checkpoint could not be found in the restored object: (root).layer_with_weights-26.bias
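The CRNN assertion is raised while a sample is being read, before the model runs, and it says a target was neither a string nor a numpy array. One way this could happen, offered as an assumption rather than a confirmed diagnosis, is a labels.json entry whose value was serialized as a JSON number or null (for example a digit-only label such as 6806065232088 written without quotes). A minimal sketch that scans a labels file for non-string values, assuming the training and validation folders each contain a labels.json mapping image file names to their text; `find_non_string_labels` is hypothetical:

```python
import json
import tempfile
from pathlib import Path
from typing import Dict


def find_non_string_labels(labels_path: Path) -> Dict[str, str]:
    """Map each image name whose label is not a string to the label's type name."""
    with open(labels_path, encoding="utf-8") as f:
        labels = json.load(f)
    return {
        name: type(value).__name__
        for name, value in labels.items()
        if not isinstance(value, str)
    }


# Hypothetical labels.json: a digit-only label saved as a JSON number, plus a null.
sample = {"img_1.jpg": "cent.", "img_2.jpg": 6806065232088, "img_3.jpg": None}

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "labels.json"
    path.write_text(json.dumps(sample), encoding="utf-8")
    result = find_non_string_labels(path)

print(result)  # {'img_2.jpg': 'int', 'img_3.jpg': 'NoneType'}
```

Running the same check on both the train and validation labels files would rule this cause in or out.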

Environment

Conda env

Deep Learning backend

is_tf_available: True
is_torch_available: False