solivr / tf-crnn

TensorFlow convolutional recurrent neural network (CRNN) for text recognition
GNU General Public License v3.0

error when I run the code with Symbols #8

Closed WenmuZhou closed 7 years ago

WenmuZhou commented 7 years ago

The Symbols setting is:

Symbols = "'.,:;-_=()[]{}/°"
BLANK_SYMBOL = '$'

When I run the code, I get the following error:

Traceback (most recent call last):
  File "train.py", line 99, in <module>
    image_summaries=True))
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/estimator/estimator.py", line 241, in train
    loss = self._train_model(input_fn=input_fn, hooks=hooks)
           │                          │               └ []
           │                          └ <function data_loader.<locals>.input_fn at 0x7f20a01fbf28>
           └ <tensorflow.python.estimator.estimator.Estimator object at 0x7f20a01fc908>
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/estimator/estimator.py", line 686, in _train_model
    _, loss = mon_sess.run([estimator_spec.train_op, estimator_spec.loss])
       │      │             │                        └ EstimatorSpec(predictions={'raw_predictions': <tf.Tensor 'deep_bidirectional_lstm/raw_prediction:0' shape=(64, 75) dtype=int64>,...
       │      │             └ EstimatorSpec(predictions={'raw_predictions': <tf.Tensor 'deep_bidirectional_lstm/raw_prediction:0' shape=(64, 75) dtype=int64>,...
       │      └ <tensorflow.python.training.monitored_session.MonitoredSession object at 0x7f209b0322e8>
       └ None
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/monitored_session.py", line 518, in run
    run_metadata=run_metadata)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/monitored_session.py", line 862, in run
    run_metadata=run_metadata)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/monitored_session.py", line 818, in run
    return self._sess.run(*args, **kwargs)
           │               │       └ {'options': None, 'feed_dict': None, 'run_metadata': None}
           │               └ ([<tf.Operation 'group_deps' type=NoOp>, <tf.Tensor 'Print:0' shape=() dtype=float32>],)
           └ <tensorflow.python.training.monitored_session._CoordinatedSession object at 0x7f209acf6390>
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/monitored_session.py", line 972, in run
    run_metadata=run_metadata)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/monitored_session.py", line 818, in run
    return self._sess.run(*args, **kwargs)
           │               │       └ {'options': , 'fetches': {'caller': [<tf.Operation 'group_deps' type=NoOp>, <tf.Tensor 'Print:0' shape=() dtype=float32>], <tens...
           │               └ ()
           └ <tensorflow.python.training.monitored_session._HookedSession object at 0x7f209acf62e8>
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 895, in run
    run_metadata_ptr)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1124, in _run
    feed_dict_tensor, options, run_metadata)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1321, in _do_run
    options, run_metadata)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1340, in _do_call
    raise type(e)(node_def, op, message)
                  │         │   └ 'All labels must be nonnegative integers, batch: 0 labels: 12,4,14,0,-1,-1,10\n\t [[Node: CTCLoss = CTCLoss[ctc_merge_repeated=t...
                  │         └ <tf.Operation 'CTCLoss' type=CTCLoss>
                  └ name: "CTCLoss"
op: "CTCLoss"
input: "deep_bidirectional_lstm/transpose_time_major"
input: "str2code_conversion/StringSplit"
inp...
tensorflow.python.framework.errors_impl.InvalidArgumentError: All labels must be nonnegative integers, batch: 0 labels: 12,4,14,0,-1,-1,10
         [[Node: CTCLoss = CTCLoss[ctc_merge_repeated=true, ignore_longer_outputs_than_inputs=true, preprocess_collapse_repeated=false, _device="/job:localhost/replica:0/task:0/cpu:0"](deep_bidirectional_lstm/transpose_time_major/_511, str2code_conversion/StringSplit, str2code_conversion/hash_table_Lookup, Cast_3/_589)]]
         [[Node: code2str_conversion/chars_conversion/Shape/_531 = _HostRecv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/gpu:0", send_device="/job:localhost/replica:0/task:0/cpu:0", send_device_incarnation=1, tensor_name="edge_1842_code2str_conversion/chars_conversion/Shape", tensor_type=DT_INT32, _device="/job:localhost/replica:0/task:0/gpu:0"]()]]

Caused by op 'CTCLoss', defined at:
  File "train.py", line 99, in <module>
    image_summaries=True))
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/estimator/estimator.py", line 241, in train
    loss = self._train_model(input_fn=input_fn, hooks=hooks)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/estimator/estimator.py", line 630, in _train_model
    model_fn_lib.ModeKeys.TRAIN)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/estimator/estimator.py", line 615, in _call_model_fn
    model_fn_results = self._model_fn(features=features, **kwargs)
  File "/data/zhoujun/tf-crnn/src/model.py", line 272, in crnn_fn
    time_major=True)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/ctc_ops.py", line 152, in ctc_loss
    ignore_longer_outputs_than_inputs=ignore_longer_outputs_than_inputs)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/gen_ctc_ops.py", line 168, in _ctc_loss
    name=name)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/op_def_library.py", line 767, in apply_op
    op_def=op_def)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py", line 2630, in create_op
    original_op=self._default_original_op, op_def=op_def)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py", line 1204, in __init__
    self._traceback = self._graph._extract_stack()  # pylint: disable=protected-access

InvalidArgumentError (see above for traceback): All labels must be nonnegative integers, batch: 0 labels: 12,4,14,0,-1,-1,10
         [[Node: CTCLoss = CTCLoss[ctc_merge_repeated=true, ignore_longer_outputs_than_inputs=true, preprocess_collapse_repeated=false, _device="/job:localhost/replica:0/task:0/cpu:0"](deep_bidirectional_lstm/transpose_time_major/_511, str2code_conversion/StringSplit, str2code_conversion/hash_table_Lookup, Cast_3/_589)]]
         [[Node: code2str_conversion/chars_conversion/Shape/_531 = _HostRecv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/gpu:0", send_device="/job:localhost/replica:0/task:0/cpu:0", send_device_incarnation=1, tensor_name="edge_1842_code2str_conversion/chars_conversion/Shape", tensor_type=DT_INT32, _device="/job:localhost/replica:0/task:0/gpu:0"]()]]

The images I use look like this: image

solivr commented 7 years ago

The message "All labels must be nonnegative integers, batch: 0 labels: 12,4,14,0,-1,-1,10" means that your lookup table could not find an entry for some symbols (the -1 values). The problem comes from this line, since the current implementation of string_split doesn't take UTF-8 encoding into account. In the Symbols list, the symbol ° is encoded in UTF-8 as b'\xc2\xb0', so string_split treats it as 2 symbols, and your lookup table has no entry for either of them. I forgot to update it on GitHub, but the solution is simply to remove ° from the list of symbols and use only characters that are encoded as a single byte in UTF-8.
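The byte-level splitting described above can be reproduced in plain Python, without TensorFlow. This is only a sketch: the dictionary `table` and the helper `byte_split_lookup` are illustrative names, not code from the repository. They mimic a byte-wise split followed by a table lookup that returns -1 for unknown keys.

```python
# Illustrative sketch only: mimics a byte-wise split plus a lookup with default -1.
# `table` and `byte_split_lookup` are hypothetical names, not part of tf-crnn.

SYMBOLS = "'.,:;-_=()[]{}/\u00b0"  # \u00b0 is the degree sign

# Lookup table keyed by the UTF-8 encoding of each whole symbol.
table = {ch.encode("utf-8"): idx for idx, ch in enumerate(SYMBOLS)}


def byte_split_lookup(label, default=-1):
    """Split the UTF-8 bytes of `label` one byte at a time and look each one up."""
    encoded = label.encode("utf-8")
    single_bytes = [encoded[i:i + 1] for i in range(len(encoded))]
    return [table.get(b, default) for b in single_bytes]


print("\u00b0".encode("utf-8"))        # b'\xc2\xb0' (two bytes for one character)
print(byte_split_lookup("(=)"))        # [8, 7, 9]
print(byte_split_lookup("(\u00b0)"))   # [8, -1, -1, 9]
```

The pair of -1 values in the last line has the same shape as the "-1,-1" in the reported error: one character became two unknown byte keys.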

WenmuZhou commented 7 years ago

When I train the model with Chinese characters, the error is "All labels must be nonnegative integers, batch: 0 labels: -1,-1,-1,-1,-1,-1,-1", so the cause is presumably the same function. How can I fix this so that I can train with Chinese characters?

WenmuZhou commented 7 years ago

I tried adding a $ after each character in the labels except the last one, e.g. ($=$/$- , and changed the string_split line to:

splited = tf.string_split(labels, delimiter='$')

but there is still an error; it looks like string_split does not work:

Invalid argument: Saw a non-null label (index >= num_classes - 1) following a null label, batch: 0 num_classes: 16 labels: 13
2017-11-10 22:15:45.035575: W tensorflow/core/framework/op_kernel.cc:1192] Invalid argument: Saw a non-null label (index >= num_classes - 1) following a null label, batch: 0 num_classes: 16 labels: 13
         [[Node: CTCLoss = CTCLoss[ctc_merge_repeated=true, ignore_longer_outputs_than_inputs=true, preprocess_collapse_repeated=false, _device="/job:localhost/replica:0/task:0/cpu:0"](deep_bidirectional_lstm/transpose_time_major/_511, str2code_conversion/StringSplit, str2code_conversion/hash_table_Lookup, Cast_3/_589)]]

Traceback (most recent call last):
  File "train.py", line 99, in <module>
    image_summaries=True))
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/estimator/estimator.py", line 241, in train
    loss = self._train_model(input_fn=input_fn, hooks=hooks)
           │                          │               └ []
           │                          └ <function data_loader.<locals>.input_fn at 0x7f6df54bdf28>
           └ <tensorflow.python.estimator.estimator.Estimator object at 0x7f6df54be908>
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/estimator/estimator.py", line 686, in _train_model
    _, loss = mon_sess.run([estimator_spec.train_op, estimator_spec.loss])
       │      │             │                        └ EstimatorSpec(predictions={'words': <tf.Tensor 'code2str_conversion/chars_conversion/cond/Merge:0' shape=(?,) dtype=string>, 'ra...
       │      │             └ EstimatorSpec(predictions={'words': <tf.Tensor 'code2str_conversion/chars_conversion/cond/Merge:0' shape=(?,) dtype=string>, 'ra...
       │      └ <tensorflow.python.training.monitored_session.MonitoredSession object at 0x7f6df43042e8>
       └ None
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/monitored_session.py", line 518, in run
    run_metadata=run_metadata)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/monitored_session.py", line 862, in run
    run_metadata=run_metadata)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/monitored_session.py", line 818, in run
    return self._sess.run(*args, **kwargs)
           │               │       └ {'run_metadata': None, 'options': None, 'feed_dict': None}
           │               └ ([<tf.Operation 'group_deps' type=NoOp>, <tf.Tensor 'Print:0' shape=() dtype=float32>],)
           └ <tensorflow.python.training.monitored_session._CoordinatedSession object at 0x7f6deffbc390>
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/monitored_session.py", line 972, in run
    run_metadata=run_metadata)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/monitored_session.py", line 818, in run
    return self._sess.run(*args, **kwargs)
           │               │       └ {'run_metadata': , 'options': , 'feed_dict': None, 'fetches': {'caller': [<tf.Operation 'group_deps' type=NoOp>, <tf.Tensor 'Pri...
           │               └ ()
           └ <tensorflow.python.training.monitored_session._HookedSession object at 0x7f6deffbc2e8>
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 895, in run
    run_metadata_ptr)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1124, in _run
    feed_dict_tensor, options, run_metadata)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1321, in _do_run
    options, run_metadata)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1340, in _do_call
    raise type(e)(node_def, op, message)
                  │         │   └ 'Saw a non-null label (index >= num_classes - 1) following a null label, batch: 0 num_classes: 16 labels: 13\n\t [[Node: CTCLoss...
                  │         └ <tf.Operation 'CTCLoss' type=CTCLoss>
                  └ name: "CTCLoss"
op: "CTCLoss"
input: "deep_bidirectional_lstm/transpose_time_major"
input: "str2code_conversion/StringSplit"
inp...
tensorflow.python.framework.errors_impl.InvalidArgumentError: Saw a non-null label (index >= num_classes - 1) following a null label, batch: 0 num_classes: 16 labels: 13
         [[Node: CTCLoss = CTCLoss[ctc_merge_repeated=true, ignore_longer_outputs_than_inputs=true, preprocess_collapse_repeated=false, _device="/job:localhost/replica:0/task:0/cpu:0"](deep_bidirectional_lstm/transpose_time_major/_511, str2code_conversion/StringSplit, str2code_conversion/hash_table_Lookup, Cast_3/_589)]]
         [[Node: code2str_conversion/chars_conversion/Shape/_531 = _HostRecv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/gpu:0", send_device="/job:localhost/replica:0/task:0/cpu:0", send_device_incarnation=1, tensor_name="edge_1842_code2str_conversion/chars_conversion/Shape", tensor_type=DT_INT32, _device="/job:localhost/replica:0/task:0/gpu:0"]()]]

Caused by op 'CTCLoss', defined at:
  File "train.py", line 99, in <module>
    image_summaries=True))

WenmuZhou commented 7 years ago

I have solved this problem. When building the dataset, represent each label by the indices of its Chinese characters in the alphabet, with the indices separated by a special symbol such as '$', e.g. image

After that, I changed the code from

with tf.name_scope('str2code_conversion'):
    table_str2int = tf.contrib.lookup.HashTable(tf.contrib.lookup.KeyValueTensorInitializer(keys, values), -1)
    splited = tf.string_split(labels, delimiter='')  # TODO change string split to utf8 split in next tf version
    codes = table_str2int.lookup(splited.values)
    sparse_code_target = tf.SparseTensor(splited.indices, codes, splited.dense_shape)

to

with tf.name_scope('str2code_conversion'):
    # Labels are now '$'-separated alphabet indices, so no lookup table is needed
    splited = tf.string_split(labels, delimiter='$')
    codes = tf.cast(tf.string_to_number(splited.values), tf.int32)
    sparse_code_target = tf.SparseTensor(splited.indices, codes, splited.dense_shape)

With this change, the code finally works.
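The dataset conversion described above (text labels rewritten as '$'-separated alphabet indices) could look roughly like this. It is a minimal sketch under assumptions: the toy alphabet and the helper names `encode_label` and `decode_label` are hypothetical and do not come from the repository.

```python
# Hypothetical preprocessing helpers; the names and the toy alphabet are assumptions.

ALPHABET = "\u4e2d\u6587\u5b57\u7b26\u8bc6\u522b"  # toy alphabet: 中文字符识别
SEPARATOR = "$"

CHAR_TO_INDEX = {ch: idx for idx, ch in enumerate(ALPHABET)}


def encode_label(text):
    """Turn a text label into '$'-separated alphabet indices."""
    missing = [ch for ch in text if ch not in CHAR_TO_INDEX]
    if missing:
        raise ValueError("Characters not in alphabet: %s" % missing)
    return SEPARATOR.join(str(CHAR_TO_INDEX[ch]) for ch in text)


def decode_label(encoded):
    """Inverse of encode_label, useful for spot-checking a converted dataset."""
    if not encoded:
        return ""
    return "".join(ALPHABET[int(code)] for code in encoded.split(SEPARATOR))


print(encode_label("\u6587\u5b57\u8bc6\u522b"))  # 1$2$4$5
```

Because the encoded labels contain only ASCII digits and '$', a byte-wise split on '$' never cuts a multi-byte character in half.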

meaatef commented 6 years ago

Thank you for the reply. Building the dataset according to WenmuZhou's method resolved the previous error, but now I am facing the following error: Saw a non-null label (index >= num_classes - 1) following a null label, batch: 0 num_classes: 80 labels: 40,39,42,32,70
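The wording of this message points at label indices that are not below num_classes - 1. As far as the tf.nn.ctc_loss documentation goes, the largest class index (num_classes - 1) is reserved for the blank label, so real labels should lie in the range 0 to num_classes - 2. One way to look for offending entries before training is a check like the one below. It is a sketch that assumes the '$'-separated index format from WenmuZhou's method; `find_out_of_range` and the sample labels are hypothetical.

```python
# Hypothetical pre-training check; the names are assumptions, not tf-crnn API.

def find_out_of_range(encoded_labels, num_classes, separator="$"):
    """Return (line_number, index) pairs whose index is outside [0, num_classes - 2]."""
    max_valid = num_classes - 2  # num_classes - 1 is reserved for the CTC blank
    problems = []
    for line_number, encoded in enumerate(encoded_labels, start=1):
        for code in encoded.split(separator):
            index = int(code)
            if index < 0 or index > max_valid:
                problems.append((line_number, index))
    return problems


# Made-up sample labels for illustration.
labels = ["40$39$42$32$70", "3$79$5", "12$80"]
print(find_out_of_range(labels, num_classes=80))  # [(2, 79), (3, 80)]
```

If such a check reports nothing, the cause is more likely elsewhere, for example in how the labels are split.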

codecolony commented 6 years ago

Hi @meaatef , @WenmuZhou , @solivr ,

I need help on this issue as well. My symbol list includes Symbols = " .,:;-_=()[]{}/%<>"

I get the same error. Do I need to use escaping characters in the labels file? A sample entry of this in the file looks like the following.

Image10260.jpg;\)nyenpc\>
Image27765.jpg;QZYN\<

I tried @WenmuZhou's fix, but it still doesn't work for me. Any insights on this one?

Thanks,

meatif commented 6 years ago

@codecolony Map each label to numbers, as WenmuZhou did. It worked for me; the only problem I'm facing now is that the loss is not converging.

cipri-tom commented 6 years ago

@codecolony you don't need escaping, but your labels must contain only characters from the alphabet, and each character of the alphabet must map to a single number. WenmuZhou's trick is quite neat for that because it handles multi-byte characters.

Alternatively, you can use a different encoding.
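To see which characters a different encoding would help with, one can compare how many bytes each character takes under each encoding. This is only an illustrative sketch; `bytes_per_char` is a hypothetical helper, and whether a single-byte encoding works end to end also depends on how the label files and the lookup keys are encoded.

```python
# Hypothetical helper: compare encoded sizes of characters under two encodings.

def bytes_per_char(alphabet, encoding):
    """Map each character to its encoded length, or None if it cannot be encoded."""
    sizes = {}
    for ch in alphabet:
        try:
            sizes[ch] = len(ch.encode(encoding))
        except UnicodeEncodeError:
            sizes[ch] = None
    return sizes


SAMPLE = "a\u00b0\u00e9\u4e2d"  # a, degree sign, e with acute accent, a Chinese character

print(bytes_per_char(SAMPLE, "utf-8"))   # {'a': 1, '°': 2, 'é': 2, '中': 3}
print(bytes_per_char(SAMPLE, "latin1"))  # {'a': 1, '°': 1, 'é': 1, '中': None}
```

Characters such as ° and é fit in one byte under latin1, while Chinese characters cannot be encoded in latin1 at all, which is where the index-based labels remain useful.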

Finally, I am using the following code to check the input labels:

    # put this in train.py , after parsing parameters

    # check input had conforming alphabet
    params_alphabet = set(parameters.alphabet)
    input_alphabet = set()
    for filename in parameters.csv_files_train + parameters.csv_files_eval:
        with open(filename, encoding='latin1') as file: # I use latin1 encoding in order to deal with éèç etc
            for line in file:
                input_alphabet.update(line.split(parameters.csv_delimiter, maxsplit=1)[1])
        for sep in '\n\r':
            input_alphabet.discard(sep)
        extra_chars = input_alphabet - params_alphabet
        assert len(extra_chars) == 0, 'Invalid char %s in file %s' % (extra_chars, filename)

    model_params = {
        'Params': parameters,
    }