Does steps_per_epoch need to be changed depending on the training set?

AnMoran commented 4 years ago

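I trained on ArT, LSVT and ReCTS for 180,000 steps. The loss has plateaued at around 1.2, and test accuracy falls well short of the two pb models you provided: yours score about 85%, mine only about 72%. Could you share the checkpoints behind your pb models so I can fine-tune from them? Or are there other training tricks I should know about?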

alexchungio commented 3 years ago

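The three datasets used in the source code contain 396,733 samples in total. The config sets steps_per_epoch=500, gpus=4 and batch_size=10, so the number of samples trained per epoch is 500 × 4 × 10 = 20,000. That means one epoch cannot cover the whole dataset, which confuses me as well.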
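A minimal sketch of the arithmetic in this comment, assuming (as the comment does) that every step consumes batch_size samples on each of the gpus GPUs. The function names are illustrative and not part of AttentionOCR. Also worth checking against the installed version: tensorpack's TrainConfig documentation describes steps_per_epoch as the number of steps run in each epoch (defaulting to the input data size, possibly divided by the number of GPUs in multi-GPU training), so a tensorpack "epoch" need not be one full pass over the dataset.

```python
import math

TOTAL_SAMPLES = 396_733  # ArT + LSVT + ReCTS, as counted in the comment above


def samples_per_epoch(steps_per_epoch: int, gpus: int, batch_size: int) -> int:
    """Samples seen per epoch, assuming each step feeds batch_size samples to every GPU."""
    return steps_per_epoch * gpus * batch_size


def steps_for_full_pass(num_samples: int, gpus: int, batch_size: int) -> int:
    """Smallest steps_per_epoch that lets one epoch cover every sample at least once."""
    return math.ceil(num_samples / (gpus * batch_size))


def passes_over_data(total_steps: int, gpus: int, batch_size: int, num_samples: int) -> float:
    """How many times the dataset is traversed after total_steps steps."""
    return total_steps * gpus * batch_size / num_samples


print(samples_per_epoch(500, 4, 10))                              # 20000
print(steps_for_full_pass(TOTAL_SAMPLES, 4, 10))                  # 9919
print(round(passes_over_data(180_000, 4, 10, TOTAL_SAMPLES), 2))  # 18.15
```

Under these assumptions, 500 steps per epoch covers only about 5% of the data (20,000 / 396,733), while 180,000 steps at 4 GPUs × 10 samples would still amount to roughly 18 passes; with a different GPU count or batch size the numbers scale accordingly.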

whereitogo commented 3 years ago


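Hi, how did your training turn out in the end? I'd like to try the pb model the author provided, but I don't know how to get files out of the Docker image. Could you send me a copy? Training is also very slow on my machine: one epoch takes 30 minutes, and I don't know why!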
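For scale, a rough estimate of what 30 minutes per epoch implies, assuming the default steps_per_epoch=500 quoted earlier in the thread (this commenter's actual setting is not stated) and the 180,000-step run mentioned at the top. Both helpers are hypothetical.

```python
def seconds_per_step(minutes_per_epoch: float, steps_per_epoch: int) -> float:
    """Average wall-clock seconds per training step."""
    return minutes_per_epoch * 60 / steps_per_epoch


def hours_for_steps(total_steps: int, minutes_per_epoch: float, steps_per_epoch: int) -> float:
    """Wall-clock hours to run total_steps steps at the observed epoch duration."""
    epochs = total_steps / steps_per_epoch
    return epochs * minutes_per_epoch / 60


print(seconds_per_step(30, 500))          # 3.6
print(hours_for_steps(180_000, 30, 500))  # 180.0
```

Under that assumption this is about 3.6 s per step and roughly 180 hours (7.5 days) for 180,000 steps, so seconds per step, together with GPU count and batch size, is probably a more comparable number than minutes per epoch when comparing setups.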

xianzhe-741 commented 3 years ago

Hi, I ran into two problems while using this and would appreciate your advice:

1. In test.py, loading the author's Docker model text_recognition5435.pb fails at tf.import_graph_def(graph_def, name='') with:

    InvalidArgumentError (see above for traceback): The second input must be a scalar, but it has shape [1,33]

2. In train.py, I get:

    File "/usr/local/lib/python3.5/dist-packages/tensorpack/train/config.py", line 119, in __init__
      assert_type(model, ModelDescBase, 'model')
    File "/usr/local/lib/python3.5/dist-packages/tensorpack/train/config.py", line 107, in assert_type
      name, tp.__name__, v.__class__.__name__)
    AssertionError: model has to be type 'ModelDescBase', but an object of type 'AttentionOCR' found.


AnMoran commented 3 years ago


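1. I haven't used the author's Docker image. I set up a local virtual environment directly from the requirements, and I haven't used the author's model either.
2. It's probably a version issue?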
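One plausible reading of "version issue", offered as an assumption rather than something confirmed in this thread: the assert_type check in the traceback appears to be an isinstance test, so it fails whenever AttentionOCR derives from a ModelDescBase class object other than the exact one tensorpack's train/config.py imports (for example, one from a different tensorpack version, module path or installation), even though both classes share the same name. The sketch uses throwaway stand-in modules with hypothetical names, so it runs without tensorpack.

```python
import types


def make_stand_in_module(module_name: str) -> types.ModuleType:
    """Create a throwaway module that defines its own ModelDescBase class.

    Hypothetical stand-in for two different copies/versions of tensorpack's model_desc.
    """
    module = types.ModuleType(module_name)
    exec("class ModelDescBase:\n    pass\n", module.__dict__)
    return module


copy_a = make_stand_in_module("model_desc_copy_a")  # what the model subclasses
copy_b = make_stand_in_module("model_desc_copy_b")  # what the trainer checks against


class AttentionOCR(copy_a.ModelDescBase):
    """Stand-in for the repo's model class; no real OCR logic here."""


model = AttentionOCR()
print(copy_a.ModelDescBase.__name__ == copy_b.ModelDescBase.__name__)  # True
print(isinstance(model, copy_a.ModelDescBase))  # True
print(isinstance(model, copy_b.ModelDescBase))  # False
```

In a real environment, comparing tensorpack.__file__ (and the output of pip show tensorpack) with the modules listed in type(model).__mro__ is one way to check whether the model and the trainer are using the same tensorpack.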

xianzhe-741 commented 3 years ago


(Email reply to wang pengyuan's comment of 15 December 2020, 17:40.)

OK, thank you. Many people seem to be asking about the following problem, and I've hit it too. Have you run into it? If possible, could I add you on WeChat (xianzhe741) to ask you about it?

    Traceback (most recent call last):
      File "test.py", line 121, in <module>
        test(args)
      File "test.py", line 91, in test
        model = TextRecognition(args.pb_path, cfg.seq_len+1)
      File "test.py", line 23, in __init__
        self.init_model()
      File "test.py", line 37, in init_model
        self.label_ph = self.sess.graph.get_tensor_by_name('label:0')
      File "/home/home/anaconda3/envs/p3/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3972, in get_tensor_by_name
        return self.as_graph_element(name, allow_tensor=True, allow_operation=False)
      File "/home/home/anaconda3/envs/p3/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3796, in as_graph_element
        return self._as_graph_element_locked(obj, allow_tensor, allow_operation)
      File "/home/home/anaconda3/envs/p3/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3838, in _as_graph_element_locked
        "graph." % (repr(name), repr(op_name)))
    KeyError: "The name 'label:0' refers to a Tensor which does not exist. The operation, 'label', does not exist in the graph."
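The KeyError means the loaded pb graph contains no op named label, so test.py and that pb file disagree about the graph's input names. A minimal sketch for checking which inputs a frozen .pb actually exposes, assuming TensorFlow 1.13 or later (for the tf.compat.v1 and tf.io.gfile APIs). The function names are hypothetical, and treating Placeholder ops as the feedable inputs is a heuristic, not something the repo documents.

```python
def load_graph_def(pb_path: str):
    """Read a frozen GraphDef from disk (TensorFlow >= 1.13 assumed)."""
    import tensorflow as tf  # imported lazily so placeholder_inputs() works without TF

    graph_def = tf.compat.v1.GraphDef()
    with tf.io.gfile.GFile(pb_path, "rb") as f:
        graph_def.ParseFromString(f.read())
    return graph_def


def placeholder_inputs(graph_def) -> list:
    """Names of Placeholder ops, i.e. the tensors a caller is expected to feed."""
    return [node.name for node in graph_def.node if node.op == "Placeholder"]
```

Usage, with the model path adjusted to wherever the file lives: `print(placeholder_inputs(load_graph_def("text_recognition5435.pb")))`. If label is not in the list, the names passed to get_tensor_by_name in test.py would need to match whatever the printed names are (with a :0 suffix).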


zhang0jhon / AttentionOCR

Does steps_per_epoch need to be changed depending on the training set? #90