wenwenyu / PICK-pytorch

Code for the paper "PICK: Processing Key Information Extraction from Documents using Improved Graph Learning-Convolutional Networks" (ICPR 2020)
https://arxiv.org/abs/2004.07464
MIT License
556 stars 193 forks

How many training samples need to be prepared? #3

Closed yuzi1949 closed 4 years ago

yuzi1949 commented 4 years ago

How many samples are needed for training, and roughly how many images per class? Thanks.

wenwenyu commented 4 years ago

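@yuzi1949 The amount of data depends on the type and difficulty of the documents. For a fixed-layout dataset such as train tickets, we used about 1.5k generated samples plus 0.3k real samples. For documents such as invoices, where the layout varies a lot and there are many numbers, we collected about 2k real samples for training. So you need to adjust the sample size to your own documents; there is no fixed number. As for classes, each class generally appears in every image. Of course, if conditions allow, the more data the better.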
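To check the "each class appears in every image" point on your own data, a minimal sketch like the following may help. It is not part of the PICK codebase: the function name `class_coverage` and the input format (one list of entity labels per document) are assumptions made for illustration, so you would need to adapt the loading step to your own annotation files.

```python
from collections import Counter


def class_coverage(docs):
    """Return, for each class, the fraction of documents that contain it.

    `docs` is a hypothetical in-memory format: one list of entity
    labels per document, e.g. [["date", "total"], ["date"]].
    """
    counts = Counter()
    for labels in docs:
        # Count each class at most once per document.
        counts.update(set(labels))
    total = len(docs)
    if total == 0:
        return {}
    return {cls: n / total for cls, n in counts.items()}


if __name__ == "__main__":
    # Toy labels, invented purely for this example.
    sample_docs = [
        ["date", "total"],
        ["date"],
        ["date", "total", "company"],
    ]
    for cls, frac in sorted(class_coverage(sample_docs).items()):
        print(f"{cls}: present in {frac:.0%} of documents")
```

Classes with low coverage are the ones most likely to need additional samples.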

wenwenyu commented 4 years ago

I am going to close this issue due to no response. Please feel free to reopen or create a new one if you have more questions.