tensorflow / models

Models and examples built with TensorFlow

unable to reach 84.2 test seq accuracy on fsns dataset with pretrained inception_resnet_v2 encoder, batch size = 16 #5729

Closed rohitsaluja22 closed 4 years ago

rohitsaluja22 commented 6 years ago

What is the top-level directory of the model you are using: attention_ocr/python
Have I written custom code (as opposed to using a stock example script provided in TensorFlow): no
OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Linux Ubuntu 16.04
TensorFlow installed from (source or binary): pip install --upgrade tensorflow-gpu
TensorFlow version (use command below): 1.4.1
Bazel version: N/A
CUDA/cuDNN version: cuda/8.0 cudnn/7.1.2
GPU model and memory: 4 x GeForce GTX 1080
Exact command to reproduce: python ../eval.py --dataset_dir=/home/ayush/OCR/fsns_ayush/data/fsns/ --train_log_dir=/dev/saved_models_h4/ --split=test --batch_size=204

Hi, I modified model.py to use inception_resnet_v2 and enabled coordinate encoding. I started from the pretrained inception_resnet_v2 checkpoint, but trained with a batch size of 16: CUDA_VISIBLE_DEVICES=0,1,2,3 python train.py --batch_size=16 --checkpoint_inception=./inception_resnet_v2_2016_08_30.ckpt

Still, I am not able to reach the 84.2% accuracy on the FSNS dataset reported in the paper, even after training for 853k iterations.

So I have three questions:

  1. Will batch size = 32 give me 84.2% accuracy?
  2. And/or did you use random initialization of the encoder instead of the pretrained inception_resnet_v2?
  3. Should I change eval.py for test time? Does the default code randomly crop 80% of the image and resize it at test time as well?

Basically, I want to know how I can exactly replicate the training process to get 84.2% sequence-level accuracy on the FSNS dataset.

JingLiJJ commented 5 years ago

I have the same problem. Can you tell me your accuracy, please? Did you change only the batch size and keep the other parameters the same?

rohitsaluja22 commented 5 years ago

I achieved 83.27% full sequence accuracy on the test set after 853k training steps (with batch size 16). As per the README, it should reach 83.79% full sequence accuracy after 400k steps with inception_v3, coordinate encoding disabled (I guess), and batch size 32 (I guess).

@JingLiJJ could you share your accuracy, along with the number of training steps and the batch size?

@nealwu Any guess where I am going wrong? Also, is some random crop applied in eval.py too?
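For readers comparing the numbers above: "full sequence accuracy" counts an example as correct only when every character matches, so it is always at or below character-level accuracy. The following is a minimal sketch of the two metrics on hypothetical toy strings (not FSNS output, and not the code used by eval.py); it assumes predictions and targets are padded to the same length, as the helper names here are illustrative only.

```python
def char_accuracy(predictions, targets):
    """Fraction of character positions predicted correctly.

    Assumes each prediction has the same length as its target;
    zip() would silently truncate otherwise.
    """
    total = sum(len(t) for t in targets)
    correct = sum(
        p_ch == t_ch
        for p, t in zip(predictions, targets)
        for p_ch, t_ch in zip(p, t)
    )
    return correct / total


def sequence_accuracy(predictions, targets):
    """Fraction of examples whose whole string matches exactly."""
    matches = sum(p == t for p, t in zip(predictions, targets))
    return matches / len(targets)


# Hypothetical toy data for illustration only.
targets = ["Rue de Paris", "Avenue Foch", "Place Carnot"]
predictions = ["Rue de Paris", "Avenue Fach", "Place Carnot"]

char_acc = char_accuracy(predictions, targets)
seq_acc = sequence_accuracy(predictions, targets)
print(f"char accuracy: {char_acc:.4f}")
print(f"sequence accuracy: {seq_acc:.4f}")
```

A single wrong character out of 35 leaves character accuracy near 97% but drops sequence accuracy to two out of three, which is why small changes in the model can move the sequence metric noticeably.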

JingLiJJ commented 5 years ago

I obtained around 82.7~83.6% with inception-v3 after 400k steps with batch_size. They use a random test dataset to test the model, so I guess maybe they chose the best result?

I changed inception-v3 to inception-resnet-v2 with batch_size 16, but the accuracy decreased to around 82%. Was the accuracy you mentioned obtained with inception-resnet-v2? What else should I change? Maybe I am missing something.

By the way, I suppose eval.py does not use data augmentation.
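If evaluation runs on a random subset of the test split, some run-to-run spread in accuracy is expected from sampling alone. As a rough sketch, the binomial standard error of an accuracy estimate is sqrt(p(1-p)/n). The evaluation sizes below are hypothetical and chosen only for illustration; the thread does not confirm how many examples any given run used.

```python
import math


def accuracy_standard_error(accuracy, num_examples):
    """Binomial standard error of an accuracy measured on num_examples."""
    return math.sqrt(accuracy * (1.0 - accuracy) / num_examples)


# Hypothetical evaluation sizes (assumptions, not values from the thread).
se_small = accuracy_standard_error(0.84, 3200)
se_large = accuracy_standard_error(0.84, 20400)

print(f"n=3200:  +/- {100 * se_small:.2f} percentage points (1 sigma)")
print(f"n=20400: +/- {100 * se_large:.2f} percentage points (1 sigma)")
```

Under these assumptions, a few thousand evaluation examples give a one-sigma spread of more than half a percentage point, so differences of that size between runs need not reflect a real difference between models.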

rohitsaluja22 commented 5 years ago

Hey @JingLiJJ, you forgot to mention the batch_size. Also, did you enable OHE? Yes, I used inception-resnet-v2 and batch size 16. You need to change model.py (inception-resnet-v2 and Mixed_6a) and common_flags.py (Mixed_6a). I also added this file, taken from online: /usr/local/lib/python2.7/dist-packages/tensorflow/contrib/slim/python/slim/nets/inception_resnet_v2.py and modified: /usr/local/lib/python2.7/dist-packages/tensorflow/contrib/slim/python/slim/nets/inception.py

I am not able to understand where the randomness in eval.py comes from. In eval.py, they use:

data = data_provider.get_data(dataset, FLAGS.batch_size, augment=False, central_crop_size=common_flags.get_crop_size())

common_flags.get_crop_size() returns (None, None), so you are right. I still want to understand how the random test data is chosen. Could you help?

jmgrn61 commented 5 years ago

Hello @rohitsaluja22, could you tell me where this 204 came from? And how did you set num_batches in this case?

Currently, I am always getting a sequence accuracy of ~0.93, which is much higher than ~0.84, so I guess there must be something wrong with my data setup. I used the default num_batches and tried both the default batch_size and 204. Of course, the split_name is set to "test". The checkpoint file is the pretrained one downloaded from the README.
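One plausible reading of the 204, not confirmed anywhere in this thread: an evaluation run sees batch_size × num_batches examples, so the batch size may have been picked to cover nearly the whole test split. The sketch below assumes a num_batches default of 100, a default batch_size of 32, and an FSNS test split of 20404 examples; all three values are assumptions that should be checked against eval.py, common_flags.py, and the dataset config before relying on them.

```python
def eval_coverage(batch_size, num_batches, split_size):
    """Return (examples seen, fraction of the split covered) for one eval run."""
    seen = batch_size * num_batches
    return seen, min(seen / split_size, 1.0)


# Assumptions, not confirmed in the thread; verify against the repository.
ASSUMED_TEST_SPLIT_SIZE = 20404
ASSUMED_NUM_BATCHES = 100

for batch_size in (32, 204):
    seen, fraction = eval_coverage(
        batch_size, ASSUMED_NUM_BATCHES, ASSUMED_TEST_SPLIT_SIZE
    )
    print(f"batch_size={batch_size}: {seen} examples, {fraction:.1%} of split")
```

Under these assumptions, batch size 32 would evaluate only a fraction of the split, while 204 would cover almost all of it.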

tensorflowbutler commented 4 years ago

Hi there, we are checking to see if you still need help on this, as this seems to be a considerably old issue. Please update this issue with the latest information, a code snippet to reproduce your issue, and the error you are seeing. If we don't hear from you in the next 7 days, this issue will be closed automatically. If you don't need help on this issue any more, please consider closing it.

R1234A commented 4 years ago

@rohitsaluja22 @jmgrn61 How are you testing the trained model on a custom dataset? demo_inference.py is not giving the correct output.

How can I test my model to see the results?