microsoft / unilm

Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
https://aka.ms/GeneralAI
MIT License

Model not training on large dataset #486

Open lalitr994 opened 2 years ago

lalitr994 commented 2 years ago

Hi, I have trained a LayoutLMv2 FUNSD model on around 5K images for 30K steps. After training completes, inference does not work correctly (not even a single correct prediction), but the same setup works when I train on around 2.5K images. I am not sure what I am missing.

lalitr994 commented 2 years ago

One observation: the learning rate reaches 0 at the end of training.
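
For what it's worth, with the default linear-decay schedule of the Hugging Face Trainer (which, as I understand it, the FUNSD fine-tuning script builds on), the learning rate is designed to hit 0 exactly at the last step, so that observation by itself is expected behavior. A minimal sketch with `get_linear_schedule_with_warmup` (toy model, illustrative values):

```python
import torch
from transformers import get_linear_schedule_with_warmup

model = torch.nn.Linear(10, 2)  # toy stand-in for LayoutLMv2
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

# Linear decay: LR falls from 5e-5 to 0 over num_training_steps.
scheduler = get_linear_schedule_with_warmup(
    optimizer, num_warmup_steps=0, num_training_steps=30_000
)
for _ in range(30_000):
    scheduler.step()

print(scheduler.get_last_lr())  # [0.0] -- expected at the final step
```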

sherlocked27 commented 2 years ago

Hi @lalitr994, can you elaborate on the batch size you used and the scores you are getting on the test set while training?
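
For context on the scores: the FUNSD example reports entity-level precision/recall/F1 via seqeval, so a quick way to sanity-check what the model predicts on held-out data is something like this (toy tag sequences in the FUNSD label scheme, not real predictions):

```python
from seqeval.metrics import classification_report, f1_score

# Toy gold and predicted tag sequences; replace with your decoded outputs.
y_true = [["B-QUESTION", "I-QUESTION", "O", "B-ANSWER", "I-ANSWER"]]
y_pred = [["B-QUESTION", "I-QUESTION", "O", "B-ANSWER", "O"]]

print(f1_score(y_true, y_pred))            # entity-level F1
print(classification_report(y_true, y_pred))
```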

sathwikacharya commented 2 years ago

Hey, I am having a similar issue where the logits output by the model after training are NaN. Any idea why this is happening? I am training on a custom dataset with 29 classes and 40,000 data points. The steps I followed are identical to this notebook, apart from a few tweaks: https://github.com/NielsRogge/Transformers-Tutorials/blob/master/LayoutLMv2/RVL-CDIP/Fine_tuning_LayoutLMv2ForSequenceClassification_on_RVL_CDIP.ipynb

I do not know if this is related, but I am training on multiple GPUs using Hugging Face's Accelerate API. Any help is much appreciated.
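
A minimal, self-contained sketch of the NaN checks I would add to an Accelerate loop (toy model and data, not the notebook's LayoutLMv2 setup; the usual suspects are a too-high learning rate, fp16 overflow, or labels outside `[0, num_classes)`):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from accelerate import Accelerator

accelerator = Accelerator()  # try mixed_precision="no" to rule out fp16 overflow

model = torch.nn.Linear(8, 29)  # toy stand-in for the 29-class head
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
dataset = TensorDataset(torch.randn(64, 8), torch.randint(0, 29, (64,)))
loader = DataLoader(dataset, batch_size=4)
model, optimizer, loader = accelerator.prepare(model, optimizer, loader)

torch.autograd.set_detect_anomaly(True)  # raises at the op that first yields NaN/Inf

for inputs, labels in loader:
    logits = model(inputs)
    loss = torch.nn.functional.cross_entropy(logits, labels)
    if not torch.isfinite(loss):
        accelerator.print("non-finite loss -- dump this batch and investigate")
        break
    accelerator.backward(loss)
    accelerator.clip_grad_norm_(model.parameters(), max_norm=1.0)  # tame gradient spikes
    optimizer.step()
    optimizer.zero_grad()
```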

nissansz commented 2 years ago

Hi, where can I get, or how can I pretrain, models for Japanese, Korean, etc.? steve8000818@gmail.com
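
If it helps, the multilingual counterpart of LayoutLMv2 in this repo is LayoutXLM, which covers Japanese and Korean among its 53 languages. A minimal loading sketch (checkpoint name as published on the Hugging Face hub):

```python
from transformers import LayoutLMv2ForTokenClassification, LayoutXLMProcessor

# LayoutXLM shares the LayoutLMv2 architecture, so the v2 model classes apply.
processor = LayoutXLMProcessor.from_pretrained("microsoft/layoutxlm-base")
model = LayoutLMv2ForTokenClassification.from_pretrained("microsoft/layoutxlm-base")
```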