Open senthil-r-10 opened 1 year ago
The model is still learning. You might use the fine-tuning performance as a more robust indicator.
Will increasing the input size from 224x224 to 384x384 or 672x672, keeping the 16x16 patch size, help the model converge? In the paper, they mention that the BEiT(L) 384x384 model performs better than BEiT(L) 224x224.
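(For reference, a minimal sketch of how the resolution could be raised via the Hugging Face `BeitConfig`; the values are only illustrative, and the masked-image-modeling targets would also have to match the larger patch grid.)

```python
from transformers import BeitConfig, BeitForMaskedImageModeling

# Sketch only: with patch_size=16, going from 224x224 to 384x384 input
# grows the patch grid from 14x14 (196 patches) to 24x24 (576 patches),
# so the visual-token labels must cover 576 positions as well.
config = BeitConfig(
    image_size=384,
    patch_size=16,
    use_relative_position_bias=True,
    use_mask_token=True,
)
model = BeitForMaskedImageModeling(config)
```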
Hi, I have followed the steps in the notebook to train the BEiT model, but the loss stops decreasing after a few epochs: it starts at 7.2 and stagnates around 4.3.
I used 1M documents from the IIT-CDIP dataset, resized to 224x224 using `BeitImageProcessor`.

Model:
`config = BeitConfig(use_relative_position_bias=True, use_mask_token=True)`
`model = BeitForMaskedImageModeling(config)`

Loss: `CrossEntropyLoss`
Optimizer: `AdamW(model.parameters(), lr=1e-5, weight_decay=0.05)`
LR scheduler: `ReduceLROnPlateau(optimizer, mode='min', factor=0.5, patience=0, threshold=0.0001, threshold_mode='rel', cooldown=0, min_lr=1e-6, eps=1e-08, verbose=True)`

Training code:
```python
def train(model, encoder, train_dataloader, device, rank, optimizer, loss_fn, scheduler, epochs, save_model_dir):
```
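(The body of `train` is not shown above; purely as an illustrative sketch, not the poster's actual code, a loop matching that signature could look roughly like the following. It assumes the dataloader yields pixel values, a boolean patch mask, and the images fed to the visual tokenizer, and that `encoder` is that tokenizer returning the target visual-token ids.)

```python
import torch

def train(model, encoder, train_dataloader, device, rank, optimizer, loss_fn,
          scheduler, epochs, save_model_dir):
    model.train()
    for epoch in range(epochs):
        running_loss = 0.0
        for pixel_values, bool_masked_pos, token_images in train_dataloader:
            pixel_values = pixel_values.to(device)
            bool_masked_pos = bool_masked_pos.to(device)
            token_images = token_images.to(device)

            # Target visual-token ids from the (frozen) image tokenizer
            # (assumed to return ids of shape (batch, num_patches)).
            with torch.no_grad():
                labels = encoder(token_images)

            # Predict a visual token for every patch position.
            outputs = model(pixel_values=pixel_values,
                            bool_masked_pos=bool_masked_pos)
            logits = outputs.logits  # (batch, num_patches, vocab_size)

            # Cross-entropy only over the masked positions.
            loss = loss_fn(logits[bool_masked_pos], labels[bool_masked_pos])

            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            running_loss += loss.item()

        epoch_loss = running_loss / len(train_dataloader)
        scheduler.step(epoch_loss)  # ReduceLROnPlateau monitors the epoch loss
        if rank == 0:
            torch.save(model.state_dict(),
                       f"{save_model_dir}/beit_epoch{epoch}.pt")
```

Note that with `ReduceLROnPlateau`, `scheduler.step()` takes the monitored metric (here the mean epoch loss) rather than being called once per optimizer step.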