Finetuning for caption generation -- empty captions

Hey! I'm trying to finetune Oscar on my own dataset, using features extracted with https://github.com/airsplay/py-bottom-up-attention. For some reason, when I try to overfit a small training set, the predictions come out as completely empty captions. Has anyone seen this issue before? I've tried several different learning rates, among other things, but nothing seems to fix it. Appreciate any and all help! (And congrats on the wonderful work!)

This is the command I'm running:

python oscar/run_captioning.py \
    --data_dir data \
    --model_name_or_path models/checkpoint-29-66420 \
    --do_train \
    --do_lower_case \
    --evaluate_during_training \
    --add_od_labels \
    --learning_rate 0.00003 \
    --per_gpu_train_batch_size 16 \
    --num_train_epochs 100 \
    --save_steps 100 \
    --output_dir output/

microsoft / Oscar, issue #100