utterworks / fast-bert

Super easy library for BERT based NLP models
Apache License 2.0

Using fast-bert to fine-tune pretrained BioBERT model #271

Open nleguillarme opened 3 years ago

nleguillarme commented 3 years ago

Hi.

I'd like to use fast-bert to fine-tune a BioBERT model on a NER corpus.

Here is the code I use to create a learner from a pretrained BioBERT model:

# databunch, metrics, device_cuda and logger are defined earlier in my script
from fast_bert.learner_cls import BertLearner

learner = BertLearner.from_pretrained_model(
    databunch,
    pretrained_path="dmis-lab/biobert-base-cased-v1.1",  # BioBERT checkpoint on the Hugging Face hub
    metrics=metrics,
    device=device_cuda,
    logger=logger,
    output_dir=OUTPUT_DIR,
    finetuned_wgts_path=None,
    warmup_steps=500,
    multi_gpu=True,
    is_fp16=True,
    multi_label=False,
    logging_steps=50,
)
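
(For context, the databunch above follows the usual fast-bert pattern with BertDataBunch; the sketch below is simplified, and the paths, file names, column names and batch size are placeholders rather than my exact preprocessing code.)

from fast_bert.data_cls import BertDataBunch

databunch = BertDataBunch(
    DATA_PATH,                 # directory containing the CSV files (placeholder)
    LABEL_PATH,                # directory containing labels.csv (placeholder)
    tokenizer="dmis-lab/biobert-base-cased-v1.1",
    train_file="train.csv",
    val_file="val.csv",
    label_file="labels.csv",
    text_col="text",
    label_col="label",
    batch_size_per_gpu=16,
    max_seq_length=512,
    multi_gpu=True,
    multi_label=False,
    model_type="bert",
)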

After 10 hours of training on 2 GPUs, the only log output I get is a series of WARNING:root:NaN or Inf found in input tensor. messages. From the TensorBoard tfevents file, I can see that the validation loss is NaN...
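
(To double-check this, I read the logged scalars back from the tfevents file with TensorBoard's EventAccumulator; sketch below, where the log directory and the scalar tag name are guesses and may differ from what fast-bert actually writes.)

from tensorboard.backend.event_processing.event_accumulator import EventAccumulator

ea = EventAccumulator(str(OUTPUT_DIR / "tensorboard"))  # assumed location of the tfevents files
ea.Reload()
print(ea.Tags()["scalars"])             # list the scalar tags that were actually logged
for event in ea.Scalars("eval_loss"):   # hypothetical tag name; replace with one from the list above
    print(event.step, event.value)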

Before trying to find out what's wrong, could you please confirm that it is actually conceptually feasible to fine-tune a BioBERT model using fast-bert?
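
(For what it's worth, the checkpoint is published on the Hugging Face hub as a standard BERT model, so it should load with the generic transformers Auto classes, which is why I assume fast-bert can in principle handle it. Quick sanity check, outside fast-bert:)

from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("dmis-lab/biobert-base-cased-v1.1")
model = AutoModel.from_pretrained("dmis-lab/biobert-base-cased-v1.1")
print(model.config.model_type)  # expected: "bert"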