LebrontoJ closed this issue 4 months ago.
Hi, could you please provide the configurations you used?
dataset:
  name: FB15k-237 # wikidata5m_v3
  v1: False
  is_legacy: False
model:
  name: t5-small
  tokenizer_type: t5
  max_input_length: 512
  max_output_length: 40
context:
  use: True
  max_size: 100
  shuffle: True
descriptions:
  use: False
train:
  batch_size: 4
  max_epochs: 100
  drop_subject: 0.0
  num_workers: 4
  precision: 16
  accelerator: auto
  devices: auto
  strategy: ddp_find_unused_parameters_false
eval:
  num_predictions: 100
  max_length: 40
  batch_size: 1
valid:
  every: 1
  tiny: False # True
checkpoint:
  keep_top_k: 3
resume_from: ""
wandb:
  use: False
  project_name: kgt5context${dataset.name}
  run_name: v1=${dataset.v1}_desc=${descriptions.use}_bs=${train.batch_size}
hydra:
  job:
    chdir: False
  run:
    dir: ./outputs/${dataset.name}/v1=${dataset.v1}/descriptions=${descriptions.use}/${now:%Y-%m-%d-%H-%M}
Hi, I think the main problem is the small batch size. Yesterday I tried the same setting on a single GPU but with batch size 96 and it worked out fine. With even larger total batch sizes it should be more stable. But to be sure, I uploaded the dataset here, so we work on the same one.
If neither the larger batch size nor the uploaded dataset helps, I suggest additionally using a small weight decay. You can do so by changing this line: https://github.com/uma-pi1/kgt5-context/blob/f9b9272e19a6855746871385b1113fdeb14c18aa/kgt5_model.py#L44
to
optimizer = Adafactor(self.parameters(), scale_parameter=True, relative_step=True, warmup_init=True, lr=None, weight_decay=0.00001)
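The batch-size advice above can be made concrete with a little arithmetic. With DDP, the total batch size is the per-device batch size times the number of devices. If a batch of 96 does not fit on the available GPUs, gradient accumulation is one common way to reach a similar total; this is an assumption on my side and not something suggested in this thread. The helper names below are hypothetical and only illustrate the calculation. If training runs through a PyTorch Lightning Trainer (the `accelerator`, `devices`, and `strategy` keys suggest it does), the resulting number could be passed as `accumulate_grad_batches`; whether the repository's config exposes that option is not confirmed here.

```python
import math


def accumulation_steps(target_total: int, per_device: int, num_devices: int = 1) -> int:
    """Smallest number of accumulation steps such that
    per_device * num_devices * steps >= target_total."""
    if min(target_total, per_device, num_devices) < 1:
        raise ValueError("all arguments must be positive integers")
    return math.ceil(target_total / (per_device * num_devices))


def effective_batch_size(per_device: int, num_devices: int, steps: int) -> int:
    """Total number of examples contributing to one optimizer update."""
    return per_device * num_devices * steps


# Values from this thread: batch_size 4 in the config, 96 in the working run.
steps_single_gpu = accumulation_steps(target_total=96, per_device=4, num_devices=1)
steps_four_gpus = accumulation_steps(target_total=96, per_device=4, num_devices=4)

print(steps_single_gpu, effective_batch_size(4, 1, steps_single_gpu))
print(steps_four_gpus, effective_batch_size(4, 4, steps_four_gpus))
```

When the target is not an exact multiple of the per-step batch, the helper rounds up, so the effective batch size ends up slightly larger than requested rather than smaller.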
Thank you for your help! I've increased the batch size and hope it will work well.
Closing the issue for now. Feel free to reopen if the issue persists.
When I tried to run your model on the FB15k-237 dataset, I got errors about the probability tensor every time after several epochs (the exact number of epochs varied between runs). Do you have any idea what causes this?