uma-pi1 / kge

LibKGE - A knowledge graph embedding library for reproducible research
MIT License

TransE returns .nan for loss sometimes #240

Closed: jwzhi closed this issue 2 years ago

jwzhi commented 2 years ago

Hi,

Thanks for the great repo. I am trying to do a hyperparameter search on TransE, and one thing I found is that, for some hyperparameter settings, TransE returns nan for batch_result.avg_loss. I wonder:

  1. Have you seen this before, and do you have an idea how to solve it?
  2. Is there a way to just stop the current training trial (say, trial 00012) and continue with trial 00013 if such an error occurs? If so, which config option should I set? I am using random search in ax.

Thanks for the help!

rgemulla commented 2 years ago

  1. You'd need to check which hyperparameters lead to this behaviour (e.g., a learning rate that is too high).
  2. Set search.on_error to continue. This works for all search types.
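
For illustration, the two points above could look roughly like this in a search config. This is only a sketch: search.on_error and the value continue come from the answer above, while the dataset name, trial counts, and learning-rate bounds are assumed placeholder values, not settings taken from this issue.

```yaml
# Sketch of a LibKGE search config; everything except search.on_error
# is an illustrative assumption.
job.type: search
search.type: ax
search.on_error: continue      # on a failed trial, move on to the next one

dataset.name: toy              # assumption: placeholder dataset
model: transe

ax_search:
  num_trials: 30               # assumption: illustrative trial count
  num_sobol_trials: 30         # assumption: all trials sampled quasi-randomly
  parameters:
    - name: train.optimizer_args.lr
      type: range
      bounds: [0.0003, 0.1]    # assumption: upper bound lowered to avoid very large learning rates
      log_scale: True          # sample on a log scale so small values are drawn more often
```

With a setup like this, a trial that fails would be skipped rather than aborting the whole search. Whether the narrower learning-rate range removes the nan losses depends on the dataset and the other hyperparameters.
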

jwzhi commented 2 years ago

Yes, I think it's because the learning rate is too high. In my search space, the sampled learning rate is usually around 0.1xx, 0.2xx, or 0.3xx. Thanks for your suggestions. This is helpful!