Closed jwzhi closed 2 years ago
This seems like it is not DistMult, but GCN+DistMult, so it may be more difficult to train than DistMult. It may also simply not perform as well as DistMult, at least when trained the way DistMult was. In general, for any new model, the settings that work may not be the ones that are best for other models, so it is best to keep the search space as unrestricted as possible, giving the random search a better chance of finding a successful configuration. That is, I'd suggest not restricting things like embedding size, batch size, or regularization type rather than increasing the number of trials. You can always increase the number of trials later for a more fine-grained search.
On Sat, 30 Oct 2021 at 20:39, Jing Zhu @.***> wrote:
Hi,
I tried to find the best hyperparameters using Ax search. However, after searching more than 150 trials on DistMult, I am not getting very good search results (I mean MRR_filtered_with_test > 0.3 on FB15k-237).
I also noticed that in your kge-iclr20 repo, https://github.com/uma-pi1/kge-iclr20, you only ran 30 trials for each model-negative-sampling-loss combination while the search space is bigger than mine (for lr, I only search from 0.0005 to 0.5). I wonder if there is anything that I did wrong that made my HP search unsuccessful.
Here is one of my search scripts:
```yaml
job.type: search
search.type: ax

dataset:
  name: codex-m
  num_entities: 17050
  num_relations: 51

distmult:
  entity_embedder:
    type: gcn_embedder

eval:
  batch_size: 256
  metrics_per:
    relation_type: true
  trace_level: example

import:
- distmult
- reciprocal_relations_model

lookup_embedder:
  dim: 128
  initialize: xavier_normal_
  initialize_args:
    normal_:
      mean: 0.0
      std: 0.04037805388365049
    uniform_:
      a: -0.9352212163936202
    xavier_normal_:
      gain: 1.0
    xavier_uniform_:
      gain: 1.0
  regularize: ''

model: reciprocal_relations_model
reciprocal_relations_model:
  base_model:
    type: distmult
    entity_embedder:
      dim: 200
      dropout: 0.3
    relation_embedder:
      dim: -1

train:
  auto_correct: true
  batch_size: 1024
  lr_scheduler: ReduceLROnPlateau
  max_epochs: 500
  optimizer_args:
    lr: 0.001
  type: KvsAll
  optimizer:
    default:
      type: Adam
  loss: bce

valid:
  early_stopping:
    min_threshold:
      epochs: 50
      metric_value: 0.05
    patience: 10

gcn_embedder:
  regularize: ''          # '', 'lp'
  regularize_weight: 0.0
  in_dim: -1              # the input dimension, defined from lookup_embedder
  dim: 200                # the dimension of the gcn layers
  dropout: 0.3
  activation: relu        # relu or gelu
  num_layers: 1           # the number of sub-encoder-layers in the encoder

KvsAll:
  label_smoothing: 0.1

entity_ranking:
  tie_handling:
    atol: 1e1
    rtol: 1e1

ax_search:
  num_trials: 500
  num_sobol_trials: 300   # remaining trials are Bayesian
  parameters:
  - name: train.optimizer
    type: choice
    values: [Adam, Adagrad]
  - name: train.loss
    type: choice
    values: [bce, kl]
  - name: KvsAll.label_smoothing
    type: choice
    values: [0.0, 0.1, 0.2]
  - name: train.optimizer_args.lr
    type: range
    bounds: [0.0005, 0.5]
  # embedding dimension
  - name: gcn_embedder.num_layers
    type: choice
    values: [1, 2, 3]
  # embedding initialization
  - name: lookup_embedder.initialize
    type: choice
    values: [xavier_normal_, xavier_uniform_, normal_, uniform_]
  - name: lookup_embedder.initialize_args.normal_.mean
    type: fixed
    value: 0.0
  - name: lookup_embedder.initialize_args.normal_.std
    type: range
    bounds: [0.00001, 1.0]
    log_scale: True
  - name: lookup_embedder.initialize_args.uniform_.a
    type: range
    bounds: [-1.0, -0.00001]
  - name: lookup_embedder.initialize_args.xavier_uniform_.gain
    type: fixed
    value: 1.0
  - name: lookup_embedder.initialize_args.xavier_normal_.gain
    type: fixed
    value: 1.0
  # embedding regularization
  - name: lookup_embedder.regularize
    type: choice
    values: ['', 'lp']
  - name: lookup_embedder.regularize_weight
    type: range
    bounds: [1.0e-20, 1.0e-01]
    log_scale: True
  # embedding initialization
  - name: gcn_embedder.initialize
    type: choice
    values: [xavier_normal_, xavier_uniform_, normal_, uniform_]
  - name: gcn_embedder.initialize_args.normal_.mean
    type: fixed
    value: 0.0
  - name: gcn_embedder.initialize_args.normal_.std
    type: range
    bounds: [0.00001, 1.0]
    log_scale: True
  - name: gcn_embedder.initialize_args.uniform_.a
    type: range
    bounds: [-1.0, -0.00001]
  - name: gcn_embedder.initialize_args.xavier_uniform_.gain
    type: fixed
    value: 1.0
  - name: gcn_embedder.initialize_args.xavier_normal_.gain
    type: fixed
    value: 1.0
  # embedding regularization
  - name: gcn_embedder.regularize
    type: choice
    values: ['', 'lp']
  - name: gcn_embedder.regularize_weight
    type: range
    bounds: [1.0e-20, 1.0e-01]
    log_scale: True
  # embedding dropout
  - name: gcn_embedder.dropout
    type: range
    bounds: [-0.5, 0.5]
  - name: reciprocal_relations_model.base_model.entity_embedder.dim
    type: choice
    values: [128, 200, 256]
```
Your help is very much appreciated!
Hi all, in addition I found some smaller bugs: (i) the lower boundary for the learning rate actually used was 0.0003, not 0.0001 as stated in the ICLR'20 paper; (ii) the lower boundary for the dropout should be 0.0, not -0.5.
@rufex2001 can you confirm this?
One issue I would run into if I increase the search space is that there are so many combinations of HPs, and it's pretty likely that many of the sampled combinations don't make sense. How do you deal with this without increasing the number of trials?
@jwzhi I noticed you're not using `log_scale: True` on `train.optimizer_args.lr`. This might also help.
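To illustrate the effect (a quick simulation, not how Ax samples internally): with the linear-scale bounds [0.0005, 0.5], almost no trials fall in the small-lr region, while log-scale sampling covers each order of magnitude evenly.

```python
import math
import random

random.seed(0)
lo, hi = 0.0005, 0.5  # the lr bounds from the search config

# Without log_scale, the range is sampled uniformly on a linear scale,
# so almost no trials land in the small-lr region.
linear = [random.uniform(lo, hi) for _ in range(10_000)]

# With log_scale: True, the exponent is sampled uniformly instead.
log_scaled = [10 ** random.uniform(math.log10(lo), math.log10(hi))
              for _ in range(10_000)]

frac_linear = sum(x < 0.005 for x in linear) / len(linear)
frac_log = sum(x < 0.005 for x in log_scaled) / len(log_scaled)
print(f"fraction of trials with lr < 0.005: linear={frac_linear:.2f}, log={frac_log:.2f}")
```

Roughly 1% of linear-scale trials end up below 0.005, versus about a third of log-scale trials, which matches the observation that almost all sampled learning rates were large.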
Yes, that helps! Thank you.
+1 good catch @BugsBuggy on the missing `log_scale: True` for the learning rate! Its absence will lead to many very similar learning rates. +1 on everything Daniel said.
> The lower boundary for the dropout should be 0.0, not -0.5.
No, that is correct: whenever the sampled dropout is < 0 it is clamped to 0, which means that for about half of all Sobol trials the dropout will be exactly 0.
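A small simulation of that clamping trick (assuming negative samples are simply clamped to 0 before training):

```python
import random

random.seed(0)

# Sample dropout from [-0.5, 0.5] and clamp negatives to 0, so roughly
# half of all trials end up with no dropout at all while the rest cover
# the usual (0, 0.5] range.
samples = [max(0.0, random.uniform(-0.5, 0.5)) for _ in range(10_000)]
frac_zero = sum(s == 0.0 for s in samples) / len(samples)
print(f"fraction of trials with dropout == 0: {frac_zero:.2f}")
```

This gives dropout a fair chance of being turned off entirely, which a plain [0.0, 0.5] range would almost never sample.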
A few more comments/thoughts:
- I guess you have the same config for FB15k-237, because the one you posted is for Codex?
- I would also include the embedder activation function into the search space (you seem to have the compute budget).
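For example, the activation could be added to the search space with something like the following (assuming `gcn_embedder.activation` accepts `relu` and `gelu`, as the comments in the posted config suggest):

```yaml
- name: gcn_embedder.activation
  type: choice
  values: [relu, gelu]
```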
Oh right, I have the same search space for FB15k-237 and Codex; sorry, I posted the wrong script. Good point about the log_scale of lr. I am actually wondering why all of the learning rates are pretty large and it does not work well. :)
Indeed, now I see that it makes sense to set dropout to -0.5.
@jwzhi I wonder how the tie_handling tolerances affect your results. Since I am dealing with tie-handling errors, I experimented with the same tolerances as you, and the evaluation results were all pretty bad. So this might be an error source, too.
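For intuition on why `atol: 1e1` / `rtol: 1e1` is dangerous (an illustrative sketch with made-up scores, not libkge's actual tie-handling code): with tolerances that loose, essentially every score is "close" to the true triple's score, so everything is counted as tied and ranks become meaningless.

```python
import math

# Hypothetical entity scores for one query; 0.9 is the true triple's score.
scores = [3.2, 1.7, 0.9, 0.1]
true_score = 0.9

def num_ties(scores, true_score, rtol, atol):
    # Count how many scores are considered tied with the true score.
    return sum(math.isclose(s, true_score, rel_tol=rtol, abs_tol=atol)
               for s in scores)

print(num_ties(scores, true_score, rtol=1e-4, atol=1e-5))  # only the true score itself
print(num_ties(scores, true_score, rtol=1e1, atol=1e1))    # every score ties
```

With tight tolerances only the true score ties with itself; with tolerances of 10 all four scores tie, so all evaluation metrics collapse.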
Not sure about the tie handling. I actually set the tolerances back to their default values for the current HP search. If you are using ConvE, check whether you have all of the config options set correctly; I once saw it go wrong because of incorrect dimensions in ConvE.