Closed jwzhi closed 2 years ago
This seems like it is not DistMult, but GCN+DistMult, so it may be more difficult to train than DistMult. It may also simply not perform as well as DistMult, at least when trained the way DistMult was. In general, for any new model, the settings that work may not be the ones that are best for other models, so it is best to keep the search space as unrestricted as possible, giving the random search a better chance of finding a successful configuration. That is, I'd suggest not restricting things like embedding size, batch size, or regularization type rather than increasing the number of trials. You can always increase the number of trials later for a more fine-grained search.
On Sat, 30 Oct 2021 at 20:39, Jing Zhu @.***> wrote:
Hi,
I tried to find the best hyperparameters using Ax search. However, after searching more than 150 trials on DistMult, I am not getting very good search results (I mean MRR_filtered_with_test > 0.3 on FB15k-237).
I also noticed that in your kge-iclr20 repo, https://github.com/uma-pi1/kge-iclr20, you only ran 30 trials for each model-negative-sampling-loss combination while the search space is bigger than mine (for lr, I only search from 0.0005 to 0.5). I wonder if there is anything that I did wrong that made my HP search unsuccessful.
Here is one of my search scripts:
```yaml
job.type: search
search.type: ax

dataset:
  name: codex-m
  num_entities: 17050
  num_relations: 51

distmult:
  entity_embedder:
    type: gcn_embedder

eval:
  batch_size: 256
  metrics_per:
    relation_type: true
  trace_level: example

import:
- distmult
- reciprocal_relations_model

lookup_embedder:
  dim: 128
  initialize: xavier_normal_
  initialize_args:
    normal_:
      mean: 0.0
      std: 0.04037805388365049
    uniform_:
      a: -0.9352212163936202
    xavier_normal_:
      gain: 1.0
    xavier_uniform_:
      gain: 1.0
  regularize: ''

model: reciprocal_relations_model
reciprocal_relations_model:
  base_model:
    type: distmult
    entity_embedder:
      dim: 200
      dropout: 0.3
    relation_embedder:
      dim: -1

train:
  auto_correct: true
  batch_size: 1024
  lr_scheduler: ReduceLROnPlateau
  max_epochs: 500
  optimizer_args:
    lr: 0.001
  type: KvsAll
  optimizer:
    default:
      type: Adam
  loss: bce

valid:
  early_stopping:
    min_threshold:
      epochs: 50
      metric_value: 0.05
    patience: 10

gcn_embedder:
  regularize: ''          # '', 'lp'
  regularize_weight: 0.0
  in_dim: -1              # the input dimension, defined from lookup_embedder
  dim: 200                # the dimension of the gcn layers
  dropout: 0.3
  activation: relu        # relu or gelu
  num_layers: 1           # the number of sub-encoder-layers in the encoder

KvsAll:
  label_smoothing: 0.1

entity_ranking:
  tie_handling:
    atol: 1e1
    rtol: 1e1

ax_search:
  num_trials: 500
  num_sobol_trials: 300   # remaining trials are Bayesian
  parameters:
  - name: train.optimizer
    type: choice
    values: [Adam, Adagrad]
  - name: train.loss
    type: choice
    values: [bce, kl]
  - name: KvsAll.label_smoothing
    type: choice
    values: [0.0, 0.1, 0.2]
  - name: train.optimizer_args.lr
    type: range
    bounds: [0.0005, 0.5]
  # embedding dimension
  - name: gcn_embedder.num_layers
    type: choice
    values: [1, 2, 3]
  # embedding initialization
  - name: lookup_embedder.initialize
    type: choice
    values: [xavier_normal_, xavier_uniform_, normal_, uniform_]
  - name: lookup_embedder.initialize_args.normal_.mean
    type: fixed
    value: 0.0
  - name: lookup_embedder.initialize_args.normal_.std
    type: range
    bounds: [0.00001, 1.0]
    log_scale: True
  - name: lookup_embedder.initialize_args.uniform_.a
    type: range
    bounds: [-1.0, -0.00001]
  - name: lookup_embedder.initialize_args.xavier_uniform_.gain
    type: fixed
    value: 1.0
  - name: lookup_embedder.initialize_args.xavier_normal_.gain
    type: fixed
    value: 1.0
  # embedding regularization
  - name: lookup_embedder.regularize
    type: choice
    values: ['', 'lp']
  - name: lookup_embedder.regularize_weight
    type: range
    bounds: [1.0e-20, 1.0e-01]
    log_scale: True
  # embedding initialization
  - name: gcn_embedder.initialize
    type: choice
    values: [xavier_normal_, xavier_uniform_, normal_, uniform_]
  - name: gcn_embedder.initialize_args.normal_.mean
    type: fixed
    value: 0.0
  - name: gcn_embedder.initialize_args.normal_.std
    type: range
    bounds: [0.00001, 1.0]
    log_scale: True
  - name: gcn_embedder.initialize_args.uniform_.a
    type: range
    bounds: [-1.0, -0.00001]
  - name: gcn_embedder.initialize_args.xavier_uniform_.gain
    type: fixed
    value: 1.0
  - name: gcn_embedder.initialize_args.xavier_normal_.gain
    type: fixed
    value: 1.0
  # embedding regularization
  - name: gcn_embedder.regularize
    type: choice
    values: ['', 'lp']
  - name: gcn_embedder.regularize_weight
    type: range
    bounds: [1.0e-20, 1.0e-01]
    log_scale: True
  # embedding dropout
  - name: gcn_embedder.dropout
    type: range
    bounds: [-0.5, 0.5]
  - name: reciprocal_relations_model.base_model.entity_embedder.dim
    type: choice
    values: [128, 200, 256]
```
Your help is very much appreciated!
Hi all, in addition I found some smaller bugs: (i) the lower boundary for the learning rate actually used was 0.0003, not 0.0001 as stated in the ICLR'20 paper; (ii) the lower boundary for the dropout should be 0.0, not -0.5.
@rufex2001 can you confirm this?
One issue I would run into if I increase the search space is that there are so many combinations of HPs, and it's pretty likely that many of the sampled combinations don't make sense. How do you deal with this without increasing the number of trials?
@jwzhi I noticed you're not using `log_scale: True` on `train.optimizer_args.lr`. This might also help.
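To illustrate the effect (a quick simulation, not how Ax samples internally): with the linear-scale bounds [0.0005, 0.5], almost no trials fall in the small-lr region, while log-scale sampling covers each order of magnitude evenly.

```python
import math
import random

random.seed(0)
lo, hi = 0.0005, 0.5  # the lr bounds from the search config

# Without log_scale, the range is sampled uniformly on a linear scale,
# so almost no trials land in the small-lr region.
linear = [random.uniform(lo, hi) for _ in range(10_000)]

# With log_scale: True, the exponent is sampled uniformly instead.
log_scaled = [10 ** random.uniform(math.log10(lo), math.log10(hi))
              for _ in range(10_000)]

frac_linear = sum(x < 0.005 for x in linear) / len(linear)
frac_log = sum(x < 0.005 for x in log_scaled) / len(log_scaled)
print(f"fraction of trials with lr < 0.005: linear={frac_linear:.2f}, log={frac_log:.2f}")
```

Roughly 1% of linear-scale trials end up below 0.005, versus about a third of log-scale trials, which matches the observation that almost all sampled learning rates were large.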
Yes, that helps! Thank you.
+1 good catch @BugsBuggy on the missing `log_scale: True` for the learning rate! Its absence will lead to many very similar learning rates. +1 on everything Daniel said.
> The lower boundary for the dropout should be 0.0, not -0.5.
No, that is correct: whenever the sampled dropout is < 0 it is clamped to 0, which means that for about half of all Sobol trials the dropout will be exactly 0.
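A small simulation of that clamping trick (assuming negative samples are simply clamped to 0 before training):

```python
import random

random.seed(0)

# Sample dropout from [-0.5, 0.5] and clamp negatives to 0, so roughly
# half of all trials end up with no dropout at all while the rest cover
# the usual (0, 0.5] range.
samples = [max(0.0, random.uniform(-0.5, 0.5)) for _ in range(10_000)]
frac_zero = sum(s == 0.0 for s in samples) / len(samples)
print(f"fraction of trials with dropout == 0: {frac_zero:.2f}")
```

This gives dropout a fair chance of being turned off entirely, which a plain [0.0, 0.5] range would almost never sample.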
A few more comments/thoughts:
- I guess you have the same config for FB15k-237, because the one you posted is for Codex?
- I would also include the embedder activation function into the search space (you seem to have the compute budget).
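For example, the activation could be added to the search space with something like the following (assuming `gcn_embedder.activation` accepts `relu` and `gelu`, as the comments in the posted config suggest):

```yaml
- name: gcn_embedder.activation
  type: choice
  values: [relu, gelu]
```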
Oh right, I have the same search space for FB15k-237 and Codex; sorry, I posted the wrong script. Good point about the log_scale of lr. I am actually wondering why all of the learning rates are pretty large and it does not work well. :)
Indeed, now I see that it makes sense to set dropout to -0.5.
@jwzhi I wonder how the tie_handling tolerances affect your results. Since I am dealing with tie-handling errors, I experimented with the same tolerances as you, and the evaluation results were all pretty bad. So this might be an error source, too.
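For intuition on why `atol: 1e1` / `rtol: 1e1` is dangerous (an illustrative sketch with made-up scores, not libkge's actual tie-handling code): with tolerances that loose, essentially every score is "close" to the true triple's score, so everything is counted as tied and ranks become meaningless.

```python
import math

# Hypothetical entity scores for one query; 0.9 is the true triple's score.
scores = [3.2, 1.7, 0.9, 0.1]
true_score = 0.9

def num_ties(scores, true_score, rtol, atol):
    # Count how many scores are considered tied with the true score.
    return sum(math.isclose(s, true_score, rel_tol=rtol, abs_tol=atol)
               for s in scores)

print(num_ties(scores, true_score, rtol=1e-4, atol=1e-5))  # only the true score itself
print(num_ties(scores, true_score, rtol=1e1, atol=1e1))    # every score ties
```

With tight tolerances only the true score ties with itself; with tolerances of 10 all four scores tie, so all evaluation metrics collapse.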
Not sure about the tie handling. I actually set the tolerances back to their default values for the current HP search. If you are using ConvE, check whether you have all of the config options set correctly; I once saw it go wrong because of incorrect dimensions in ConvE.