hpcaitech / ColossalAI-Examples

Examples of training models with hybrid parallelism using ColossalAI
Apache License 2.0

knowledge graph embedding examples #42

Closed MiracleDesigner closed 2 years ago

MiracleDesigner commented 2 years ago

Add three knowledge graph embedding examples: DistMult, ComplEx, and RotatE.
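For context, the three models differ mainly in their scoring function over a (head, relation, tail) triple. Below is a minimal illustrative sketch of those scoring functions in PyTorch; the tensor layout (real and imaginary halves concatenated along the last dimension) and the `gamma`/`embedding_range` defaults are assumptions for illustration, not necessarily how this PR's code is organized.

```python
# Minimal sketch (illustrative, not the PR's code) of the three scoring functions.
# h, r, t are batches of head/relation/tail embeddings.
import math
import torch

def distmult_score(h, r, t):
    # DistMult: bilinear-diagonal model, score = <h, r, t>.
    return (h * r * t).sum(dim=-1)

def complex_score(h, r, t):
    # ComplEx: embeddings hold real and imaginary halves; score = Re(<h, r, conj(t)>).
    re_h, im_h = torch.chunk(h, 2, dim=-1)
    re_r, im_r = torch.chunk(r, 2, dim=-1)
    re_t, im_t = torch.chunk(t, 2, dim=-1)
    return (re_h * re_r * re_t + im_h * re_r * im_t
            + re_h * im_r * im_t - im_h * im_r * re_t).sum(dim=-1)

def rotate_score(h, r, t, gamma=12.0, embedding_range=1.0):
    # RotatE: relations act as rotations in the complex plane;
    # score = gamma - ||h o r - t||. gamma/embedding_range are assumed defaults.
    re_h, im_h = torch.chunk(h, 2, dim=-1)
    re_t, im_t = torch.chunk(t, 2, dim=-1)
    phase_r = r / (embedding_range / math.pi)  # map relation embedding to an angle
    re_r, im_r = torch.cos(phase_r), torch.sin(phase_r)
    re_diff = re_h * re_r - im_h * im_r - re_t
    im_diff = re_h * im_r + im_h * re_r - im_t
    return gamma - torch.stack([re_diff, im_diff]).norm(dim=0).sum(dim=-1)
```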

FrankLeeeee commented 2 years ago

Hi @MiracleDesigner, as I mentioned in the previous PR, can you show some experiment logs and plots to verify that the examples are indeed correct in terms of model convergence? You can paste the logs or plots in this conversation (do not put them in the code).

MiracleDesigner commented 2 years ago

Hi @FrankLeeeee, thanks for the feedback. The following are the training logs of the three methods on the FB15k-237 dataset:

RotatE:
Training average positive_sample_loss at step 0: 3.182809
Training average negative_sample_loss at step 0: 0.052633
Training average loss at step 0: 1.617721
...
Training average positive_sample_loss at step 99900: 0.051416
Training average negative_sample_loss at step 99900: 0.046308
Training average loss at step 99900: 0.048862
Training average positive_sample_loss at step 100000: 0.051659
Training average negative_sample_loss at step 100000: 0.046598
Training average loss at step 100000: 0.049129

DistMult:
Training average positive_sample_loss at step 0: 0.693146
Training average negative_sample_loss at step 0: 0.693147
Training average loss at step 0: 0.693147
...
Training average positive_sample_loss at step 99800: 0.025751
Training average negative_sample_loss at step 99800: 0.045992
Training average loss at step 99800: 0.035871
Training average positive_sample_loss at step 99900: 0.025627
Training average negative_sample_loss at step 99900: 0.046422
Training average loss at step 99900: 0.036025

ComplEx:
Training average positive_sample_loss at step 0: 0.693150
Training average negative_sample_loss at step 0: 0.693147
Training average loss at step 0: 0.693149
...
Training average positive_sample_loss at step 99800: 0.006557
Training average negative_sample_loss at step 99800: 0.014454
Training average loss at step 99800: 0.010505
Training average positive_sample_loss at step 99900: 0.006531
Training average negative_sample_loss at step 99900: 0.013548
Training average loss at step 99900: 0.010040

Each model converges during training. I can also provide logs for the other datasets if needed.
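For readers of these logs, here is a hedged sketch of how the three logged quantities typically relate in this style of KGE training (negative sampling with a logsigmoid loss and self-adversarial weighting, as in the RotatE paper); the example code in this PR may differ in details such as the adversarial temperature or uniform weighting.

```python
# Hedged sketch (assumed, not copied from the PR) of the logged loss terms.
import torch
import torch.nn.functional as F

def kge_loss(positive_score, negative_score, adversarial_temperature=1.0):
    """positive_score: (batch,); negative_score: (batch, num_negatives)."""
    # Self-adversarial weighting: harder negatives receive larger weights.
    weights = F.softmax(negative_score * adversarial_temperature, dim=1).detach()

    # These correspond to "positive_sample_loss" and "negative_sample_loss" above.
    positive_sample_loss = -F.logsigmoid(positive_score).mean()
    negative_sample_loss = -(weights * F.logsigmoid(-negative_score)).sum(dim=1).mean()

    # "Training average loss" is then the mean of the two terms.
    loss = (positive_sample_loss + negative_sample_loss) / 2
    return positive_sample_loss, negative_sample_loss, loss
```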

FrankLeeeee commented 2 years ago

Sorry for my late reply. Can you show some logs of the evaluation metrics?

MiracleDesigner commented 2 years ago

Hi @FrankLeeeee, yes. The following are the evaluation logs of the three methods on the FB15k-237 dataset:

RotatE:
Valid MRR at step 100000: 0.328876
Valid MR at step 100000: 219.944597
Valid HITS@1 at step 100000: 0.236641
Valid HITS@3 at step 100000: 0.363929
Valid HITS@10 at step 100000: 0.514229

DistMult:
Valid MRR at step 99999: 0.174493
Valid MR at step 99999: 562.998318
Valid HITS@1 at step 99999: 0.100143
Valid HITS@3 at step 99999: 0.188138
Valid HITS@10 at step 99999: 0.331223

ComplEx:
Valid MRR at step 99999: 0.119286
Valid MR at step 99999: 838.559224
Valid HITS@1 at step 99999: 0.059994
Valid HITS@3 at step 99999: 0.122840
Valid HITS@10 at step 99999: 0.240690

For DistMult and ComplEx, I used the same default parameters for training, so the results are not particularly strong. With careful hyperparameter tuning, the results reported in the original papers can be reproduced.
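As a reference for interpreting these numbers, below is a minimal, assumed sketch of how MRR, MR, and HITS@k are derived from the (filtered) rank of the true entity for each validation triple; the PR's evaluation code may compute them differently.

```python
# Minimal sketch (assumed) of ranking metrics from 1-based ranks of the true entity.
import torch

def ranking_metrics(ranks):
    """ranks: 1-based rank of the correct entity for each validation triple."""
    ranks = ranks.float()
    return {
        "MRR": (1.0 / ranks).mean().item(),      # mean reciprocal rank
        "MR": ranks.mean().item(),               # mean rank
        "HITS@1": (ranks <= 1).float().mean().item(),
        "HITS@3": (ranks <= 3).float().mean().item(),
        "HITS@10": (ranks <= 10).float().mean().item(),
    }

# e.g. ranking_metrics(torch.tensor([1, 4, 12])) -> MRR ~ 0.444, MR ~ 5.67, HITS@10 ~ 0.667
```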

FrankLeeeee commented 2 years ago

Great, thanks for your contribution!