pykeen / pykeen

🤖 A Python library for learning and evaluating knowledge graph embeddings
https://pykeen.readthedocs.io/en/stable/
MIT License

How to initialize the embedding? #328

Closed (opened by ziwli, closed 3 years ago)

ziwli commented 3 years ago

Dear Sir,

I am using pipeline() to train a model, but I want to configure the embedding settings, e.g., the initializer and the dimension. How can I do this?

cthoyt commented on 2 Mar 2021

Both of these can be set with the model_kwargs keyword to the pipeline() function. For example, try:

from pykeen.pipeline import pipeline

result = pipeline(
    dataset='Nations',
    model='TransE',
    model_kwargs={
        # By default, this is xavier_normal_, but for purposes of demo,
        #  this shows a different value
        'entity_initializer': 'normal',
        # Each model has its own reasonable default for embedding dim.
        # Over 1000 probably will give you diminishing returns and if you want to find
        # the best, try the HPO pipeline
        'embedding_dim': 128,
    },
    training_kwargs=dict(num_epochs=5),
)

Notice that the example uses a string, which can make configuring PyKEEN much easier. You can also pass your own functions, such as torch.nn.init.normal_. Our lookup list is here:

https://github.com/pykeen/pykeen/blob/42af73442ca000b410fe1a582d214246096d6eb5/src/pykeen/nn/emb.py#L405-L423
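To illustrate the "string or function" convention described above, here is a minimal plain-Python sketch of how a name-to-function lookup can work. It is hypothetical, not PyKEEN's code: `INITIALIZERS`, `resolve_initializer`, and the toy list-based `normal_`/`zeros_` are assumptions made for illustration, whereas PyKEEN's real lookup is the one linked above and operates on torch tensors.

```python
import random
from typing import Callable, Dict, List, Union

# Hypothetical: an initializer maps a list of weights to a new list of weights.
Initializer = Callable[[List[float]], List[float]]

def normal_(values: List[float]) -> List[float]:
    """Toy stand-in for torch.nn.init.normal_: draw each value from N(0, 1)."""
    return [random.gauss(0.0, 1.0) for _ in values]

def zeros_(values: List[float]) -> List[float]:
    """Toy stand-in for torch.nn.init.zeros_."""
    return [0.0] * len(values)

# Hypothetical name -> function table, similar in spirit to the linked lookup list.
INITIALIZERS: Dict[str, Initializer] = {"normal": normal_, "zeros": zeros_}

def resolve_initializer(hint: Union[str, Initializer]) -> Initializer:
    """Pass callables through unchanged; look strings up by name."""
    if callable(hint):
        return hint
    if hint not in INITIALIZERS:
        raise ValueError(f"unknown initializer {hint!r}; choose one of {sorted(INITIALIZERS)}")
    return INITIALIZERS[hint]

weights = [1.0, 2.0, 3.0]
print(resolve_initializer("zeros")(weights))  # [0.0, 0.0, 0.0]
print(resolve_initializer(zeros_)(weights))   # [0.0, 0.0, 0.0]
```

Resolving the name once, up front, means the rest of the code only ever deals with a callable, which is why a string and a function can be used interchangeably in the configuration.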

cthoyt commented 3 years ago

I will update the "First steps" docs, since this is likely not obvious to first-time users.

ziwli commented 3 years ago

Dear Sir,

I also want to ask how to check the list of accepted model_kwargs and the other xx_kwargs. I can sometimes find the model parameters, but it is hard to find all of the xx_kwargs names and values. Could you tell me where to find them?

cthoyt commented on 2 Mar 2021

@ziwli every model defines its own kwargs, so you could check the page for the model you're interested in. Whatever goes in model_kwargs will get put there on instantiation inside the pipeline. Same goes for losses, training loops, regularizers, evaluators, etc.
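Besides reading each model's documentation page, Python's standard `inspect.signature` can list the keyword arguments a class accepts. The sketch below is hypothetical: `list_init_kwargs`, `BaseModel`, and `ExampleModel` are made-up stand-ins, and pointing the helper at a real class such as `pykeen.models.TransE` (or a loss or regularizer class) is untested here. Parameters the pipeline fills in itself when it constructs the model would also show up in the output.

```python
import inspect
from typing import Dict

REQUIRED = "<required>"

def list_init_kwargs(cls: type) -> Dict[str, object]:
    """Map each keyword accepted by cls(...) to its default, or REQUIRED.

    Walks the MRO so that parameters forwarded to a parent via **kwargs are included.
    """
    found: Dict[str, object] = {}
    for klass in cls.__mro__:
        if klass is object or "__init__" not in vars(klass):
            continue  # this class does not define its own __init__
        params = inspect.signature(klass.__init__).parameters
        for name, param in list(params.items())[1:]:  # skip `self`
            if param.kind in (inspect.Parameter.VAR_POSITIONAL, inspect.Parameter.VAR_KEYWORD):
                continue
            default = REQUIRED if param.default is inspect.Parameter.empty else param.default
            found.setdefault(name, default)
        # Stop once an __init__ no longer forwards **kwargs to its parent.
        if not any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
            break
    return found

# Hypothetical classes standing in for a model hierarchy.
class BaseModel:
    def __init__(self, triples_factory, loss=None, random_seed=None):
        self.triples_factory = triples_factory

class ExampleModel(BaseModel):
    def __init__(self, triples_factory, embedding_dim=50, entity_initializer=None, **kwargs):
        super().__init__(triples_factory, **kwargs)

for name, default in list_init_kwargs(ExampleModel).items():
    print(f"{name} = {default}")
```

Walking the MRO matters because a subclass that accepts `**kwargs` only shows its own parameters in its signature; the rest live on the parent's `__init__`.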

ziwli commented 3 years ago

Dear Sir,

I am very sorry to disturb you again, but I don't know what the problem is here. My guess is that the model name is incorrect, but I checked the class and the name is right. The screenshot is attached.

cthoyt commented on 2 Mar 2021

@ziwli it's no problem, I'm happy to help. However, I can't see the image you've sent. Perhaps if you've got a traceback, you can copy it into the issue box.

Btw, are you in Mannheim? Part of the PyKEEN team is in Bonn and others in Munich!

ziwli commented 3 years ago

Dear Sir,

Yes, I am in Mannheim. My thesis is about KGE, so I need to use your framework to train models, hahaha. Sorry, I have only just started using it and am very new to this.

Traceback (most recent call last):
  File "/Users/ziweili/PycharmProjects/masterthesis/pykeen/try/train_model.py", line 3, in <module>
    pipeline_result = pipeline(
  File "/Users/ziweili/PycharmProjects/masterthesis/pykeen/src/pykeen/pipeline.py", line 927, in pipeline
    model_instance: Model = model(
  File "/Users/ziweili/PycharmProjects/masterthesis/pykeen/src/pykeen/models/base.py", line 993, in _new_init
    self.reset_parameters_()
  File "/Users/ziweili/PycharmProjects/masterthesis/pykeen/src/pykeen/models/base.py", line 149, in reset_parameters_
    self._reset_parameters_()
  File "/Users/ziweili/PycharmProjects/masterthesis/pykeen/src/pykeen/models/base.py", line 959, in _reset_parameters_
    self.entity_embeddings.reset_parameters()
  File "/Users/ziweili/PycharmProjects/masterthesis/pykeen/src/pykeen/nn/emb.py", line 160, in reset_parameters
    self._embeddings.weight.data = self.initializer(self._embeddings.weight.data)
TypeError: 'str' object is not callable

This is my code:

from pykeen.pipeline import pipeline

pipeline_result = pipeline(
    dataset='Nations',
    model='ComplEx',
    model_kwargs={
        'entity_initializer': 'normal',
        'embedding_dim': 50,
    },
    optimizer='Adagrad',
    optimizer_kwargs=dict(
        lr=0.2,
    ),
    regularizer='LpRegularizer',
    regularizer_kwargs=dict(
        weight=8.0e-08,
    ),
    loss='CrossEntropyLoss',
    training_loop='sLCWA',
    negative_sampler='basic',
    training_kwargs=dict(
        num_epochs=20,  # more epochs than before
        batch_size=100,
        checkpoint_name='checkpoint_pykeen.pt',
        checkpoint_frequency=5,
        checkpoint_directory='./pykeen/try'
    ),
    evaluator='RankBasedEvaluator',
    stopper='early',
    stopper_kwargs=dict(
        evaluation_batch_size=100,
        patience=5
    ),
)

pipeline_result.save_to_directory('nations_complex')

I think the code is correct: a very simple example works, but it stops working once I add more experimental settings.
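The thread does not settle the cause of this error. The last traceback frame calls `self.initializer(...)`, and the message says that value was a `str`. One reading consistent with that (an assumption, not confirmed here) is that in the locally checked-out version the string `'normal'` reached `reset_parameters` without first being looked up as a function. The hypothetical toy below (`ToyEmbedding`, `zeros_`, `describe_reset` are all made up) reproduces the same error message under that assumption.

```python
from typing import Callable, List, Union

def zeros_(values: List[float]) -> List[float]:
    """Toy initializer (hypothetical): replace every value with 0.0."""
    return [0.0] * len(values)

class ToyEmbedding:
    """Hypothetical embedding that stores its initializer exactly as given."""

    def __init__(self, initializer: Union[str, Callable[[List[float]], List[float]]]):
        self.initializer = initializer  # no name lookup happens here
        self.weight = [1.0, 2.0, 3.0]

    def reset_parameters(self) -> None:
        # Same shape as the last traceback frame: the stored value is called.
        self.weight = self.initializer(self.weight)

def describe_reset(embedding: ToyEmbedding) -> str:
    """Run reset_parameters and report either success or the TypeError text."""
    try:
        embedding.reset_parameters()
    except TypeError as exc:
        return f"TypeError: {exc}"
    return "ok"

print(describe_reset(ToyEmbedding("zeros")))  # TypeError: 'str' object is not callable
print(describe_reset(ToyEmbedding(zeros_)))   # ok
```

If a newer release resolves the string before `reset_parameters` runs, the first call would behave like the second, which would fit with the maintainer not being able to reproduce the error.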

cthoyt commented on 3 Mar 2021

I'm not able to reproduce the issue you described using this code. Would you please run pip install --upgrade pykeen and then report your version number using the output of pykeen version?
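The `pykeen version` command mentioned above is the simplest check. As a complement from inside Python, the standard library can report both the installed version and which copy of the package would actually be imported. This matters here because the traceback shows code running from a local `.../pykeen/src/pykeen/` checkout, which may not be the copy that `pip install --upgrade` updates. A minimal sketch (Python 3.8+); the helper names are hypothetical.

```python
import importlib.util
from importlib.metadata import PackageNotFoundError, version
from typing import Optional

def installed_version(package: str = "pykeen") -> Optional[str]:
    """Version recorded in the installed distribution's metadata, or None."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None

def imported_from(module: str = "pykeen") -> Optional[str]:
    """File that `import module` would load, or None if it cannot be found."""
    spec = importlib.util.find_spec(module)
    return spec.origin if spec is not None else None

print("installed version:", installed_version())
print("imported from:", imported_from())
```

If the "imported from" path points into a source checkout rather than site-packages, upgrading with pip alone may not change the code that runs.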

ziwli commented 3 years ago

Dear Sir,

I want to ask where I can find what is stored in the checkpoint. I have checked that it stores the epoch, model, optimizer, random_seed, and so on. I also looked at the code in pipeline.py, but I only see checkpoint_dict['random']. Could I define for myself what gets stored?

cthoyt commented on 4 Mar 2021

hi @ziwli, the conversation on this issue is getting a bit off topic. If you've got more questions, would you please open separate issues so we can keep the answers and any code needed to support them as organized as possible? Thanks :)

ziwli commented 3 years ago

Dear Sir,

Of course, sorry about that. I will open separate issues for my questions.

cthoyt commented 3 years ago

Okay I’m going to close this one because I think the original question was answered. Let me know if that’s not the case