tomgrek / zincbase

A batteries-included kit for knowledge graphs
MIT License

Canada located in Central Asia #14

Open tactycHQ opened 4 years ago

tactycHQ commented 4 years ago

Really cool project and loved your podcast!

I was tinkering around with the get_most_likely function as follows:

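    from zincbase import KB

    kb = KB()
    # Load tab-separated triples, then build and train the embedding model on CPU
    kb.from_csv('.//assets//countries_s3_train.csv', delimiter='\t')
    kb.build_kg_model(cuda=False, embedding_size=40)
    kb.train_kg_model(steps=2000, batch_size=1, verbose=True)
    print(kb.get_most_likely('canada', 'locatedin', '?'))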

And this results in: [{'triple': ('canada', 'locatedin', 'central_asia'), 'prob': 0.8538}]

Unless I have my geography wrong :), do you think this is a result of the data being faulty? Or could I have done something wrong?

JulioBarros commented 4 years ago

That's funny. I get a different answer (africa, south_america, and so on) every time I train. The only fact in the file about canada is that it is a neighbor of united_states. united_states in turn has neighbors canada and mexico, but the file has no info on where the US is located. Some countries are neighbors of other countries that end up being located in south_america, so I could see the system saying there is a chance Canada is in south_america, but I never get to that even after retraining.
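For anyone who wants to repeat that check on the training file, here is a minimal sketch that lists every row mentioning a given entity. It assumes each row is a tab-separated (head, relation, tail) triple. The helper name `facts_about` is hypothetical, and the sample rows are made up for illustration, not copied from the real file.

```python
import csv
import io

def facts_about(lines, entity, delimiter="\t"):
    """Return every (head, relation, tail) row that mentions `entity`."""
    reader = csv.reader(lines, delimiter=delimiter)
    return [
        tuple(row)
        for row in reader
        # Skip malformed rows; match the entity as either head or tail.
        if len(row) == 3 and entity in (row[0], row[2])
    ]

# Made-up sample rows, for illustration only (not the real file contents).
sample = io.StringIO(
    "canada\tneighbor\tunited_states\n"
    "united_states\tneighbor\tmexico\n"
    "mexico\tlocatedin\tcentral_america\n"
)
canada_facts = facts_about(sample, "canada")
print(canada_facts)
```

To run it against the real data, pass an open file handle instead of the sample, for example `facts_about(open(path, newline=""), "canada")`. If no `locatedin` row comes back for the entity, the model has nothing direct to learn that answer from.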

Still, this is all very interesting, and I look forward to learning more and to learning how to analyze and diagnose the predictions.
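One possible starting point for diagnosing the run-to-run differences is to retrain several times and count how often each answer comes out on top. The sketch below is only an illustration. `tally_top_answers` is a hypothetical helper, the training step is passed in as a callable so that nothing here depends on zincbase internals, and the canned results (including the probabilities) are invented for the demo. It assumes each prediction has the same shape as the output quoted in the original post.

```python
from collections import Counter

def tally_top_answers(train_and_predict, runs=5):
    """Call `train_and_predict` `runs` times and count each top-ranked tail."""
    counts = Counter()
    for _ in range(runs):
        predictions = train_and_predict()
        if predictions:
            # Assumed shape: {'triple': (head, relation, tail), 'prob': float}
            counts[predictions[0]["triple"][2]] += 1
    return counts

# Stand-in for a real "build, train, query" function; the values are invented.
_canned = iter([
    [{"triple": ("canada", "locatedin", "central_asia"), "prob": 0.85}],
    [{"triple": ("canada", "locatedin", "africa"), "prob": 0.61}],
    [{"triple": ("canada", "locatedin", "africa"), "prob": 0.58}],
])
tally = tally_top_answers(lambda: next(_canned), runs=3)
print(tally.most_common())
```

In real use, the callable would rebuild and retrain the model and then return the result of `kb.get_most_likely('canada', 'locatedin', '?')`. A flat tally across many runs would suggest the answer is driven by random initialization rather than by anything in the data.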