microsoft / GraphEngine

Microsoft Graph Engine
http://www.graphengine.io/
MIT License

Question: Is there a way to use String as the unique Cell identifier? #111

Closed bingzhangdai closed 7 years ago

bingzhangdai commented 7 years ago

Suppose I have a cell with a unique "name" field (type String). In LIKQ, when calling KnowledgeGraph.StartFrom(), it is awkward to always pass in the ID of the node. We usually know the name of the node, but it is not straightforward to get the ID in advance, and traversing all the nodes to match the "name" field is too inefficient.

Is there a way to use the "name" field as unique identifier? Or can I get the CellID of the node directly from the "name" field (for example, can I call GetHashCode)?

yatli commented 7 years ago

Yes! You'll just have to roll your own index service. Please take a look at our sample code freebase-likq. We've implemented a very simple SQLite-based indexer there, so that you can post not only cell ids, but also query objects containing name or fuzzy_name queries.
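To illustrate the idea of a separate index service, here is a minimal sketch of a name-to-cell-id lookup. It is an in-memory stand-in, not the SQLite-based indexer from freebase-likq; the class NameIndex and all of its members are hypothetical names chosen for this example, and the "fuzzy" lookup is just a case-insensitive substring scan.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory index (assumption: not part of Graph Engine or freebase-likq).
public sealed class NameIndex
{
    // Maps each unique name to the id of the cell that carries it.
    private readonly Dictionary<string, long> _byName =
        new Dictionary<string, long>(StringComparer.Ordinal);

    // Register a name when its cell is saved. Returns false if the name is already taken.
    public bool TryAdd(string name, long cellId)
    {
        if (_byName.ContainsKey(name)) return false;
        _byName.Add(name, cellId);
        return true;
    }

    // Exact lookup: name -> cell id.
    public bool TryGetCellId(string name, out long cellId)
    {
        return _byName.TryGetValue(name, out cellId);
    }

    // Naive fuzzy lookup: every cell id whose name contains the fragment, ignoring case.
    public List<long> FindFuzzy(string fragment)
    {
        return _byName
            .Where(kv => kv.Key.IndexOf(fragment, StringComparison.OrdinalIgnoreCase) >= 0)
            .Select(kv => kv.Value)
            .ToList();
    }
}
```

The cell id returned by such an index is what you would then hand to the traversal as its starting point. A persistent store (such as the SQLite indexer in the sample) would replace the dictionary once the graph no longer fits this toy setup.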

bingzhangdai commented 7 years ago

Thanks for your reply. Using an independent index service is indeed a solution. You can close this issue. However, what confuses me is your video tutorial on Zhihu zhuanlan. At 29 minutes into the video, the slide reads:

long user_id = tweet_cell.user.GetHashCode();
using (var user = Global.LocalStorage.UseUser(user_id))
...

Why can you use GetHashCode to obtain the CellId? I had wondered whether there exists some way to use strings as unique identifiers. (Now I know there is not. Maybe you wrote the code that way just because it is a demo and not that strict.)

yatli commented 7 years ago

@bingzhangdai yep, that's for demonstration purposes :p But it is actually a feasible approach, as long as you have a stronger hashing function (GetHashCode is 32-bit and weak) and a backup mechanism to handle collisions.
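A rough sketch of what "a wider hash plus a collision fallback" could look like is shown here. It uses 64-bit FNV-1a purely as an example of a hash wider than GetHashCode; it is not a cryptographic hash and not something the thread prescribes. The class HashedIdAllocator, its dictionary of owners, and the linear-probing fallback are all assumptions made for illustration; in a real deployment the ownership check would go against stored cells rather than an in-memory dictionary.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper (assumption: not part of Graph Engine).
public sealed class HashedIdAllocator
{
    // Stand-in for "which name already owns this id".
    private readonly Dictionary<long, string> _owner = new Dictionary<long, string>();

    // 64-bit FNV-1a over the UTF-8 bytes of the string.
    public static long Fnv1a64(string s)
    {
        const ulong offsetBasis = 14695981039346656037UL;
        const ulong prime = 1099511628211UL;
        ulong hash = offsetBasis;
        foreach (byte b in Encoding.UTF8.GetBytes(s))
        {
            unchecked
            {
                hash ^= b;
                hash *= prime; // wraps modulo 2^64 by design
            }
        }
        return unchecked((long)hash);
    }

    // Returns the id for a name, probing to the next id if another name holds the slot.
    public long GetOrAssign(string name)
    {
        long id = Fnv1a64(name);
        while (true)
        {
            string existing;
            if (!_owner.TryGetValue(id, out existing))
            {
                _owner[id] = name; // free slot: claim it
                return id;
            }
            if (existing == name) return id; // same name: id already assigned
            unchecked { id++; }              // collision: try the next id
        }
    }
}
```

Because the probe sequence is deterministic, looking up a name later follows the same path as the original assignment, provided ids are never released. Deleting cells would need extra bookkeeping that this sketch leaves out.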

yatli commented 7 years ago

Closing this issue now, feel free to add more bits here if you've got further questions. :)