haitian-sun / GraftNet

BSD 2-Clause "Simplified" License
268 stars 56 forks

downsample #13

Closed fcc357 closed 3 years ago

fcc357 commented 4 years ago

Hello Dr. Sun, I would like to ask how the dataset was downsampled. I want to sample the KB tuples down to 10%, 30%, 50%, 70%, and 90% to simulate an incomplete KB, but I don't know how to do the downsampling. I look forward to hearing from you.

haitian-sun commented 4 years ago

Hi,

We downsample the dataset linearly. Basically, each fact is dropped from the KB with probability p.

Please let me know if you have any questions.

Thanks, Haitian

fcc357 commented 4 years ago

Thank you for your answer. Could you please send me the downsampling script? Thanks.

haitian-sun commented 4 years ago

Sorry, we don’t have it right now.

You can do:

    import random

    # Note: in this snippet p is the probability of KEEPING a line (fact).
    with open(in_filename) as f_in, open(out_filename, "w") as f_out:
        for line in f_in:
            if random.random() < p:
                f_out.write(line)

Hope this helps.

Thanks, Haitian
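
For readers who want a reusable version, the snippet above can be expanded into a small function. This is only a sketch under stated assumptions, not the authors' script: it assumes the KB is stored as a text file with one fact per line (as the snippet implies), and the names `downsample_kb`, `keep_prob`, and `seed` are hypothetical. Here `keep_prob` is the fraction of facts to retain, so a drop probability p corresponds to `keep_prob = 1 - p`.

```python
import random


def downsample_kb(in_filename, out_filename, keep_prob, seed=None):
    """Copy each line of in_filename to out_filename with probability keep_prob.

    Assumes one KB fact per line. Returns (kept, total) line counts.
    """
    if not 0.0 <= keep_prob <= 1.0:
        raise ValueError("keep_prob must be in [0, 1]")
    # A private generator makes a run reproducible for a given seed
    # without touching the global random state.
    rng = random.Random(seed)
    kept = 0
    total = 0
    with open(in_filename, encoding="utf-8") as f_in, \
            open(out_filename, "w", encoding="utf-8") as f_out:
        for line in f_in:
            total += 1
            # Each fact is kept independently of the others.
            if rng.random() < keep_prob:
                f_out.write(line)
                kept += 1
    return kept, total
```

To produce the 10%, 30%, 50%, 70%, and 90% settings asked about above, call the function once per rate, for example `downsample_kb("kb.txt", "kb_0.5.txt", 0.5, seed=0)`, where the file names are placeholders. Because each fact is sampled independently, the kept fraction is only approximately `keep_prob`; the returned counts show the exact number.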

fcc357 commented 4 years ago

OK, thank you for your help.

fcc357 commented 4 years ago

Hello Dr. Sun, I would like to ask how to get the other embedding files and txt files. I have already generated the webqsp_subgraphs.json file, which can be split into test.json, dev.json, and train.json, but I don't know how to get the other embedding files and txt files. I look forward to hearing from you.

haitian-sun commented 4 years ago

Hi,

The _emb_100d files are generated from GloVe 100d embeddings. The _kge_100d files are pretrained TransE graph embeddings. These are helpful for rare entities. I don’t think they are useful for all entities, because the GCN layers end up producing contextualized embeddings that are sufficient for prediction.

Thanks, Haitian
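
As a rough illustration of the GloVe step, the sketch below reads a GloVe text file (one token followed by space-separated floats per line) and builds one embedding row per vocabulary entry. It is an assumption-laden sketch, not the authors' preprocessing: the function names, the random initialization for missing tokens, and the choice of vocabulary are all hypothetical, and the thread does not say how entities or out-of-vocabulary words were handled or in what format the _emb_100d files were saved.

```python
import random


def load_glove(path):
    """Read a GloVe text file into a dict mapping token -> list of floats."""
    vectors = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            # First field is the token, the rest are vector components.
            vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def build_embedding_rows(vocab, vectors, dim=100, seed=0):
    """Return one dim-sized row per token in vocab, in vocab order.

    Tokens without a pretrained vector get small random values
    (an assumption; the original handling is not documented here).
    """
    rng = random.Random(seed)
    rows = []
    for token in vocab:
        vec = vectors.get(token)
        if vec is None or len(vec) != dim:
            vec = [rng.uniform(-0.1, 0.1) for _ in range(dim)]
        rows.append(vec)
    return rows
```

The resulting rows can then be written out in whatever array format the training code expects. The _kge_100d files are a separate matter: they require training TransE on the KB, which this sketch does not cover.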
