thunlp / ConceptFlow

MIT License

inference.py not runnable #32

Closed: KristenZHANG closed this issue 3 years ago

KristenZHANG commented 3 years ago

Hi, I would like some help with the inference part. I found that train.py and inference.py use preprocessing.py differently, and the inference part is not directly runnable.

For example: (1) "gen_batched_data" in preprocess.py cannot be imported directly into inference.py; (2) "prepare_data" in preprocess.py returns only a single variable, "raw_vocab", but inference.py expects three return values.

Thanks so much for your help!

HouyuZhang1007 commented 3 years ago

Hi, thanks for your interest! Sorry for the confusion. I think this is because the latest merge introduced some incompatibilities. Can you try commit d7e57ac5ec54159c286309cd0238360699171da4 and see if it works? I will update the mainline when I have a chance.

KristenZHANG commented 3 years ago

Hi Houyu,

Thanks so much for your quick reply and help. The problem is solved with the commit you provided.

BTW, may I ask about "one_two_triple" in the data?

While trying to understand the processed Reddit data, I initially assumed the following. Take one_two_triple = [[triple1, triple2, ..., triplen], [triple1, ...], ...] and let A = [triple1, triple2, ..., triplen] be one of its inner lists. All triples in A share the same hop-1 entity, whether it appears as the head or the tail, and the other entity in each triple is a hop-2 concept. Under that reading, len(one_two_triple) should equal the number of hop-1 entities for a given example. However, I found that len(all_entities_one_hop) is not equal to len(one_two_triple); usually len(all_entities_one_hop) > len(one_two_triple), if I remember correctly.

So may I ask how you constructed "one_two_triple"? It would be a great help. Thanks a lot!

HouyuZhang1007 commented 3 years ago

Hi,

Your understanding of "one_two_triple" is correct. However, the "one-hop" in "all_entities_one_hop" refers to the one-hop sub-graph, so it also includes the zero-hop entities. That is why len(all_entities_one_hop) > len(one_two_triple) always holds.
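
For readers following along, the length relationship can be illustrated with a small toy sketch. Everything in it is an assumption made for illustration: the (head, relation, tail) triple layout, the concept names, and the helper `group_by_one_hop` are hypothetical and are not taken from the repository's preprocessing code.

```python
from collections import defaultdict

# Toy data only: names and the (head, relation, tail) layout are hypothetical.
zero_hop = ["coffee"]            # concepts that appear in the post itself
one_hop_only = ["drink", "cup"]  # hop-1 concepts reached from the zero-hop ones

# As explained above, the one-hop sub-graph also contains the zero-hop concepts.
all_entities_one_hop = zero_hop + one_hop_only

# Each triple links a hop-1 concept (as head or tail) to a hop-2 concept.
triples = [
    ("drink", "RelatedTo", "water"),
    ("tea", "IsA", "drink"),
    ("cup", "UsedFor", "holding"),
]


def group_by_one_hop(triples, one_hop):
    """Group triples by the hop-1 concept they contain (hypothetical helper)."""
    one_hop = set(one_hop)
    groups = defaultdict(list)
    for head, rel, tail in triples:
        # The hop-1 concept may sit at either end of the triple.
        anchor = head if head in one_hop else tail
        groups[anchor].append((head, rel, tail))
    return list(groups.values())


one_two_triple = group_by_one_hop(triples, one_hop_only)

# One inner list per hop-1 concept, but the entity list also counts zero-hop ones.
print(len(one_two_triple), len(all_entities_one_hop))
```

In this toy case there is one inner list per hop-1 concept, while the entity list is longer by the number of zero-hop concepts.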

KristenZHANG commented 3 years ago

Hi Houyu,

Thanks so much for your help!