thunlp / ConceptFlow

MIT License

Processed Data & Input to decoder #33

Closed KristenZHANG closed 3 years ago

KristenZHANG commented 3 years ago

Hi, thanks so much for your project. I have a question about the data:

DATA: I analyzed testdata.json (traindata.json is too large) and found the following:

 for each example:
    data["all_entities_one_hop"] contains 3 kinds of concepts:
       (a) 1-hop concepts in entity2id (valid 1-hop)
       (b) 0-hop concepts in entity2id (valid 0-hop)
       (c) concepts not in entity2id (invalid)

Here "invalid" refers to a concept that is not in "entity.txt" but is in "resource.txt" ("csk_entities"), so entity2id (constructed from entity.txt and relation.txt) does not contain that concept name.

I guessed that len(data["one_two_triple"]) is supposed to equal the number of 1-hop concepts, but found that

len(data["one_two_triple"]) = len(data["all_entities_one_hop"]) - invalid_concept_num (c)

But data["all_entities_one_hop"] also contains some 0-hop concepts (b), which I guess also have 0-hop triple lists in data["one_two_triple"].
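A rough sketch of the check described above, on made-up toy data rather than the real files. It assumes concepts are stored as name strings, that entity2id is a plain dict, and that the 0-hop concepts of an example are available as a set. The helper classify_concepts and the zero_hop set are hypothetical and not part of the ConceptFlow code.

```python
from collections import Counter


def classify_concepts(concepts, entity2id, zero_hop):
    """Count concepts of kind (a) one_hop, (b) zero_hop and (c) invalid.

    Hypothetical helper: `zero_hop` is assumed to be the set of concepts
    that are 0-hop for this example.
    """
    counts = Counter()
    for concept in concepts:
        if concept not in entity2id:
            counts["invalid"] += 1    # (c) not in entity2id
        elif concept in zero_hop:
            counts["zero_hop"] += 1   # (b) valid 0-hop
        else:
            counts["one_hop"] += 1    # (a) valid 1-hop
    return counts


# Toy values for illustration only (not taken from testdata.json).
entity2id = {"music": 0, "song": 1, "guitar": 2}
zero_hop = {"music"}
example = {
    "all_entities_one_hop": ["music", "song", "guitar", "unknown_concept"],
    "one_two_triple": [[10], [11, 12], [13]],
}

counts = classify_concepts(example["all_entities_one_hop"], entity2id, zero_hop)

# The relation observed above: one triple list per valid (non-invalid) concept.
matches = len(example["one_two_triple"]) == (
    len(example["all_entities_one_hop"]) - counts["invalid"]
)
print(dict(counts), matches)
```

In this toy case the relation holds only because the 0-hop concept is counted together with the 1-hop ones, which is the situation being asked about.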

So I am wondering whether I misunderstood or miscomputed something in this analysis.

Thanks so much!

KristenZHANG commented 3 years ago

Solved, thanks!