Hi, do you mean the metapaths in Table 2? Since the confidence estimation is based on sampling, the confidences might vary slightly. Some relations are symmetric, e.g., CrC means the same as _CrC, so one rule can have several formulations, e.g., ["CtD", "CrC", "CtD"] represents the same rule as ["CtD", "_CrC", "CtD"]. In such cases, we estimate the confidence for each formulation separately and take the average. All equivalent formulations are also included in datasets/Hetionet/rules.txt.
We also estimate the confidence of both the rule and its inverse, e.g., ["Compound", "CpD", "Disease"] and ["Disease", "_CpD", "Compound"], and take the average to get a better estimate.
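As a rough illustration of the averaging described above, here is a minimal Python sketch. It is not the script from the repository: the function names, the `symmetric` set, and the `estimate` callable (which stands in for the sampling-based confidence estimator) are all assumptions made for this example.

```python
from itertools import product
from statistics import mean
from typing import Callable, List, Set

INV = "_"  # prefix that marks an inverse relation, as in "_CpD"


def toggle(rel: str) -> str:
    """Switch a relation between its forward and inverse spelling."""
    return rel[len(INV):] if rel.startswith(INV) else INV + rel


def invert_metapath(path: List[str]) -> List[str]:
    """Invert a metapath that alternates node types and relations.

    ["Compound", "CpD", "Disease"] becomes ["Disease", "_CpD", "Compound"].
    """
    rev = list(reversed(path))
    # Odd positions hold relations, even positions hold node types.
    return [toggle(x) if i % 2 == 1 else x for i, x in enumerate(rev)]


def equivalent_forms(rule: List[str], symmetric: Set[str]) -> List[List[str]]:
    """List every spelling of a rule given as a list of relations.

    `symmetric` is a hypothetical set of relations for which X and _X
    mean the same thing (e.g. {"CrC"}).
    """
    options = []
    for rel in rule:
        base = rel[len(INV):] if rel.startswith(INV) else rel
        options.append([base, INV + base] if base in symmetric else [rel])
    return [list(combo) for combo in product(*options)]


def averaged_confidence(forms: List[List[str]],
                        estimate: Callable[[List[str]], float]) -> float:
    """Average the confidence over several formulations.

    `estimate` is a placeholder for whatever sampling-based estimator
    is used; it is not defined here.
    """
    return mean(estimate(form) for form in forms)
```

For example, `equivalent_forms(["CtD", "CrC", "CtD"], {"CrC"})` yields the two formulations mentioned above, and `averaged_confidence` can be applied either to those formulations or to a metapath together with its `invert_metapath` result.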
The script for calculating the confidences can be found here: https://github.com/liu-yushan/PoLo/tree/main/datasets/Hetionet/preprocessing.
Thank you for your code. I still have a question about the files in ../datasets/Hetionet/preprocessing/: how can I get files such as node_edges.json and metapath_p3.json?
node_edges.json is a dictionary that lists all nodes' neighbors, grouped by relations and node types. You can just go through all triples and add the corresponding information to the dictionary. metapath_p3.json lists all possible metapaths between compounds and diseases up to length 3. This is based on the metagraph (see https://het.io/about/).
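The two files could be built roughly as follows. This is only a sketch under several assumptions: the exact JSON layout used in the repository is not shown in this thread, so the nesting (node, then relation, then neighbor type), the "Type::id" node naming, the addition of "_"-prefixed inverse edges, and all function names are guesses made for illustration.

```python
import json
from collections import defaultdict
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]  # (head, relation, tail)


def node_type(node: str) -> str:
    # ASSUMPTION: node ids look like "Compound::DB00014".
    return node.split("::", 1)[0]


def build_node_edges(triples: List[Triple]) -> Dict:
    """Group each node's neighbors by relation and by neighbor type."""
    edges = defaultdict(lambda: defaultdict(lambda: defaultdict(list)))
    for head, rel, tail in triples:
        edges[head][rel][node_type(tail)].append(tail)
        # ASSUMPTION: inverse edges are stored with a "_" prefix.
        edges[tail]["_" + rel][node_type(head)].append(head)
    # Round-trip through JSON to obtain plain nested dicts.
    return json.loads(json.dumps(edges))


def enumerate_metapaths(metaedges: List[Triple], source: str,
                        target: str, max_len: int) -> List[List[str]]:
    """List all metapaths from `source` to `target` with at most
    `max_len` relations, walking the metagraph in both directions."""
    adj = defaultdict(list)
    for head_t, rel, tail_t in metaedges:
        adj[head_t].append((rel, tail_t))
        adj[tail_t].append(("_" + rel, head_t))

    found: List[List[str]] = []

    def walk(path: List[str], length: int) -> None:
        if length > 0 and path[-1] == target:
            found.append(list(path))
        if length == max_len:
            return
        for rel, nxt in adj[path[-1]]:
            walk(path + [rel, nxt], length + 1)

    walk([source], 0)
    return found
```

With the full list of metaedges from the metagraph, `enumerate_metapaths(metaedges, "Compound", "Disease", 3)` would give candidates in the spirit of metapath_p3.json, and `json.dump(build_node_edges(triples), f)` would write a file in the spirit of node_edges.json.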
Hi, could you share the code for calculating the rule confidence of the metapaths? I didn't get the right rule confidence following your description.