Closed tiedaoxiaotubie closed 3 years ago
Hi @tiedaoxiaotubie ,
The NeighborhoodSubgraphPairwiseDistance kernel can take both node and edge labels into account. You can have a look at this example that uses this kernel: https://github.com/ysig/GraKeL/blob/master/tutorials/digit_classification/digit_classification.ipynb
Note that the node and edge labels need to be discrete. You mentioned that you use execution time as edge label. Does that mean that these labels are real-valued? If this is the case, the kernel would not be able to handle them. One easy workaround is to apply some binning approach and map these execution times to integers.
Thanks for your answer, that's quite helpful! But I still have 3 questions: 1) what's the different between label and attribute? it seems attribute must be a vector 2) In the link you sent to me, I noticed weisfeiler_lehman doesn't ignore the edge label, but depends on the base_kernel, does it means we can still use weisfeiler_lehman in our situation? 3) how to transform NetworkX edge labeled graphs to GraKeL graphs? In example code, only shows the node labeled situation, I use the following instruction:
G = graph_from_networkx(cfg_all, node_labels_tag='label', edge_labels_tag='execute_counts')
am I right?
Neat! I just tried Weisfeiler-Lehman to calculate the similarity, I found no matter I label edge or not, the graph similarity are always same! That's weird.
Besides, I though the the number of iterations in Weisfeiler-Lehman will affect the result monotonic, but when I add the iterations, the precision didn't rise, so the relationship between iterations number and graph similarity is not monotonic?
The kernel values are supposed to increase as the number of iterations increase (if not normalized). But it was not clear to me if you exactly meant that. Or do you mean the accuracy (or some other evaluation metric) on the test set of some dataset? Because these two are not related to each other.
Oh, I forgot to mention that, in fact the data are normalized when I used :
gk = WeisfeilerLehman(n_iter=1, base_graph_kernel=VertexHistogram, normalize=True)
suppose you have several graphs that are very same, after increasing the number of iterations, will the graph similar metrics become more and more high? for example, the graph similarity may be 70% when you iterate 4 times, if you increate the iterate number from 4 to 10, the graph similarity will always higher than 70%?
Hello, I now have a question similar to yours: based on the use of Weisfeiler-Lehman to calculate the similarity of graph structure, I want to use it to calculate the similarity of graphs with weights (edges of the graph). I don't know if it can be implemented directly using GraKel, or can it be changed on this basis?
Hi @17853535785 ,
This has been discussed in a previous thread: https://github.com/ysig/GraKeL/issues/37
Dear @tiedaoxiaotubie
With regards to your last question, by increasing the number of iterations of the Weisfeiler-Lehman subtree kernel, it is not necessary that the normalized kernel values will keep increasing. Here is a counter-example: suppose we are given a graph G1 that corresponds to the path P5 (path consisting of 5 vertices) and a graph G2 that is the disjoint union of a the cycle C3 (triangle) and P2 (path consisting of 2 vertices). We annotate the vertices of both graphs with the same label (e.g., 1). After the first iteration of the Weisfeiler-Lehman subtree kernel, the label sets of the two graphs are identical, thus the normalized kernel value is equal to 1. At the end of the second interation, the label sets of the two graphs are not identical, thus the normalized kernel value will be smaller than 1.
Hi, I want to use GraKel to compare program's function call graph similarity, each caller address is a node, I use the caller address itself as the node's label, besides, loop always exist in program execution, thus one edge may be executed multiple times before the program exit, so I use execution time as edge label.
I want to know whether there is an algorithm in GraKel can handle the situation above? I checked the example, but it only shows the graph with labeled node, I also tried to use WeisfeilerLehman algorithm, but it seems this algorithm won't take edge label into consideration :(