yuikns / panther

Estimating similarity between vertices is a fundamental issue in network analysis across various domains, such as social networks and biological networks. Methods based on common neighbors and structural contexts have received much attention....
http://arxiv.org/abs/1504.02577
MIT License

Questions regarding Panther's usage #2

Open DonaldTsang opened 4 years ago

DonaldTsang commented 4 years ago
  1. What is the output file of Panther? We know the input is a simple CSV.
  2. How many iterations does Panther use? This is not specified in the paper.
  3. Would increasing the iteration count make Panther more accurate?
  4. Is it possible to use this as a way of clustering user/vertex/node roles?

yuikns commented 4 years ago

Apologies for the delayed response.

  1. The output is written to $current_folder/result/<data, e.g. sampe>_T_D_<epsilon * 1000000 (to avoid a period)>.pathsim. It is a record file; please see the sample output below.
  2. We perform R random walks, which may be the "number of iterations" you are asking about. R is a calculated value.
  3. We provide theoretical proofs for the error bound and confidence of the proposed algorithm. In general, path similarity can be viewed as a probability measure defined over all paths, so we can adopt results from Vapnik-Chervonenkis (VC) learning theory to analyze the proposed sampling-based algorithm, which theoretically gives us the sample size. Increasing the iteration count beyond that may not gain much accuracy, but it will significantly hurt performance.
  4. Panther tries to find D similar nodes based on the ego network. That could provide some key information for clustering. In fact, this was part of our work on AMiner's author clustering.
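To make points 2 and 3 concrete, here is a rough sketch of how a sample size R could be calculated from the error bound and confidence level. It assumes the bound has the form R = (c / epsilon^2) * (log2(C(T, 2)) + 1 + ln(1 / delta)), which is how I read the linked paper; please verify it there before relying on it. The function name `sample_size` and the default `c = 0.5` are placeholders for illustration, not taken from the Panther code.

```python
import math


def sample_size(T: int, epsilon: float, delta: float, c: float = 0.5) -> int:
    """Assumed bound: R = (c / eps^2) * (log2(C(T, 2)) + 1 + ln(1 / delta)).

    T       -- path length of each random walk
    epsilon -- error bound
    delta   -- 1 - confidence level
    c       -- constant factor (placeholder value, an assumption)
    """
    if T < 2:
        raise ValueError("T must be at least 2 so that C(T, 2) >= 1")
    if not (0 < epsilon and 0 < delta < 1):
        raise ValueError("need epsilon > 0 and 0 < delta < 1")
    pairs = T * (T - 1) // 2                 # C(T, 2)
    vc_term = math.log2(pairs) + 1           # VC-dimension term of the bound
    confidence_term = math.log(1 / delta)    # natural log
    return math.ceil(c / epsilon ** 2 * (vc_term + confidence_term))


# Halving the error bound roughly quadruples the number of random walks.
r_coarse = sample_size(T=5, epsilon=0.01, delta=0.1)
r_fine = sample_size(T=5, epsilon=0.005, delta=0.1)
print(r_coarse, r_fine)
```

Under this assumed form, R grows with 1 / epsilon^2, so running more walks than the calculated value costs a lot of time for a small reduction in the error bound.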

Sample output:

0:0.000001629734520 505748:0.000000992012317 1:0.000000708580226 157140:0.000000637722204 264947:0.000000637722204 60303:0.000000566864181 956126:0.000000212574068 1098494:0.000000141716045 498405:0.000000141716045 43204:0.000000070858023 106492:0.000000070858023 265288:0.000000070858023 120664:0.000000070858023 129150:0.000000070858023 98430:0.000000070858023 36690:0.000000070858023 98429:0.000000070858023 206687:0.000000070858023 178984:0.000000070858023 81329:0.000000070858023 228866:0.000000070858023 258156:0.000000070858023 32979:0.000000070858023 175271:0.000000070858023 271976:0.000000070858023 338035:0.000000070858023 367442:0.000000070858023 375027:0.000000070858023 447600:0.000000070858023 456807:0.000000070858023 769223:0.000000070858023 172276:0.000000070858023 593231:0.000000070858023 572214:0.000000070858023 157139:0.000000070858023 863611:0.000000070858023 919730:0.000000070858023 45094:0.000000070858023 1034115:0.000000070858023 1075850:0.000000070858023 26354:0.000000070858023 1139662:0.000000070858023 1488646:0.000000070858023 1539641:0.000000070858023 26353:0.000000070858023

The record above is the first line of the file (id 0), and its D similar records are: 0, 505748, 1, 157140, ...
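A minimal sketch for reading such a record, assuming each line is a whitespace-separated list of `id:score` tokens and that the zero-based line number is the id of the source vertex, as the sample suggests. The helper name `parse_pathsim_line` is made up for illustration and is not part of Panther.

```python
from typing import List, Tuple


def parse_pathsim_line(line: str) -> List[Tuple[int, float]]:
    """Parse one record line into (vertex_id, similarity) pairs."""
    pairs = []
    for token in line.split():
        vertex_id, score = token.split(":")
        pairs.append((int(vertex_id), float(score)))
    return pairs


# First three entries of the sample record for vertex 0.
record = "0:0.000001629734520 505748:0.000000992012317 1:0.000000708580226"
neighbors = parse_pathsim_line(record)
print([vertex_id for vertex_id, _ in neighbors])  # prints [0, 505748, 1]
```

To process a whole file, the same function can be applied to each line, with `enumerate` supplying the source vertex id under the assumption above.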

DonaldTsang commented 4 years ago

Would it be possible to provide pointers to Role Detection, Role Clustering, or Role Discovery? I would like to focus more on that front, since that is what RoleSim is for.

yuikns commented 4 years ago

I am afraid this is not the problem that Panther was designed to solve.

DonaldTsang commented 4 years ago

If that is the case then do you know of any RoleSim replacements or alternatives? Would Vertex Similarity be similar to Link Prediction?