waynebhayes / SANA

Simulating Annealing Network Aligner

Patch to detect duplicate edges when loading a graph from an edge list (.el) file #46

Closed idetatsu closed 6 years ago

idetatsu commented 6 years ago

This commit fixes the unexpected runtime error caused by duplicate edges in an edge list (.el) file.

Cause/Reason

Originally, when a duplicate edge was found, Graph.cpp:270 called "g.getIndexToNodeNameMap()" to translate the indices of the edge's two endpoint nodes back into their original names so that they could be shown in the error message. However, "g.getIndexToNodeNameMap()" is not supposed to be called at this point in the program; calling it there causes unexpected behavior and crashes the program.

Fix

In Graph.cpp, lines 248 to 285, I have added code that detects duplicate edges using a 2D map. adjMatrix[node1][node2] stores the line number of the .el file on which the edge between node1 and node2 last appeared, so the map can be queried to check whether that edge already exists. For example, if adjMatrix[3][5] is 10, then line 10 contains an edge between nodes 3 and 5. The program builds this map while it reads edges from the file, and if an edge is already in the map, it raises an error giving the line numbers and the names of the nodes.
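
The idea can be illustrated with a minimal, standalone sketch. This is not the code in Graph.cpp: the function name `readEdgeList`, its signature, and the name-to-index bookkeeping are hypothetical, and treating a reversed pair (`b a` after `a b`) as a duplicate is an assumption that the graph is undirected.

```cpp
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical helper, not the actual loader in Graph.cpp.
// Reads one "name1 name2" edge per line and throws on a repeated edge.
std::vector<std::pair<unsigned, unsigned>>
readEdgeList(std::istream& in, const std::string& fileName) {
    std::unordered_map<std::string, unsigned> nameToIndex;
    // adjMatrix[u][v] (with u <= v) holds the line number on which edge {u, v} was seen.
    std::unordered_map<unsigned, std::unordered_map<unsigned, unsigned>> adjMatrix;
    std::vector<std::pair<unsigned, unsigned>> edges;

    auto indexOf = [&nameToIndex](const std::string& name) {
        // emplace leaves an existing entry untouched, so known names keep their index.
        auto it = nameToIndex.emplace(name, static_cast<unsigned>(nameToIndex.size())).first;
        return it->second;
    };

    std::string line;
    unsigned lineNo = 0;
    while (std::getline(in, line)) {
        ++lineNo;
        std::istringstream fields(line);
        std::string name1, name2;
        if (!(fields >> name1 >> name2)) continue;  // skip blank or malformed lines

        unsigned u = indexOf(name1);
        unsigned v = indexOf(name2);
        if (u > v) std::swap(u, v);  // assumption: undirected, so store one orientation

        auto& row = adjMatrix[u];
        auto found = row.find(v);
        if (found != row.end()) {
            // Report both the earlier line and the current one.
            std::ostringstream msg;
            msg << "duplicate edges not allowed in file\n"
                << "\t'" << fileName << ":" << found->second << "' "
                << name1 << " - " << name2 << "\n"
                << "\t'" << fileName << ":" << lineNo << "' "
                << name1 << " - " << name2;
            throw std::runtime_error(msg.str());
        }
        row[v] = lineNo;
        edges.emplace_back(u, v);
    }
    return edges;
}
```

Because the nested map only holds entries for edges that actually occur, the extra memory in this sketch grows with the number of edges rather than with the square of the number of nodes.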

Example Error Message

tatsuroi@woodhouse ~/w/SANA ❯❯❯ ./sana -fg1 networks/RNorvegicus18.el -fg2 networks/SPombe18.el
woodhouse.ics.uci.edu
Thu Nov  1 17:31:08 PDT 2018
=== Parsed arguments ===
-combinedScoreAs: sum   -fg1: networks/RNorvegicus18.el -fg2: networks/SPombe18.el      -localScoresFile: sana  -method: sana   -mode: normal   -o: sana        -objfuntype: generic    -paretoThreads: 1       -qmode: normal     -tdecay: auto   -tinitial: auto -wavenodesim: nodec     -wecnodesim: graphletlgraal
-gofrac: 1      -iterperstep: 1e+07     -lgraaliter: 1000       -maxGraphletSize: 5     -nneighbors: 50 -ntabus: 300    -numcand: 3     -paretoCapacity: 200    -paretoInitial: 1       -paretoIterations: 10000  -qcount: 1       -s3: 1  -t: 5   -tcand: 1       -tfin: 3        -tnew: 3

-edgecweights: 0.1 0.25 0.5 0.15 
-esim: 
-goweights: 1 
-nodecweights: 0.1 0.25 0.5 0.15 
-simFormat: 
Seed: 526139060
Initializing graphs...
Loading graphs using Graph::loadGraph()
RNorvegicus18: number of nodes = 3569, number of edges = 4953
terminate called after throwing an instance of 'std::runtime_error'
  what():  duplicate edges not allowed in file
        'networks/RNorvegicus18.el:24' 10179 - 24253
        'networks/RNorvegicus18.el:93' 10179 - 24253

[1]    2786 abort (core dumped)  ./sana -fg1 networks/RNorvegicus18.el -fg2 networks/SPombe18.el
idetatsu commented 6 years ago

Here are the results from before and after the change. I ran each a couple of times, and the results are pretty stable. The maximum resident set size after the edit is always a bit higher than before.

Before

        Command being timed: "./sana -fg1 networks/SCerevisiae18.el -fg2 networks/HSapiens18.el -t 1"
        User time (seconds): 129.95
        System time (seconds): 9.24
        Percent of CPU this job got: 99%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 2:19.53
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 319788
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 0
        Minor (reclaiming a frame) page faults: 2477884
        Voluntary context switches: 84
        Involuntary context switches: 4412
        Swaps: 0
        File system inputs: 0
        File system outputs: 256
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 0

After

        Command being timed: "./sana -fg1 networks/SCerevisiae18.el -fg2 networks/HSapiens18.el -t 1"
        User time (seconds): 134.17
        System time (seconds): 8.65
        Percent of CPU this job got: 99%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 2:23.00
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 322888
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 0
        Minor (reclaiming a frame) page faults: 2443700
        Voluntary context switches: 93
        Involuntary context switches: 3224
        Swaps: 0
        File system inputs: 0
        File system outputs: 248
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 0

My concern is that this difference might grow with the number of edges in the file. Should I try running them with different networks?
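
To put the two runs in proportion, the relative change in each figure can be computed directly from the numbers above. A small sketch (the helper name `percentIncrease` is hypothetical):

```cpp
#include <stdexcept>

// Hypothetical helper: percentage change going from `before` to `after`.
double percentIncrease(double before, double after) {
    if (before <= 0.0) throw std::invalid_argument("before must be positive");
    return (after - before) / before * 100.0;
}
```

With the figures from these runs, `percentIncrease(319788, 322888)` gives roughly 0.97 for the maximum resident set size (kbytes), and `percentIncrease(129.95, 134.17)` gives roughly 3.2 for user time (seconds).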

waynebhayes commented 6 years ago

OK it's fine. It's a very minor increase in RAM (a few percent). That's fine, and those networks are pretty big. I thought the memory might double in size or something. It's fine, I'll accept the pull now.

-- Wayne Hayes, Ph.D. Associate Professor of Computer Science, University of California, Irvine
