monarch-initiative / embiggen

🍇 Embiggen is the Python Graph Representation learning, Prediction and Evaluation submodule of the GRAPE library.
BSD 3-Clause "New" or "Revised" License

Parallel code not working #17

Closed pnrobinson closed 4 years ago

pnrobinson commented 4 years ago

This is the new code

for orig_node, alias_node in pool.map(self._get_alias_node, g.nodes()):
    alias_nodes[orig_node] = alias_node
    dateTimeObj = datetime.now()
    print(dateTimeObj)
    # note: len(alias_node) is the size of one alias-table entry, not a running count
    print("Processed %d nodes" % len(alias_node))
pool.close()  # added just now but does not help

However, when using a realistic graph with 20,000 nodes and 400,000 edges, the code gets stuck here. It outputs the following:

(...19,996 other times)
2020-01-17 14:19:20.817591
Processed 2 nodes
2020-01-17 14:19:20.817595
Processed 2 nodes
2020-01-17 14:19:20.817600
Processed 2 nodes

htop shows that 8 processors are working at full capacity. However, Python appears to create 20,000 processes, which is surely not the best way of doing things. I am not sure when this will finish, but I let it run for 3 hours yesterday and did not get past this point.

pnrobinson commented 4 years ago

This may have been premature -- it did run and seems OK now. Can we add some print/log statements to show progress?

justaddcoffee commented 4 years ago

Sure, no problem. Could you show me the command you are running to test this?

pnrobinson commented 4 years ago

In the file "generate_rand_graph.py", I did this:

num_nodes = 20000
num_edges = 400000
max_weight = 100
# graph = nx.complete_graph(num_nodes)  # otherwise networkx takes a long time, and we do not need this

and in the file "runDiseaseGeneEmbedding.py" I made this change:

training_file = 'tests/data/rand_20000nodes_400000edges.graph'
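The rest of generate_rand_graph.py is not shown here, but assuming it builds a random weighted networkx graph, `nx.gnm_random_graph` draws exactly the requested number of edges directly, which avoids the cost of materializing a complete graph first. A sketch under that assumption:

```python
import random

import networkx as nx

num_nodes = 20000
num_edges = 400000
max_weight = 100

# gnm_random_graph samples exactly num_edges edges uniformly at random,
# far cheaper than nx.complete_graph followed by edge sampling.
graph = nx.gnm_random_graph(num_nodes, num_edges, seed=42)
for u, v in graph.edges():
    graph[u][v]["weight"] = random.randint(1, max_weight)

print(graph.number_of_nodes(), graph.number_of_edges())
```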

There does seem to be a pretty big speedup. TensorFlow is now using only one CPU for training -- perhaps we can also do something there. I am now trying to figure out how to make the code more idiomatic by creating custom loss classes in Keras, etc. Any help/input is welcome -- it is not super well documented, but the TensorFlow pages are a good start, and pair/group programming would be valuable. The latter will make it easier to test the parameter space for node2vec.
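On the custom-loss point, subclassing `tf.keras.losses.Loss` and overriding `call` is the idiomatic route. A minimal sketch; the `WeightedSquaredError` class and its `factor` parameter are invented for illustration, not the node2vec loss itself:

```python
import tensorflow as tf


class WeightedSquaredError(tf.keras.losses.Loss):
    """Mean squared error scaled by a constant factor (illustrative only)."""

    def __init__(self, factor=1.0, name="weighted_squared_error"):
        super().__init__(name=name)
        self.factor = factor

    def call(self, y_true, y_pred):
        # call() receives batches of labels and predictions and must
        # return the per-sample loss values; Keras applies the final
        # reduction (mean over the batch by default).
        return self.factor * tf.reduce_mean(
            tf.square(y_true - y_pred), axis=-1)


model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss=WeightedSquaredError(factor=2.0))
```

An instance can then be passed anywhere Keras accepts a loss, which keeps the loss logic testable on its own, separate from the model.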

justaddcoffee commented 4 years ago

Sure, glad that parallelization is working, and happy to pair program whenever it's convenient. Possibly Monday or Tuesday? Or we could find time right now if you are around.

If there are no objections, I'll close this issue, since the parallelization seems to be working...