memgraph / mage

MAGE - Memgraph Advanced Graph Extensions :crystal_ball:
Apache License 2.0

[BUG] Link prediction throws RuntimeError: DataLoader worker exited unexpectedly #434

Closed matea16 closed 7 months ago

matea16 commented 9 months ago

Memgraph version: 2.13

Describe the bug: A user on Discord reported unexpected behavior when running the Link Prediction procedure.

To Reproduce: The Discord thread provides the complete datasets and queries. I've verified the issue, and the same problem occurs for me. The datasets are too large to attach here, but they can be found in the thread. The queries run are:

STORAGE MODE IN_MEMORY_ANALYTICAL;

LOAD CSV FROM "/fall_2022_nodes.csv" WITH HEADER AS row
CREATE (n:NetworkNode {
  address: row.address,
  tactics_src: row.tactics_src,
  tactics_dest: row.tactics_dest})
SET n.tactics_src = row.tactics_src
SET n.tactics_dest = row.tactics_dest;

CREATE INDEX ON :NetworkNode(address);

LOAD CSV FROM "/fall_2022_edges.csv" WITH HEADER AS row
MATCH (src:NetworkNode {address: row.src_address})
MATCH (dest:NetworkNode {address: row.dest_address})
CREATE (src)-[e:COMMUNICATES_WITH]->(dest)
SET e.tactic = row.label_tactic;

CALL degree_centrality.get("in") YIELD node, degree SET node.in_degree = degree; 
CALL degree_centrality.get("out") YIELD node, degree SET node.out_degree = degree;
CALL pagerank.get() YIELD node, rank SET node.rank = rank;
CALL betweenness_centrality.get() YIELD node, betweenness_centrality SET node.betweenness_centrality = betweenness_centrality;
CALL katz_centrality.get() YIELD node, rank SET node.katz_centrality = rank;

MATCH (node) SET node.features1 = [node.in_degree, node.out_degree, node.rank];
MATCH (node) SET node.features2 = [node.in_degree, node.out_degree, node.rank, node.katz_centrality, node.betweenness_centrality];

CALL link_prediction.set_model_parameters({
  num_epochs: 5,
  learning_rate: 0.01,
  split_ratio: 0.8,
  node_features_property: "features1",
  target_relation: "COMMUNICATES_WITH"
}) YIELD * RETURN *;

CALL link_prediction.train()
YIELD training_results, validation_results
RETURN training_results, validation_results;

Note that I was able to load the dataset, since node classification works fine (I did not include the node classification queries here); it is link prediction that I have an issue with.

Screenshots: Screenshot_2023-12-09_at_6 17 25_PM (1)

Additional context: When running the exact same queries on a smaller dataset, the procedure works as expected.

andrejtonev commented 7 months ago

This seems to be a Docker configuration problem: we are exhausting the shared memory and crashing.

ERROR: Unexpected bus error encountered in worker. This might be caused by insufficient shared memory (shm).

andrejtonev commented 7 months ago

A good thread about this problem: https://github.com/ultralytics/yolov3/issues/283
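
To check whether a container is likely to hit this, one option is to inspect the size of `/dev/shm` from inside it. The following is only a minimal sketch: the helper names and the 1 GiB threshold are hypothetical choices for illustration, not something MAGE or PyTorch prescribes. The 64 MiB constant reflects Docker's default shared-memory size when `--shm-size` is not set.

```python
import os
import shutil

# Docker gives a container 64 MiB of /dev/shm unless --shm-size is passed.
DOCKER_DEFAULT_SHM_BYTES = 64 * 1024 * 1024

# Hypothetical threshold chosen for illustration only; tune it to the dataset.
ASSUMED_MINIMUM_SHM_BYTES = 1024 ** 3  # 1 GiB


def shm_report(path="/dev/shm"):
    """Return (total_bytes, free_bytes) for the shared-memory mount,
    or None if the path does not exist (e.g. on non-Linux hosts)."""
    if not os.path.isdir(path):
        return None
    usage = shutil.disk_usage(path)
    return usage.total, usage.free


def shm_looks_too_small(total_bytes, minimum_bytes=ASSUMED_MINIMUM_SHM_BYTES):
    """True when the mount is smaller than the assumed minimum."""
    return total_bytes < minimum_bytes


if __name__ == "__main__":
    report = shm_report()
    if report is None:
        print("/dev/shm not found; nothing to check")
    else:
        total, free = report
        print(f"/dev/shm total={total} bytes, free={free} bytes")
        if shm_looks_too_small(total):
            print("shared memory may be too small for multi-worker data loading")
```

If the reported total is the 64 MiB default, starting the container with a larger `--shm-size` value (or with `--ipc=host`) is the usual way to give the workers more shared memory.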

andrejtonev commented 7 months ago

@kgolubic maybe we should add a warning/note to the docs?