snap-stanford / snapvx


Segmentation fault #5

Closed: ghost closed this issue 9 years ago

ghost commented 9 years ago

Hello,

I'm trying to use snapvx on a t2.micro EC2 instance with 8 GB of memory. I'm trying to define a network with 125145 nodes and 4129676 edges, where the observed value for each node is a 3x1 vector. I'm getting a segmentation fault. Below is the code I'm using. Do you think I need more memory?

Thank you so much.

```
>>> from snapvx import *
>>> def node_obj(data):
...     x = Variable(1, name='x')
...     return power(norm(x - float(data[0])), 2)
...
>>> def laplace_reg(src, dst, data):
...     return norm(src['x'] - dst['x'])
...
>>> gvx = LoadEdgeList('edges_supervoxel.txt')
>>> gvx.AddNodeObjectives('superVoxelMeans.csv', node_obj)
Segmentation fault (core dumped)
```

davidhallac commented 9 years ago

Hmm, it's definitely possible that you'll need additional memory (we're still working on making SnapVX more memory-efficient...). I can try to run it on one of our large-memory machines to see how things look. Is the dataset available anywhere? Otherwise, I can just generate random data for a problem of that size.
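For reference, generating random inputs at the reported scale might look something like the sketch below (file names match the ones in the script above; this naive edge sampling can produce duplicate edges, and self-loops are skipped, so the exact edge count may come out slightly low):

```python
import random

# Rough scale reported above: 125145 nodes, 4129676 edges,
# one 3x1 observation per node (node_obj only reads the first entry).
num_nodes = 125145
num_edges = 4129676

with open('edges_supervoxel.txt', 'w') as f:
    for _ in range(num_edges):
        src = random.randrange(num_nodes)
        dst = random.randrange(num_nodes)
        if src != dst:  # skip self-loops
            f.write('%d %d\n' % (src, dst))

with open('superVoxelMeans.csv', 'w') as f:
    for _ in range(num_nodes):
        f.write('%f,%f,%f\n' % (random.random(), random.random(), random.random()))
```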

ghost commented 9 years ago

Thanks David. Here are the links for the data:

https://drive.google.com/file/d/0B1M2lNNkWQxmQW1VYzVpbjU0NzQ/view?usp=sharing
https://drive.google.com/file/d/0B1M2lNNkWQxma1gweDZ2RzZMZTQ/view?usp=sharing

Here's the code I'm using:

```python
from snapvx import *

# Helper function for the node objective.
# Takes in a row from the CSV file, returns an optimization problem.
def node_obj(data):
    x = Variable(1, name='x')
    return power(norm(x - float(data[0])), 2)

# Helper function for the edge objective.
def laplace_reg(src, dst, data):
    return norm(src['x'] - dst['x'])

# Load in the edge list to build the graph with default node/edge objectives.
gvx = LoadEdgeList('edges_supervoxel.txt')

# Bulk load node objectives: takes one row of the CSV and uses it as input
# to node_obj. There is also an optional argument specifying which nodes
# each row of the CSV refers to.
gvx.AddNodeObjectives('superVoxelMeans.csv', node_obj)

# Bulk load edge objectives for all edges.
gvx.AddEdgeObjectives(laplace_reg)

gvx.Solve(useADMM=True)
gvx.PrintSolution()
```

davidhallac commented 9 years ago

Hmm, I was able to get your problem set up on our machines using ~2.5 GB of memory. You said the segfault occurs after AddNodeObjectives, right? Were you able to get a smaller problem running on EC2, or did all SnapVX problems cause issues? I don't have any AWS experience, but I could learn and set something up if it looks like an EC2-specific problem.
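In case it helps narrow things down, a tiny problem built directly with TGraphVX, rather than bulk loading from files, is a quick way to separate environment issues from problem-size issues. A minimal sketch following the basic SnapVX usage pattern (double-check the keyword names against your installed version):

```python
from snapvx import *

# Two scalar nodes with quadratic objectives, joined by one edge penalty.
gvx = TGraphVX()
x1 = Variable(1, name='x1')
gvx.AddNode(1, Objective=power(norm(x1 - 1), 2))
x2 = Variable(1, name='x2')
gvx.AddNode(2, Objective=power(norm(x2 + 1), 2))
gvx.AddEdge(1, 2, Objective=norm(x1 - x2))

# If even this crashes, the issue is the environment, not the problem size.
gvx.Solve(useADMM=True)
gvx.PrintSolution()
```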

I should also note that a second issue does pop up. In your graph, some nodes (for example, node 10) have a degree of 0, so they are not instantiated when you load the graph from edges_supervoxel.txt. This is fine, but your CSV file includes these nodes, so the bulk load gets offset (for example, thinking that row 10 belongs to node 11 in the graph). Then, once it has gone through all the nodes but still has data left in the CSV file, it throws an error because it does not know what to do with the remaining rows. We have ways of specifying which nodes to bulk load, but we do not let the user specify which rows of the CSV file to ignore (i.e., row 10). The easiest quick fix is to delete the rows for nodes with no neighbors (see the sketch below); in the meantime, we can look into easier ways of handling this through SnapVX.
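A minimal sketch of that quick fix, assuming the edge list contains whitespace-separated "src dst" pairs, the CSV has no header row, and row i of the CSV corresponds to node ID i (adjust if your IDs are 1-based; superVoxelMeans_filtered.csv is a hypothetical output name):

```python
# Collect the IDs of all nodes that appear in at least one edge.
connected = set()
with open('edges_supervoxel.txt') as f:
    for line in f:
        parts = line.split()
        if len(parts) >= 2:
            connected.add(int(parts[0]))
            connected.add(int(parts[1]))

# Keep only the CSV rows whose node actually exists in the graph.
with open('superVoxelMeans.csv') as fin, \
     open('superVoxelMeans_filtered.csv', 'w') as fout:
    for node_id, row in enumerate(fin):
        if node_id in connected:
            fout.write(row)
```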

ghost commented 9 years ago

Sorry, it turns out that the instance had only 1 GB of memory, which might have been too small for the amount of data I had.


davidhallac commented 9 years ago

Yup, segfaulting with only 1 GB makes sense. You'd either need more memory or a smaller problem size. Thanks for the update!