Hmm, it's definitely possible that you'll need additional memory (we're still working on making SnapVX more memory-efficient...). I can try to run it on one of our large-memory machines to see how things look - is the dataset available anywhere? Otherwise I can just generate random data for a problem of that size.
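(In case it helps anyone reproduce the memory footprint without the real dataset, here is a minimal sketch of generating random input files at roughly this scale. It is not from the original thread; the file names and the three-values-per-row CSV layout are assumptions based on the problem described in this issue.)

'''
import random

NUM_NODES = 125145    # node count reported in this issue
NUM_EDGES = 4129676   # edge count reported in this issue

# Write a random edge list, one whitespace-separated "src dst" pair per line.
# Self-loops are skipped, so the file ends up with slightly fewer edges.
with open('random_edges.txt', 'w') as f:
    for _ in range(NUM_EDGES):
        src = random.randrange(NUM_NODES)
        dst = random.randrange(NUM_NODES)
        if src != dst:
            f.write('%d %d\n' % (src, dst))

# Write random node data: one CSV row per node with three observed values,
# mirroring the 3x1 observation vectors described in the issue.
with open('random_node_data.csv', 'w') as f:
    for _ in range(NUM_NODES):
        f.write('%f,%f,%f\n' % (random.random(), random.random(), random.random()))
'''

These files can then be passed to the same LoadEdgeList / AddNodeObjectives calls shown later in the thread.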
Thanks, David. Here are the links for the data:
https://drive.google.com/file/d/0B1M2lNNkWQxmQW1VYzVpbjU0NzQ/view?usp=sharing
https://drive.google.com/file/d/0B1M2lNNkWQxma1gweDZ2RzZMZTQ/view?usp=sharing

Here's the code I'm using:

'''
from snapvx import *

def node_obj(data):
    x = Variable(1, name='x')
    return power(norm(x - float(data[0])), 2)

def laplace_reg(src, dst, data):
    return norm(src['x'] - dst['x'])

gvx = LoadEdgeList('edges_supervoxel.txt')
gvx.AddNodeObjectives('superVoxelMeans.csv', node_obj)
gvx.AddEdgeObjectives(laplace_reg)
gvx.Solve(useADMM=True)
gvx.PrintSolution()
'''
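For reference (this is just my reading of the snippet above, not something stated elsewhere in the thread), the code sets up the problem

minimize over {x_i}:  sum_{i in V} ||x_i - a_i||^2 + sum_{(i,j) in E} ||x_i - x_j||_2

where a_i is the first value in node i's row of superVoxelMeans.csv: each node pulls its variable toward its observed value, and each edge penalizes disagreement between neighboring variables.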
Hmm, I was able to get your problem set up on our machines using ~2.5GB of memory. You said the segfault occurs after AddNodeObjectives, right? Were you able to get a smaller problem running on EC2, or did all SnapVX problems cause issues? I don't have any AWS experience, but I could learn and set something up if it looks like an EC2-specific problem.
I should also note that a second issue does pop up. In your graph, some nodes (for example node 10) have a degree of 0, so they are not instantiated when you load the graph from edges_supervoxel.txt. This is fine, but your CSV file includes these nodes, so the bulk load gets offset (for example, thinking that row 10 belongs to node 11 in the graph). Then, once it has gone through all the nodes and still has data left in the CSV file, it throws an error because it does not know what to do with the remaining rows. We have ways of specifying which nodes to bulk load, but we do not let the user specify which rows of the CSV file to ignore (e.g., row 10). The easiest quick fix is to delete the CSV rows for nodes with no neighbors; in the meantime we can look into easier ways of handling this through SnapVX.
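A minimal sketch of that quick fix (not from the thread itself; it assumes one whitespace-separated "src dst" pair per line in edges_supervoxel.txt and that row k of superVoxelMeans.csv holds the data for node ID k):

'''
# Keep only the CSV rows whose node ID appears in the edge list, so that the
# bulk load stays aligned with the nodes SnapVX actually instantiates.
connected = set()
with open('edges_supervoxel.txt') as f:
    for line in f:
        parts = line.split()
        if len(parts) >= 2:
            connected.add(int(parts[0]))
            connected.add(int(parts[1]))

with open('superVoxelMeans.csv') as fin, \
     open('superVoxelMeans_filtered.csv', 'w') as fout:
    for node_id, row in enumerate(fin):  # assumes row k corresponds to node ID k
        if node_id in connected:
            fout.write(row)
'''

Then point AddNodeObjectives at superVoxelMeans_filtered.csv instead of the original file.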
Sorry, it turns out that the instance had only 1GB of memory, which might have been too small for the amount of data I had.
Yup, segfaulting with only 1GB makes sense. You'd either need more memory or a smaller problem size. Thanks for the update!
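(Rough arithmetic, not from the thread: the ~2.5GB measurement above spread over the ~4.13 million edges comes to roughly 600 bytes of graph and problem state per edge, so a 1GB instance would run out of headroom well before a problem of this size finishes loading.)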
Hello,
I'm trying to use snapvx on a t2.micro EC2 instance with 8GB of memory. I'm defining a network with 125145 nodes and 4129676 edges, where the observed value for each node is a 3x1 vector. I'm getting a segmentation fault error. Below is the code I'm using. Do you think I need more memory?
Thank you so much.
'''
>>> from snapvx import *
>>> def node_obj(data):
...     x = Variable(1, name='x')
...     return power(norm(x - float(data[0])), 2)
...
>>> def laplace_reg(src, dst, data):
...     return norm(src['x'] - dst['x'])
...
>>> gvx = LoadEdgeList('edges_supervoxel.txt')
>>> gvx.AddNodeObjectives('superVoxelMeans.csv', node_obj)
>>> gvx.AddEdgeObjectives(laplace_reg)
>>> gvx.Solve(useADMM=True)
>>> gvx.PrintSolution()
'''