twitter / cassovary

Cassovary is a simple big graph processing library for the JVM
http://twitter.com/cassovary
Apache License 2.0
1.05k stars 150 forks source link

Memory mapped graph #201

Closed plofgren closed 9 years ago

plofgren commented 9 years ago

This is a graph that stores its data in a memory mapped file. This enables two nice features: 1) No object overhead. The memory used for n nodes and m edges with both in-neighbor and out-neighbor access is exactly 8 + 16_n + 8_m bytes 2) Nearly instant graph loading. The twitter-2010 graph with 1.5 billion edges loads in seconds if the binary graph file is already in OS cache, and otherwise loads in the time it takes the OS to transfer the file from disk to RAM.

@pankajgupta

pankajgupta commented 9 years ago

Perhaps @szymonm might also be interested in this PR.

szymonm commented 9 years ago

nice!

plofgren commented 9 years ago

Thanks for the feedback. I think I've responded to everything except for a few questions I wrote comments on:

  1. Motivation of 8-byte reserved area.
  2. Style rule for override. I went ahead and removed override, but I'm still curious what the rational is for removing it.
  3. Keeping commented out def maxNodeId, with the intention of a follow-up PR.
pankajgupta commented 9 years ago

Re override, simple rule I follow (but not everyone does) is to not have this keyword when not needed. Plus, the compiler will complain in case you didn't use override but really needed to -- that way you could discover that the compiler is sequencing hierarchy of classes a certain way.

plofgren commented 9 years ago

OK, I've removed the overrides, changed the reserved bytes and nodeCount to 8 bytes, and realized that I can override maxNodeId by making it a lazy val. @pankajgupta I think I've addressed everything--is this ready to merge? Thanks for the comments!

pankajgupta commented 9 years ago

yep, please do a squash commit and I will merge. thanks!