twitter / cassovary

Cassovary is a simple big graph processing library for the JVM
http://twitter.com/cassovary
Apache License 2.0
1.05k stars 149 forks source link

#200 Feature/fastseq #211

Closed szymonm closed 8 years ago

szymonm commented 8 years ago

Hi @pankajgupta,

implemented specialized CSeq as we discussed. Can you have a look at it?

I will be working on additional tests for CSeq and some documentation in the next days.

I decided to force use to import implicits manually, because I had some troubles with importing them only in factory creating CSeq. It turned out that Scala didn't use specialized versions of my collections. To avoid this kind of bugs for CSeq I'm not providing CSeqFactory[Any]. I don't think we should need one as we want CSeq be used only for Int and Long for now.

I also wanted to keep CSeq interface minimal but needed slice for efficiency and type consistency of methods such as Node.inboundNodes(max: Int). However, I also consider a breaking change of deleting all the methods with max number of elements retrieved since a user can use .toSeq.take(max).

I also need to write some benchmark to check the result of this change on the performance.

Resolves #200

szymonm commented 8 years ago

Hey @pankajgupta, ball in your court here.

pankajgupta commented 8 years ago

yes, pardon the delay. getting to it.

On Sun, Feb 7, 2016 at 12:31 PM, Szymon notifications@github.com wrote:

Hey @pankajgupta https://github.com/pankajgupta, ball in your court here.

— Reply to this email directly or view it on GitHub https://github.com/twitter/cassovary/pull/211#issuecomment-181104337.

szymonm commented 8 years ago

looks like 2.10 compatibility is a bit hard...

pankajgupta commented 8 years ago

We can sunset 2.10

On Saturday, April 16, 2016, Szymon notifications@github.com wrote:

looks like 2.10 compatibility is a bit hard...

— You are receiving this because you were mentioned. Reply to this email directly or view it on GitHub https://github.com/twitter/cassovary/pull/211#issuecomment-210878513

szymonm commented 8 years ago

Managed to make it 2.10 compliant.

Your turn, @pankajgupta

szymonm commented 8 years ago

Checked running time on global PageRank benchmark and it is the same.

pankajgupta commented 8 years ago

Cool. Is there any benchmark which highlights the benefit of the change -- i.e., the original motivation of doing it?

On Sun, Apr 17, 2016 at 1:16 PM, Szymon notifications@github.com wrote:

Checked running time on global PageRank benchmark and it is the same.

— You are receiving this because you were mentioned. Reply to this email directly or view it on GitHub https://github.com/twitter/cassovary/pull/211#issuecomment-211107243

szymonm commented 8 years ago

Main goal of this change was to allow easier switch to Long ids by unifying underlying collections.

Second goal was to avoid boxing where possible. I think we should see an improvement in PageRank after I change its implementation to avoid using toSeq method on CSeq - will check that in the next days.

szymonm commented 8 years ago

Ran improved PageRank vs the previous one:

graph before improved
facebook 40ms 24ms
wiki-Vote 55ms 35ms

Used: runMain com.twitter.cassovary.PerformanceBenchmark -globalpr -reps=1000

pankajgupta commented 8 years ago

@szymonm just confirming that this is the PR to be reviewed?

szymonm commented 8 years ago

@pankajgupta yes, this one.

pankajgupta commented 8 years ago

Just fyi, I am in the middle of my review and might take 1-2 days. Will write here when I am finished with all of it.

pankajgupta commented 8 years ago

Finished review. Looks good -- just some minor comments.

Please check why travis-ci is failing. If it helps, I still say, let's drop support for scala 2.10 and move on to 2.11 only.

pankajgupta commented 8 years ago

ping @szymonm

szymonm commented 8 years ago

Sorry for the delay. Your turn, @pankajgupta

pankajgupta commented 8 years ago

ready to merge once you are able to double check that no new class is being created at runtime as well (as mentioned above)

szymonm commented 8 years ago

Do you have any paticular idea on how to check that the classes are not generated in runtime?

I checked what is generated in compile time and the looks like the classes are generated then:

➜  cassovary git:(feature/fastseeq) ls cassovary-core/target/scala-2.11/classes/com/twitter/cassovary/graph/Node\$CSeq*
cassovary-core/target/scala-2.11/classes/com/twitter/cassovary/graph/Node$CSeqInt$4$1.class
cassovary-core/target/scala-2.11/classes/com/twitter/cassovary/graph/Node$CSeqInt$5$1.class
cassovary-core/target/scala-2.11/classes/com/twitter/cassovary/graph/Node$CSeqInt$6$1.class
cassovary-core/target/scala-2.11/classes/com/twitter/cassovary/graph/Node$CSeqInt$7$1.class
cassovary-core/target/scala-2.11/classes/com/twitter/cassovary/graph/Node$CSeqIntArrayWrapper$4$1.class
cassovary-core/target/scala-2.11/classes/com/twitter/cassovary/graph/Node$CSeqIntArrayWrapper$5$1.class
cassovary-core/target/scala-2.11/classes/com/twitter/cassovary/graph/Node$CSeqIntArrayWrapper$6$1.class
cassovary-core/target/scala-2.11/classes/com/twitter/cassovary/graph/Node$CSeqIntArrayWrapper$7$1.class
szymonm commented 8 years ago

I have no idea why travis is failing for tests that seem to be unrelated. Looks like it is some Scala's compiler internal issue. Let's drop support for 2.10.

pankajgupta commented 8 years ago

ok to stop supporting 2.10 @szymonm -- is some code only for 2.10? if so, please clean that up.

also, perhaps try writing some print statements to test whether the class is being generated at runtime or not?

szymonm commented 8 years ago

yes, there is some code for 2.10. I'm cleaning it right now. And I'll squash all the commits.

I tested with print statements and they are run only in compile time.

pankajgupta commented 8 years ago

awesome

On Sun, Aug 14, 2016 at 11:08 PM, Szymon notifications@github.com wrote:

yes, there is some code for 2.10. I'm cleaning it right now. And I'll squash all the commits.

I tested with print statements and they are run only in compile time.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/twitter/cassovary/pull/211#issuecomment-239686164, or mute the thread https://github.com/notifications/unsubscribe-auth/AAEV2Lb85sNrhOCxUBOLtIle95Rhh-C6ks5qf1KggaJpZM4HM2Uj .