pandeyshubham25 / pagerank


Edge partition #10

Closed NMerz closed 2 years ago

NMerz commented 2 years ago

So I created an initial implementation of the edge-based partitioning scheme, using https://dl-acm-org.ezproxy.lib.purdue.edu/doi/pdf/10.1145/3472456.3475737 as the reference. However, initial testing shows it notably slower than the trivial parallel implementation.

I did decrease the cache size to fit in L1, since the M1's L2 is a shared cache (and our data structure is likely less efficient than theirs). Also, I did not do the fancy inter-edge handling that they mentioned.

However, even considering those two caveats, this is surprisingly poor performance (especially since times do not include partitioning setup):

(base) nmerz@Nathans-MacBook-Pro pagerank % ./prank_trivial_parallel 9, 50, 0.850000, road_usa
Parallel parameters: road_usa 50 0.85
Load time: 5.27885
Iteration time: 25.136
(base) nmerz@Nathans-MacBook-Pro pagerank % ./prank_edge_partition 9, 50, 0.850000, road_usa 
Parallel parameters: road_usa 50 0.85
Load time: 4.8937
Iteration time: 35.6632

Load times do seem to consistently edge out the trivial load times. I think that is due to https://github.com/pandeyshubham25/pagerank/pull/10/files#diff-541872eb9bf166023c4b12e79fdb74a38739aaf4d4ad2ed7e85f9f7446b6a4a3R99 and company switching to a smaller copy (of a reference). I think the whole pages were being passed by value before, but I don't know C++ well enough to be sure of that.
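The pass-by-value vs. pass-by-reference difference can be illustrated with a minimal sketch. The `Page` struct and function names here are hypothetical stand-ins for the project's types, not the actual code from the PR:

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for the project's page type.
struct Page {
    std::vector<int> out_links;  // potentially large adjacency list
    double rank = 0.0;
};

// Pass-by-value: the whole out_links vector is copied on every call.
double rank_by_value(Page p) { return p.rank; }

// Pass-by-const-reference: only a pointer-sized handle is passed,
// so no copy of the adjacency list is made.
double rank_by_ref(const Page& p) { return p.rank; }
```

For a graph like road_usa with millions of pages, avoiding one vector copy per page during loading plausibly explains a consistent load-time improvement.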

pandeyshubham25 commented 2 years ago

Interesting... did you try running edge_parallel without 'new Page()', like we are doing for serial and trivial_parallel? Also, are you reclaiming the memory at the end, or does map<> take care of it? I would think not. Also, let me check the edge partitioning logic separately and get back to you. If I have questions, I might need a KT session :)

NMerz commented 2 years ago

Interesting... did you try running edge_parallel without 'new Page()', like we are doing for serial and trivial_parallel?

No, I could, but it is a small enough difference that I'm not overly concerned. That was more a novelty finding.

Also, are you reclaiming the memory at the end, or does map<> take care of it? I would think not.

I don't reclaim memory; the program assuredly leaks worse than a sinking boat. But aside from cleanliness, there isn't really a programmatic reason to reclaim memory right before termination, since the OS will reclaim it anyway. I can fix it if you want.
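To the map<> question: a `std::map` holding raw pointers destroys the pointers, not the objects they own, so `new Page()` entries do leak. Wrapping the values in `std::unique_ptr` makes the map reclaim them automatically. A small sketch, with a hypothetical `Page` struct standing in for the project's type:

```cpp
#include <map>
#include <memory>

// Hypothetical stand-in for the project's page type.
struct Page {
    double rank = 0.0;
};

// std::map<int, Page*> would NOT delete the pages on destruction;
// std::map<int, std::unique_ptr<Page>> frees each Page when the
// map goes out of scope, with no explicit cleanup loop needed.
std::map<int, std::unique_ptr<Page>> make_pages(int n) {
    std::map<int, std::unique_ptr<Page>> pages;
    for (int i = 0; i < n; ++i)
        pages[i] = std::make_unique<Page>();
    return pages;
}
```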

Also, let me check the edge partitioning logic separately and get back to you. If I have questions, I might need a KT session :)

Can do. I apologize for the messy code.

pandeyshubham25 commented 2 years ago

I still haven't looked at your code, but I will surely do it by EOD today or tomorrow. Also, I was looking at the sparse matrix-vector multiplication ideas and was thinking we should try that out. What are your thoughts? Finally, let me know once you are done running the serial and parallel algorithms on the datasets; we can then discuss what we want to do next in detail.

pandeyshubham25 commented 2 years ago

I went through the implementation and also the paper. My thoughts:

1) Currently the edge distribution is random at best. So, unless we take care of bringing down the number of inter-partition communications, the performance has to be suboptimal. Section 3.1 in the paper talks particularly about this.

2) The only benefit we are getting is locality of reference, I guess?

3) Also, your code is not messy :P

NMerz commented 2 years ago

Just wanted to say I did see your questions here (and the timing stuff) and I'll get you an intelligent response once I finish HW5: maybe late today, maybe late tomorrow (it will take me a while to give it a reasonable look).

NMerz commented 2 years ago

I see a passing mention of intra-node locality in section 3.1, but I do not see any strategies for achieving this. Am I missing something? In fact, in table 1 it still looks like the majority of edges are inter, not intra, edges. I am not entirely sure where their great speedup comes from. It would seem improbable that it comes from such a small fraction of intra edges. Maybe they gain significantly from the collecting of the inter-edges?
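One way to quantify this locally would be to count intra vs. inter edges for a given partition assignment, along the lines of the paper's table 1. A minimal sketch; the function and parameter names are illustrative, not taken from the repo:

```cpp
#include <utility>
#include <vector>

// Given an edge list and a vertex->partition assignment, count edges
// whose endpoints share a partition (intra) vs. cross partitions (inter).
std::pair<int, int> count_intra_inter(
        const std::vector<std::pair<int, int>>& edges,
        const std::vector<int>& part_of) {
    int intra = 0, inter = 0;
    for (const auto& [u, v] : edges) {
        if (part_of[u] == part_of[v])
            ++intra;
        else
            ++inter;
    }
    return {intra, inter};
}
```

Running this on our current (effectively random) edge distribution vs. a locality-aware one would show whether the intra fraction is the bottleneck or whether the gains really come from batching the inter-edges.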

pandeyshubham25 commented 2 years ago

Maybe they gain significantly from the collecting of the inter-edges?

I think this is the case. I need to check how this can be done; there must be some paper that talks about it in detail.

NMerz commented 2 years ago

I think sparse matrix multiplication may be a promising avenue to explore. I will be upfront, though, that my personal grasp of linear algebra is... underdeveloped.

NMerz commented 2 years ago

Maybe they gain significantly from the collecting of the inter-edges?

I think this is the case. I need to check how this can be done; there must be some paper that talks about it in detail.

I'm pretty sure I can implement this working from scratch with message passing. But it will take some time, because I'll have to ditch MPI and rebuild from scratch.

pandeyshubham25 commented 2 years ago

I am working on the SpMV thing right now; whenever you are free this week, let me know. I am almost done implementing the coordinate representation and multiplication using it for the sparse matrix. Will let you know about it.

NMerz commented 2 years ago

Are you using a library for the actual multiplication or rebuilding from scratch? If from scratch, which paper (if any) are you using as a guide?

pandeyshubham25 commented 2 years ago

Maybe they gain significantly from the collecting of the inter-edges?

I think this is the case. I need to check how this can be done; there must be some paper that talks about it in detail.

I'm pretty sure I can implement this working from scratch with message passing. But it will take some time, because I'll have to ditch MPI and rebuild from scratch.

Don't do it if it takes a lot of your time; let's just wrap up after SpMV.

pandeyshubham25 commented 2 years ago

Are you using a library for the actual multiplication or rebuilding from scratch? If from scratch, which paper (if any) are you using as a guide?

I used quite a lot of sources that talk about sparse matrix representation. Adding some of them here:

1) I read this but have not thought of implementing the CSR version yet: http://www.cs.cmu.edu/afs/cs/academic/class/15418-s19/www/lectures/rec_04.pdf

2) This talks about two prominent representations: https://www.codetd.com/en/article/7846371

3) I scrolled through some of the code here to get a high-level idea, but I am implementing using MPI and not pthreads: https://github.com/Sable/fait-maison-spmv
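For reference, the coordinate (COO) representation and its SpMV kernel are compact. A minimal sketch of the general technique (not the repo's actual implementation; names are illustrative):

```cpp
#include <cstddef>
#include <vector>

// COO (coordinate) representation: parallel arrays of (row, col, value)
// triples, storing only the nonzero entries of the matrix.
struct COO {
    std::vector<int> row;
    std::vector<int> col;
    std::vector<double> val;
};

// y = A * x for an n-row matrix A in COO form. Each nonzero contributes
// val[k] * x[col[k]] to output row row[k].
std::vector<double> spmv(const COO& A, const std::vector<double>& x, int n) {
    std::vector<double> y(n, 0.0);
    for (std::size_t k = 0; k < A.val.size(); ++k)
        y[A.row[k]] += A.val[k] * x[A.col[k]];
    return y;
}
```

For PageRank, one iteration is essentially this kernel applied to the rank vector with the transition matrix in sparse form, which is why the CSR variant in the CMU recitation (sorted by row, with a row-offset array) parallelizes more naturally than COO.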