pandeyshubham25 / pagerank


Timing results #13

Open NMerz opened 2 years ago

NMerz commented 2 years ago

Results (linked for convenience): https://docs.google.com/spreadsheets/d/1wZ2FA-OrMAT2rlqv5vUFAaQhf_hppKzaWY8FQ0mbZjM/edit?usp=sharing I was looking at our timing data. Quite a number of interesting surprises. I'll edit more thoughts in later.

I was astonished by just how much slower, overall, the M1 was than the i7; I would have expected much closer performance. However, I am concerned about a confounding factor: I think we used different compilers. I used clang++, but specifically the clang++ that ships with macOS, which may be slightly different. I would assume Apple put some OS-specific optimizations in, but perhaps not. Did you use clang or gcc?

NMerz commented 2 years ago

A couple of other thoughts:

2) CSR load times were extraordinarily slow, even though the CSR files were only a little larger than the regular files and generally much smaller than the COO files. Conversely, the COO files loaded shockingly fast given that they were by far the largest. I had assumed disk reads would be the limiting factor, but that seems to be wrong. There may be some inefficiency in the frequent checks for section changes in the CSR files, but I would not have expected those to cost many operations. Perhaps the standard file operations take far longer than I thought. My guess is that this is an artifact of the reading code rather than anything inherent to the format.

3) The matrix formats seemed to benefit certain graphs more than others. For example, the Amazon graph benefited much more than graphs of a relatively similar size, such as the patents graph. So the primary speedup does not seem to come from reduced computation but from some sort of increased locality (perhaps Amazon and the others with higher speedups had more fairly connected components of a size that happened to fit in cache with the matrix formats). Or the original edges may have been sorted differently before conversion (sorting by position in the graph prior to numbering should, I think, produce a much better numbering than sorting by outgoing/incoming edge
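One way to test the guess in point 2 (reading code versus disk) would be to time the raw file read separately from the parsing. The sketch below is only an illustration in Python, not the project's loader: `time_read_and_parse` is a hypothetical helper, and it assumes a whitespace-separated edge list with optional `%` or `#` comment lines.

```python
import os
import tempfile
import time


def time_read_and_parse(path):
    """Return (read_seconds, parse_seconds, edges) for a whitespace-separated edge list."""
    t0 = time.perf_counter()
    with open(path, "rb") as f:
        raw = f.read()  # raw bytes only: disk (or page cache) cost
    t1 = time.perf_counter()

    edges = []
    for line in raw.splitlines():
        parts = line.split()
        # Skip blank lines and '%' / '#' comment lines.
        if not parts or parts[0][:1] in (b"%", b"#"):
            continue
        # Only the first two columns (source, destination) are used here.
        edges.append((int(parts[0]), int(parts[1])))
    t2 = time.perf_counter()
    return t1 - t0, t2 - t1, edges


if __name__ == "__main__":
    # Tiny synthetic file so the sketch runs on its own.
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
        tmp.write("% comment\n0 1\n1 2\n2 0\n")
    read_s, parse_s, edges = time_read_and_parse(tmp.name)
    os.remove(tmp.name)
    print(f"read {read_s:.6f}s, parse {parse_s:.6f}s, {len(edges)} edges")
```

If the parse time dominates on the real files, that would point at the reading code rather than the disk. Note that a second run on the same file will usually be served from the OS page cache, so the read time is only meaningful on a cold run.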

NMerz commented 2 years ago

4) i7 road data was abnormally slow. Can you confirm that yours is only about 80 MB?

pandeyshubham25 commented 2 years ago

I used g++

pandeyshubham25 commented 2 years ago

"4) i7 road data was abnormally slow. Can you confirm that yours is only about 80 MB?"

Let me check that

pandeyshubham25 commented 2 years ago

I also think that the matrix representation coincidentally favored some of the datasets. Let me try running CSR on my system as well and see what happens.
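A rough way to check whether vertex numbering explains the difference between datasets is to compare a crude locality proxy across them. The sketch below is only an assumption about what might be worth measuring: `mean_index_gap` and `relabel` are hypothetical helpers, and the average gap between source and destination IDs is a proxy for cache behaviour, not a measurement of it.

```python
import random


def mean_index_gap(edges):
    """Average |src - dst| over all edges; smaller suggests neighbors sit closer in memory."""
    return sum(abs(u - v) for u, v in edges) / len(edges)


def relabel(edges, order):
    """Renumber vertices so that order[i] becomes vertex i."""
    new_id = {old: new for new, old in enumerate(order)}
    return [(new_id[u], new_id[v]) for u, v in edges]


if __name__ == "__main__":
    n = 1000
    # A path graph numbered by position: every edge joins consecutive IDs.
    path = [(i, i + 1) for i in range(n - 1)]
    shuffled = list(range(n))
    random.Random(0).shuffle(shuffled)
    print("by position:", mean_index_gap(path))
    print("random numbering:", mean_index_gap(relabel(path, shuffled)))
```

Running `mean_index_gap` on the edge lists of the Amazon and patents graphs, as numbered in our input files, would show whether the graphs that sped up most also start with a noticeably different numbering.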

NMerz commented 2 years ago

Do you have time to rerun with clang++ instead?

pandeyshubham25 commented 2 years ago

Let me try that this evening.

pandeyshubham25 commented 2 years ago

The run times seem quite similar for both compilers on my machine.

NMerz commented 2 years ago

Got it, thanks for checking

pandeyshubham25 commented 2 years ago

"4) i7 road data was abnormally slow. Can you confirm that yours is only about 80 MB?"

https://sparse.tamu.edu/DIMACS10/road_usa : I used the Matrix Market file from here. It's pretty large, not 80 MB. How about we mark it as a different dataset in the results discussion? Also, I need your help populating the meta info about the datasets in the DATASET INFO section of the spreadsheet.
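To confirm which variant of a dataset each of us ran, and to fill in the dataset meta info, the dimensions can be read straight from the Matrix Market header. This is a minimal sketch assuming the coordinate format, where the first non-comment line holds rows, columns, and stored entries; `read_mtx_size` is a hypothetical helper and the sample data is made up.

```python
import io


def read_mtx_size(stream):
    """Return (banner, rows, cols, entries) from a Matrix Market coordinate file."""
    banner = stream.readline().strip()
    if not banner.startswith("%%MatrixMarket"):
        raise ValueError("missing %%MatrixMarket banner")
    for line in stream:
        line = line.strip()
        # Comment lines start with '%'; the first other non-blank line is the size line.
        if not line or line.startswith("%"):
            continue
        rows, cols, entries = (int(x) for x in line.split())
        return banner, rows, cols, entries
    raise ValueError("no size line found")


if __name__ == "__main__":
    # Made-up 3x3 example, not the real road_usa file.
    sample = io.StringIO(
        "%%MatrixMarket matrix coordinate pattern symmetric\n"
        "% a comment\n"
        "3 3 2\n"
        "2 1\n"
        "3 2\n"
    )
    print(read_mtx_size(sample))
```

For a real file, pass `open(path)` instead of the `StringIO` sample. If the banner says `symmetric`, the entry count covers only one triangle of the matrix, so the number of directed edges is larger than the count in the header.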

NMerz commented 2 years ago

Sorry, that is my fault. I don't know why I put that link. My road_usa came from https://networkrepository.com/road-usa.php

pandeyshubham25 commented 2 years ago

Got it, I will try running on this too then.