jowens opened this issue 4 years ago
In general I think our numbers are a little better than the ones reported to us for BC, PR, and SSSP. Some of the BFS runs reported to us are faster than what we had, so we're very interested in the settings for those.
Thanks a lot for reporting this discrepancy and also providing the exact commands to reproduce these results. I will run the commands and generate a similar table. I also have the scripts we used in the first place to produce the numbers we reported in the paper. I can find the parameters that produce better numbers on BFS.
@AjayBrahmakshatriya any good settings for us?
@AjayBrahmakshatriya Checkin' in.
Hi @jowens and @neoblizz, I reran the experiments on our machine with the flags you had provided. For each set of flags it produced a lot of JSON files (because some of the flags had multiple options), so I chose the fastest average-process-time out of those. Here are the results of the experiments for SSSP, BFS, and BC:
**SSSP** (average process time, ms)

Graph | Gunrock-reported | Our new | Our old |
---|---|---|---|
hollywood-2009 | 14.445 | 13.678 | 29.030 |
indochina-2004 | 29.980 | 30.397 | 47.970 |
roadNet-CA | 95.989 | 134.038 | 86.440 |
road_central | 582.998 | 1012.211 | 632.970 |
road_usa | 1001.980 | 1593.763 | 987.930 |
soc-LiveJournal1 | 36.388 | 36.056 | 88.040 |
soc-orkut | 73.996 | 71.183 | 213.930 |
soc-sinaweibo | 557.461 | 548.531 | 1095.600 |
soc-twitter-2010 | 228.118 | 227.485 | 505.310 |
**BFS** (average process time, ms)

Graph | Gunrock-reported | Our new | Our old |
---|---|---|---|
hollywood-2009 | 1.008 | 1.880 | 1.670 |
indochina-2004 | 13.623 | 13.266 | 12.570 |
roadNet-CA | 51.087 | 81.530 | 68.350 |
road_central | 301.835 | 521.023 | 434.010 |
road_usa | 515.051 | 766.221 | 775.160 |
soc-LiveJournal1 | 3.021 | 3.814 | 3.630 |
soc-orkut | 3.986 | 4.305 | 1.660 |
soc-sinaweibo | 10.741 | 10.805 | 94.770 |
soc-twitter-2010 | 15.432 | 16.221 | 13.520 |
**BC** (average process time, ms)

Graph | Gunrock-reported | Our new | Our old |
---|---|---|---|
hollywood-2009 | 6.136 | 7.292 | 5.870 |
indochina-2004 | 9.080 | 16.909 | 13.630 |
roadNet-CA | 48.623 | 80.462 | 66.790 |
road_central | 336.540 | 590.307 | 429.240 |
road_usa | 499.529 | 908.687 | 788.230 |
soc-LiveJournal1 | 15.351 | 18.103 | 14.550 |
soc-orkut | 7.878 | 32.933 | 243.050 |
soc-sinaweibo | 271.258 | 267.738 | 235.750 |
soc-twitter-2010 | 115.023 | 106.935 | 97.520 |
Unfortunately for PR, I couldn't reproduce the experiments because all the experiments segfault before printing any timings or producing the JSON files. I am currently on commit 7c197d6a498806fcfffd1f9304c663379a77f5e4 (HEAD, tag: v1.1). @neoblizz, do you know what might be the reason this is happening?

Anyway, for the rest of the experiments, some of the numbers are better than our earlier numbers, and we are happy to change those in the paper. However, for some of the road graphs there is still a discrepancy between what you have reported and what the new runs produced. Our first guess was that this is because of a different weight distribution, but the discrepancy is also there for BFS and BC, which are unweighted. Any clue why this might be happening?
> Unfortunately for PR, I couldn't reproduce the experiments because all the experiments segfault before printing any timings or producing the JSON files. I am currently on commit 7c197d6a498806fcfffd1f9304c663379a77f5e4 (HEAD, tag: v1.1). @neoblizz do you know what might be the reason this is happening?
I am investigating a bug in PR with @crozhon right now; what was the exact error that you encountered here?
> However, for some of the road graphs there is still a discrepancy between what you have reported and what the new runs produced. Our first guess was that this is because of a different weight distribution, but the discrepancy is also there for BFS and BC, which are unweighted.
Were random source nodes used for these runs? Maybe that is the cause. How big was the difference? It would help if I had the command lines for these runs. Also, some changes have been made since some of those runs, which may have caused minor differences as well.
Can you distinguish between "our new" and "our old"? Is "our new" the result of rerunning with the command lines we gave you, whereas "our old" is what you ran on your own?
Certainly we are interested in the command lines where "our old" is faster than our run.
(FWIW Muhammad redesigned the command lines so that you could specify multiple runs with one command line; you'll see something like `--param=true,false`, and it will do separate runs with `--param=true` and `--param=false`. This is helpful because then we only have to load the graph once; it cuts down our time substantially on large multi-parameter runs. If you look at the parameters in the JSONs linked in the table, you can cut down the command line to only run the fastest set of parameters, and then you won't have to dig through multiple command lines.)
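To make that concrete, here is a hedged sketch of what such a combined sweep might look like, reusing the BFS flags that appear later in this thread; whether numeric parameters like `--do-a`/`--do-b` accept comma-separated lists the same way boolean switches do is an assumption, and the candidate values are only examples.

```bash
# Illustrative sketch only: sweep several do-a/do-b candidates in one invocation,
# assuming these parameters accept comma-separated lists like --param=true,false.
# Gunrock would then load soc-orkut.mtx once and emit one JSON per parameter combination.
./bfs market soc-orkut.mtx \
    --direction-optimized \
    --do-a=0.006,0.012,0.024 \
    --do-b=0.006,0.012,0.024 \
    --src=0 --num-runs=10 --quick --device=5
```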
@neoblizz there wasn't an error, the execution crashed with a segmentation fault after the pagerank run. @jowens oh yes, I forgot to clarify the columns in the table. The first column has the numbers you had shared with me a week ago. The third column is the numbers we had in our paper (with the tuning of parameters we had done). The second column is the numbers from the recent experiments. For the cases where the third column is the fastest, I think it is because of the better tuned do_a and do_b parameters. The one with a significant speed up seems to be for soc-orkut on BFS. I have run the parameter sweep script again for this graph and I will find the optimum parameters and get back to you.
Thank you for being so conscientious about this.
Actually, the sweep just showed a number reasonably close to our earlier result, with the command line:

```
./bfs market soc-orkut.mtx --direction-optimized --do-a=0.012 --do-b=0.012 --src=0 --num-runs=10 --quick --device=5
```

The average execution time is 1.7415 ms (compared to 3.986 ms from your experiments and 1.66 ms in our paper).
Our input graphs are already symmetrized in this case, so I haven't added the `--undirected` flag.
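(For completeness, a minimal sketch of where that flag would go if the input were not pre-symmetrized; the filename below is hypothetical, and the only change from the command above is the added `--undirected`.)

```bash
# Hypothetical variant for a non-symmetrized input (the .mtx filename is made up):
# --undirected tells Gunrock to treat the loaded graph as undirected instead of
# relying on a pre-symmetrized matrix file.
./bfs market soc-orkut-directed.mtx --undirected --direction-optimized \
    --do-a=0.012 --do-b=0.012 --src=0 --num-runs=10 --quick --device=5
```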
> (FWIW Muhammad redesigned the command lines so that you could specify multiple runs with one command line; you'll see something like `--param=true,false`, and it will do separate runs with `--param=true` and `--param=false`. This is helpful because then we only have to load the graph once; it cuts down our time substantially on large multi-parameter runs. If you look at the parameters in the JSONs linked in the table, you can cut down the command line to only run the fastest set of parameters and then you won't have to dig through multiple command lines.)
Oh yes! I realized that from the new command lines you had sent. I modified them to choose the best parameters as you had shown in the table. But they still weren't matching the numbers you had sent. So, just to be sure, I ran the commands as-is (except for minor changes like not exploring 64-bit vertex IDs and edge IDs) and scanned all the generated JSONs to find the best one. The numbers in the table above show the best results.
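(As a rough illustration of that last step, here is one way the JSON scan could be done; this is a sketch under assumptions, not the script we used. In particular, the `avg-process-time` key and the `./eval/` output directory are guesses based on the wording earlier in the thread, so adjust them to whatever your JSON files actually contain.)

```bash
# Sketch: slurp every run's JSON into one array and print the record with the
# smallest average process time. Requires jq. The "avg-process-time" key and the
# ./eval/ path are assumptions; rename them to match your actual output.
jq -s 'min_by(."avg-process-time")' ./eval/*.json
```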
> @neoblizz there wasn't an error, the execution crashed with a segmentation fault after the pagerank run.

Since the commit you are working on is from Sept. 2019, I believe the segfault bug has since been fixed on master (HEAD). There's a separate issue that I am working on, but using the latest commit should work for PageRank. The official release is just pending.
Sounds good! If you think the performance with v1.1 and master (HEAD) should be close, I will use the latest commit for the PR experiments. Thanks for looking into this. I will let you know if it works.
> But they still weren't matching the numbers you had sent.
I promise we are suuuuuper honest about our numbers. As are you. Thank you.
Of course! 😄
And since the issue is only with the road graphs, my guess is that it could be a difference in the input files. I will download the datasets again (I am using the same ones that you have in your repository), skip any preprocessing, and run the experiments again.
@neoblizz I pulled the latest master branch and tried recompiling gunrock, but I am getting a lot of CMake errors along the lines of:

```
CMake Error at ~/scratch/cmake-3.17.2-Linux-x86_64/share/cmake-3.17/Modules/FindCUDA.cmake:1837 (add_library):
  Cannot find source file:

    ......./gunrockv1.1/externals/moderngpu/src/context.hxx

  Tried extensions .c .C .c++ .cc .cpp .cxx .cu .m .M .mm .h .hh .h++ .hm
  .hpp .hxx .in .txx
Call Stack (most recent call first):
  gunrock/CMakeLists.txt:47 (CUDA_ADD_LIBRARY)
```

There is one such error for each binary too, including `pr`. I am using CMake 3.17.2 (which is what I was using previously for 7c197d6 too, and it compiled fine). Is this a transient issue with the latest commit? Is there a stable commit from before that I can use?
Seems like you didn't fetch the submodules again, since we now use moderngpu 2.0 instead of 1.0.

```
git submodule update --init
```
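(For anyone hitting the same error, a hedged sketch of the full refresh-and-rebuild sequence; the out-of-source `build/` directory and the `make` invocation are assumptions about a typical CMake workflow rather than the project's documented steps, and `--recursive` is added per the follow-up below.)

```bash
# Sketch of one way to refresh the checkout and rebuild after the moderngpu 2.0 switch.
cd gunrock
git pull                                   # or check out the specific commit you want
git submodule update --init --recursive    # fetch moderngpu 2.0 and the other externals
mkdir -p build && cd build
cmake ..                                   # assumes a standard out-of-source CMake build
make -j"$(nproc)"                          # per-primitive targets (e.g. bfs, pr) may also exist
```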
I just pulled the latest version of gunrock again (with `--recursive`) and it seemed to have fixed the problem. I am running the experiments right now for PageRank. Please let me know if you want the flags for any other experiments (that we had originally used in the paper).
I just fixed some issues with using multiple options at once (`--pull=false,true`) and an issue with OpenMP; those fixes are only on `dev` for now. If you hit another issue, try checking that branch out.
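(If useful, a minimal sketch of switching to that branch; the branch name comes from the comment above, and the rest is generic git.)

```bash
# Check out the dev branch to pick up those fixes, then refresh the submodules.
git fetch origin
git checkout dev
git submodule update --init --recursive
```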
Simply for our future use, it would be nice to have the settings for everything you tested, yes. But the primitive-specific settings are specifically what we'll use straightaway.
@AjayBrahmakshatriya any set of settings that you can send us for your runs that were faster than ours?
I just checked the logs from our experiments, and the only experiments where our numbers were faster than your reported numbers are BFS on the following two graphs. I have specified the command line arguments and the reported average times here. For the rest, your reported numbers were faster, and with the flags you mentioned we were able to match them.

I just reran these experiments to make sure there isn't a mix-up in the flags. These experiments were run on a DGX-1 with a V100 (single GPU).
Graph | Command line | Gunrock reported (ms) | Our experiments (ms) |
---|---|---|---|
soc-orkut | `./bfs market soc-orkut.mtx --direction-optimized --do-a=0.012 --do-b=0.012 --src=0 --num-runs=10 --quick --device=5` | 3.986 | 1.680 |
soc-twitter-2010 | `./bfs market soc-twitter-2010.mtx --direction-optimized --do-a=0.003 --do-b=0.003 --src=0 --num-runs=10 --quick --device=5` | 15.432 | 13.65 |
thank you @AjayBrahmakshatriya!
Searched our results repo for 9 datasets on 4 Gunrock primitives on V100. SSSP is directed; the other three are undirected.
CC: @neoblizz @AjayBrahmakshatriya @crozhon
GitHub doesn't allow .html attachments, so I'm renaming my HTML table as text and attaching it below.
ugf_table.txt
Pasting the source code here for posterity.