gunrock / gunrock

Programmable CUDA/C++ GPU Graph Analytics
https://gunrock.github.io/gunrock/
Apache License 2.0

request for Volta numbers on 9 datasets #751

Open jowens opened 4 years ago

jowens commented 4 years ago

I searched our results repo for these 9 datasets across 4 Gunrock primitives on V100. SSSP is directed; the other three are undirected.

CC: @neoblizz @AjayBrahmakshatriya @crozhon

GitHub doesn't allow .html attachments, so I'm renaming my HTML table as .txt and attaching it below.

ugf_table.txt

Pasting the source code here for posterity.

```python
# Note: "df" holds the full results table loaded from our results repo;
# "save" is our plotting helper and "alt" is Altair.
ugf = df.copy()
ugf_datasets = [
    "soc-orkut",
    "soc-twitter-2010",
    "soc-LiveJournal1",
    "soc-sinaweibo",
    "hollywood-2009",
    "indochina-2004",
    "road_usa",
    "road_central",
    "roadNet-CA",
]
ugf = ugf[
    (ugf["gpuinfo_name"] == "Tesla V100")
    & (ugf["dataset"].isin(ugf_datasets))
    & (
        ((ugf["undirected"] == True) & ugf["primitive"].isin(["dobfs", "pr", "bc"]))
        | ((ugf["undirected"] == False) & ugf["primitive"].isin(["sssp"]))
    )
]

# Keep only the fastest run per (dataset, primitive, undirected) group.
ugf_fastest = ugf.groupby(["dataset", "primitive", "undirected"])[
    "avg_process_time"
].transform("min")
ugf = ugf[ugf["avg_process_time"] == ugf_fastest]

# Write out an HTML table ('tablehtml') of the fastest runs.
save(
    chart=alt.Chart(),
    df=ugf,
    plotname="ugf",
    outputdir="../plots",
    formats=["tablehtml"],
    sortby=[
        "primitive",
        "dataset",
        "engine",
        "gunrock_version",
        "advance_mode",
        "undirected",
        "mark_pred",
        "idempotence",
    ],
    columns=[
        "primitive",
        "dataset",
        "avg_process_time",
        "avg_mteps",
        "engine",
        "num_vertices",
        "num_edges",
        "gunrock_version",
        "advance_mode",
        "undirected",
        "mark_pred",
        "idempotence",
        "gpuinfo_name",
        "gpuinfo_name_full",
        "time",
        "details",
    ],
    mdtext="",
)
```
jowens commented 4 years ago

In general I think our numbers are a little better than the ones reported to us for BC, PR, and SSSP. Some of the BFS runs reported to us are faster than what we had, so we're very interested in the settings for those.

AjayBrahmakshatriya commented 4 years ago

Thanks a lot for reporting this discrepancy and also providing the exact commands to reproduce these results. I will run the commands and generate a similar table. I also have the scripts we used in the first place to produce the numbers we reported in the paper. I can find the parameters that produce better numbers on BFS.

jowens commented 4 years ago

@AjayBrahmakshatriya any good settings for us?

jowens commented 4 years ago

@AjayBrahmakshatriya Checkin' in.

AjayBrahmakshatriya commented 4 years ago

Hi @jowens and @neoblizz, I reran the experiments on our machine with the flags you provided. Each of the flag configurations produced a lot of JSON files (because some of the flags had multiple options), so I chose the fastest average process time out of those. Here are the results for the SSSP, BFS, and BC experiments (all times in ms):

BC

| Graph | Gunrock-reported | Our new | Our old |
| --- | --- | --- | --- |
| hollywood-2009 | 14.445 | 13.678 | 29.030 |
| indochina-2004 | 29.980 | 30.397 | 47.970 |
| roadNet-CA | 95.989 | 134.038 | 86.440 |
| road_central | 582.998 | 1012.211 | 632.970 |
| road_usa | 1001.980 | 1593.763 | 987.930 |
| soc-LiveJournal1 | 36.388 | 36.056 | 88.040 |
| soc-orkut | 73.996 | 71.183 | 213.930 |
| soc-sinaweibo | 557.461 | 548.531 | 1095.600 |
| soc-twitter-2010 | 228.118 | 227.485 | 505.310 |

BFS

| Graph | Gunrock-reported | Our new | Our old |
| --- | --- | --- | --- |
| hollywood-2009 | 1.008 | 1.880 | 1.670 |
| indochina-2004 | 13.623 | 13.266 | 12.570 |
| roadNet-CA | 51.087 | 81.530 | 68.350 |
| road_central | 301.835 | 521.023 | 434.010 |
| road_usa | 515.051 | 766.221 | 775.160 |
| soc-LiveJournal1 | 3.021 | 3.814 | 3.630 |
| soc-orkut | 3.986 | 4.305 | 1.660 |
| soc-sinaweibo | 10.741 | 10.805 | 94.770 |
| soc-twitter-2010 | 15.432 | 16.221 | 13.520 |

SSSP (weights 0-1000)

| Graph | Gunrock-reported | Our new | Our old |
| --- | --- | --- | --- |
| hollywood-2009 | 6.136 | 7.292 | 5.870 |
| indochina-2004 | 9.080 | 16.909 | 13.630 |
| roadNet-CA | 48.623 | 80.462 | 66.790 |
| road_central | 336.540 | 590.307 | 429.240 |
| road_usa | 499.529 | 908.687 | 788.230 |
| soc-LiveJournal1 | 15.351 | 18.103 | 14.550 |
| soc-orkut | 7.878 | 32.933 | 243.050 |
| soc-sinaweibo | 271.258 | 267.738 | 235.750 |
| soc-twitter-2010 | 115.023 | 106.935 | 97.520 |

Unfortunately, for PR I couldn't reproduce the experiments, because all of them segfault before printing any timings or producing the JSON files. I am currently on commit 7c197d6a498806fcfffd1f9304c663379a77f5e4 (HEAD, tag: v1.1). @neoblizz, do you know what might be causing this?

Anyway, for the rest of the experiments, some of the numbers are better than our earlier ones, and we are happy to change those in the paper. However, for some of the road graphs there is still a discrepancy between what you reported and what the new runs produced. Our first guess was that this is because of a different weight distribution, but the discrepancy is also there for BFS and BC, which are unweighted.

Any clue why this might be happening?

neoblizz commented 4 years ago

> Unfortunately, for PR I couldn't reproduce the experiments, because all of them segfault before printing any timings or producing the JSON files. I am currently on commit 7c197d6a498806fcfffd1f9304c663379a77f5e4 (HEAD, tag: v1.1). @neoblizz, do you know what might be causing this?

I am investigating a bug in PR with @crozhon right now; what was the exact error that you encountered here?

> However, for some of the road graphs there is still a discrepancy between what you reported and what the new runs produced. Our first guess was that this is because of a different weight distribution, but the discrepancy is also there for BFS and BC, which are unweighted.

Were random source nodes used for these runs? Maybe that is the cause. How large was the difference? It would help if I had the command lines for these runs. Also, some changes have been made since some of those runs, which may have caused minor differences as well.

jowens commented 4 years ago

Can you clarify "our new" versus "our old"? Is "our new" the result of rerunning with the command line we gave you, whereas "our old" is what you originally ran on your own?

Certainly we are interested in the command lines where "our old" is faster than our run.

jowens commented 4 years ago

(FWIW, Muhammad redesigned the command lines so that you can specify multiple runs with one command line; you'll see something like --param=true,false, and it will do separate runs with --param=true and --param=false. This is helpful because we then only have to load the graph once, which cuts our time down substantially on large multi-parameter runs. If you look at the parameters in the JSONs linked in the table, you can cut the command line down to only the fastest set of parameters, so you won't have to dig through multiple command lines.)
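
To illustrate the expansion, here is a rough Python sketch of the idea; this is not Gunrock's actual parser, and the flag names are only examples:

```python
from itertools import product

# Illustrative sketch only (not Gunrock's actual parser): expand
# comma-separated flag values into one parameter set per combination,
# so the graph only has to be loaded once for the whole sweep.
flags = {"--idempotence": "true,false", "--mark-pred": "true,false"}

combos = [
    dict(zip(flags.keys(), values))
    for values in product(*(v.split(",") for v in flags.values()))
]

for combo in combos:
    # Each combo is one run, e.g. {'--idempotence': 'true', '--mark-pred': 'false'}
    print(combo)
```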

AjayBrahmakshatriya commented 4 years ago

@neoblizz there wasn't an error message; the execution crashed with a segmentation fault after the PageRank run.

@jowens oh yes, I forgot to clarify the columns in the table. The first column has the numbers you shared with me a week ago. The third column has the numbers we had in our paper (with the parameter tuning we had done). The second column has the numbers from the recent experiments. For the cases where the third column is the fastest, I think it is because of the better-tuned do_a and do_b parameters. The one with a significant speedup seems to be soc-orkut on BFS. I have run the parameter sweep script again for this graph, and I will find the optimal parameters and get back to you.

jowens commented 4 years ago

Thank you for being so conscientious about this.

AjayBrahmakshatriya commented 4 years ago

Actually, the sweep just showed a reasonably close number, with the command line `./bfs market soc-orkut.mtx --direction-optimized --do-a=0.012 --do-b=0.012 --src=0 --num-runs=10 --quick --device=5`. The average execution time is 1.7415 ms (compared to 3.986 ms from your experiments and 1.66 ms in our paper). Our input graphs are already symmetrized in this case, so I haven't added the --undirected flag.

AjayBrahmakshatriya commented 4 years ago

> (FWIW, Muhammad redesigned the command lines so that you can specify multiple runs with one command line; you'll see something like --param=true,false, and it will do separate runs with --param=true and --param=false. This is helpful because we then only have to load the graph once, which cuts our time down substantially on large multi-parameter runs. If you look at the parameters in the JSONs linked in the table, you can cut the command line down to only the fastest set of parameters, so you won't have to dig through multiple command lines.)

Oh yes! I realized that from the new command lines you had sent. I modified them to choose the best parameters, as you had shown in the table, but they still weren't matching the numbers you had sent. So, just to be sure, I ran the commands as-is (except for minor changes like not exploring 64-bit vertex and edge IDs) and scanned all the generated JSONs to find the best one. The numbers in the table show the best.
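
For reference, a rough sketch of the kind of JSON scan I mean (not the exact script; it assumes each output file stores the timing under an "avg-process-time" key, which may vary between Gunrock versions):

```python
import glob
import json

# Scan every JSON emitted by a multi-parameter run and report the one
# with the lowest average process time. The "avg-process-time" key is
# an assumption and may differ between Gunrock versions.
best_time, best_file = float("inf"), None
for path in glob.glob("*.json"):
    with open(path) as f:
        run = json.load(f)
    t = run.get("avg-process-time")
    if t is not None and t < best_time:
        best_time, best_file = t, path

print(f"fastest run: {best_file} ({best_time:.3f} ms)")
```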

neoblizz commented 4 years ago

> @neoblizz there wasn't an error message; the execution crashed with a segmentation fault after the PageRank run.

Since the commit you are working on is from Sept. 2019, I believe the segfault bug has been solved and should be fixed on master (HEAD). There's a separate issue that I am working on, but using the latest commit should work for PageRank; the official release is just pending.

AjayBrahmakshatriya commented 4 years ago

> > @neoblizz there wasn't an error message; the execution crashed with a segmentation fault after the PageRank run.
>
> Since the commit you are working on is from Sept. 2019, I believe the segfault bug has been solved and should be fixed on master (HEAD). There's a separate issue that I am working on, but using the latest commit should work for PageRank; the official release is just pending.

Sounds good! If you think the performance of v1.1 and master (HEAD) should be close, I will use the latest commit for the PR experiments. Thanks for looking into this; I will let you know if it works.

jowens commented 4 years ago

> But they still weren't matching the numbers you had sent.

I promise we are suuuuuper honest about our numbers. As are you. Thank you.

AjayBrahmakshatriya commented 4 years ago

> > But they still weren't matching the numbers you had sent.
>
> I promise we are suuuuuper honest about our numbers. As are you. Thank you.

Of course! 😄

And since the issue is with just the road graphs, my guess is that it could be a difference in the input files. I will download the datasets again (I am using the same ones that you have in your repository), skip any preprocessing, and rerun the experiments.
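
To rule out input-file differences quickly, something like the sketch below could compare the MatrixMarket size headers of the old and freshly downloaded copies (file names are hypothetical):

```python
# Compare two MatrixMarket files by their size headers: the first
# non-comment line holds "num_rows num_cols num_entries", so differing
# headers mean the graphs cannot be the same. File names are hypothetical.
def mtx_header(path):
    with open(path) as f:
        for line in f:
            if not line.startswith("%"):
                return tuple(int(x) for x in line.split())

print(mtx_header("road_usa.mtx") == mtx_header("road_usa_fresh.mtx"))
```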

AjayBrahmakshatriya commented 4 years ago

> > > @neoblizz there wasn't an error message; the execution crashed with a segmentation fault after the PageRank run.
> >
> > Since the commit you are working on is from Sept. 2019, I believe the segfault bug has been solved and should be fixed on master (HEAD). There's a separate issue that I am working on, but using the latest commit should work for PageRank; the official release is just pending.
>
> Sounds good! If you think the performance of v1.1 and master (HEAD) should be close, I will use the latest commit for the PR experiments. Thanks for looking into this; I will let you know if it works.

@neoblizz I pulled the latest master branch and tried recompiling Gunrock, but I am getting a lot of CMake errors along the lines of:

```
CMake Error at ~/scratch/cmake-3.17.2-Linux-x86_64/share/cmake-3.17/Modules/FindCUDA.cmake:1837 (add_library):
  Cannot find source file:

    ......./gunrockv1.1/externals/moderngpu/src/context.hxx

  Tried extensions .c .C .c++ .cc .cpp .cxx .cu .m .M .mm .h .hh .h++ .hm
  .hpp .hxx .in .txx
Call Stack (most recent call first):
  gunrock/CMakeLists.txt:47 (CUDA_ADD_LIBRARY)
```

There is one such error for each binary too, including pr. I am using CMake 3.17.2 (which is what I was using previously for 7c197d6 as well, and it compiled fine).

Is this a transient issue with the latest commit? Is there a stable commit from before that I can use?

neoblizz commented 4 years ago

> @neoblizz I pulled the latest master branch and tried recompiling Gunrock, but I am getting a lot of CMake errors along the lines of:
>
> ```
> CMake Error at ~/scratch/cmake-3.17.2-Linux-x86_64/share/cmake-3.17/Modules/FindCUDA.cmake:1837 (add_library):
>   Cannot find source file:
>
>     ......./gunrockv1.1/externals/moderngpu/src/context.hxx
>
>   Tried extensions .c .C .c++ .cc .cpp .cxx .cu .m .M .mm .h .hh .h++ .hm
>   .hpp .hxx .in .txx
> Call Stack (most recent call first):
>   gunrock/CMakeLists.txt:47 (CUDA_ADD_LIBRARY)
> ```
>
> There is one such error for each binary too, including pr. I am using CMake 3.17.2 (which is what I was using previously for 7c197d6 as well, and it compiled fine).
>
> Is this a transient issue with the latest commit? Is there a stable commit from before that I can use?

Seems like you didn't fetch the submodules again; we now use moderngpu 2.0 instead of 1.0:

`git submodule update --init`

AjayBrahmakshatriya commented 4 years ago

> > @neoblizz I pulled the latest master branch and tried recompiling Gunrock, but I am getting a lot of CMake errors along the lines of:
> >
> > ```
> > CMake Error at ~/scratch/cmake-3.17.2-Linux-x86_64/share/cmake-3.17/Modules/FindCUDA.cmake:1837 (add_library):
> >   Cannot find source file:
> >
> >     ......./gunrockv1.1/externals/moderngpu/src/context.hxx
> >
> >   Tried extensions .c .C .c++ .cc .cpp .cxx .cu .m .M .mm .h .hh .h++ .hm
> >   .hpp .hxx .in .txx
> > Call Stack (most recent call first):
> >   gunrock/CMakeLists.txt:47 (CUDA_ADD_LIBRARY)
> > ```
> >
> > There is one such error for each binary too, including pr. I am using CMake 3.17.2 (which is what I was using previously for 7c197d6 as well, and it compiled fine). Is this a transient issue with the latest commit? Is there a stable commit from before that I can use?
>
> Seems like you didn't fetch the submodules again; we now use moderngpu 2.0 instead of 1.0:
>
> `git submodule update --init`

I just pulled the latest version of Gunrock again (with --recursive), and that seems to have fixed the problem. I am running the PageRank experiments right now. Please let me know if you want the flags for any other experiments (that we had originally used in the paper).

crozhon commented 4 years ago

I just fixed some issues with using multiple options at once (--pull=false,true) and an issue with OpenMP; both fixes are only on dev. If you hit another issue, try checking that branch out.

jowens commented 4 years ago

Simply for our future use, it would be nice to have the settings for everything you tested, yes. But the primitive-specific settings are specifically what we'll use straightaway.

jowens commented 4 years ago

@AjayBrahmakshatriya can you send us the settings for any of your runs that were faster than ours?

AjayBrahmakshatriya commented 4 years ago

I just checked the logs from our experiments, and the only experiments where our numbers were faster than your reported numbers are BFS on the following two graphs. I have listed the command line arguments and the reported average times below. For the rest, your reported numbers were faster, and with the flags you mentioned we were able to match them.

I just reran these experiments to make sure there wasn't a mix-up in the flags. These experiments were run on a DGX-1 with a V100 (single GPU).

| Graph | Command line | Gunrock reported (ms) | Our experiments (ms) |
| --- | --- | --- | --- |
| soc-orkut | `./bfs market soc-orkut.mtx --direction-optimized --do-a=0.012 --do-b=0.012 --src=0 --num-runs=10 --quick --device=5` | 3.986 | 1.680 |
| soc-twitter-2010 | `./bfs market soc-twitter-2010.mtx --direction-optimized --do-a=0.003 --do-b=0.003 --src=0 --num-runs=10 --quick --device=5` | 15.432 | 13.65 |

jowens commented 4 years ago

Thank you, @AjayBrahmakshatriya!