gunrock / io

Input (scripts, etc.) and output (scripts, performance results, etc.) for Gunrock and other graph engines

json files with no graph problem (graph building/updating benchmarks) #65

Closed maawad closed 4 years ago

maawad commented 4 years ago

Below is a sample JSON file that I output after performing a graph bulk-build operation (I will have similar files for the other graph update operations). Gunrock computes avg-process-time in a function that assumes a graph problem is running (it requires an Enactor to compute statistics), so I can't call it as is, but I can open a PR to change that if needed. The main fields in the output JSON file that I care about, and that are correct, are: load-factor (float), undirected (bool), process-times (array of floats), and dataset (string). The other update operations will also have an additional batch-size (uint) field.

The question is: do you need the avg-process-time field to be correct, or can we use the process-times array? We will be computing average edge insertion/update throughput.

{
    "64bit-SizeT": false,
    "64bit-ValueT": false,
    "64bit-VertexT": false,
    "avg-mteps": 0.0,
    "avg-process-time": 4.4e-323,
    "binary-prefix": "",
    "command-line": "./bin/dynamic_graph_main_10.1_x86_64 market ../../dataset/large//ak2010/ak2010.mtx --undirected --num-runs=10 --validation=each --jsondir=./eval/ --load-factor=0.2",
    "compiler": "Gnu GCC C/C++",
    "compiler-version": 70400000,
    "dataset": "ak2010",
    "edge-value-min": 0.0,
    "edge-value-range": 64.0,
    "edge-value-seed": 0,
    "engine": "Gunrock",
    "filtered-process-times": [],
    "git-commit-sha": "64931017251f057c384f4e4c6f2efd7ada1a7e5d",
    "gpuinfo": {
        "name": "GeForce RTX 2080",
        "total_global_mem": 8335327232,
        "major": "7",
        "minor": "5",
        "clock_rate": 1710000,
        "multi_processor_count": 46,
        "driver_api": "10010",
        "driver_version": "10010",
        "runtime_version": "10010",
        "compute_version": "75"
    },
    "graph-edgefactor": 48.0,
    "graph-edges": 49152,
    "graph-file": "../../dataset/large//ak2010/ak2010.mtx",
    "graph-nodes": 1024,
    "graph-scale": 10,
    "graph-seed": 0,
    "graph-type": "market",
    "grmat": false,
    "gunrock-version": "1.1.0",
    "help": false,
    "json": false,
    "json-schema": "2019-09-20",
    "jsondir": "./eval/",
    "jsonfile": "",
    "load-factor": 0.20000000298023225,
    "load-time": 25.533199310302736,
    "max-mteps": 0.0,
    "max-process-time": 4.4e-323,
    "min-mteps": 0.0,
    "min-process-time": 4.4e-323,
    "num-edges": 217098,
    "num-runs": 10,
    "num-vertices": 45292,
    "postprocess-time": 0.1709461212158203,
    "preprocess-time": 0.0,
    "primitive": "DynamicGraphBuilding",
    "process-times": [
        0.4641599953174591,
        0.45494401454925539,
        0.3834879994392395,
        0.38915199041366579,
        0.3860799968242645,
        0.5951679944992065,
        0.3871679902076721,
        0.3845759928226471,
        0.3933440148830414,
        0.3943040072917938
    ],
    "quick": false,
    "quiet": false,
    "random-edge-values": false,
    "read-from-binary": true,
    "remove-duplicate-edges": true,
    "remove-self-loops": true,
    "rgg-thfactor": 0.55,
    "rgg-threshold": 0.0,
    "rmat-a": 0.57,
    "rmat-b": 0.19,
    "rmat-c": 0.19,
    "rmat-d": 0.05,
    "small-world-k": 6,
    "small-world-p": 0.0,
    "sort-csr": false,
    "srcs": 0,
    "stddev-degree": 5.749463081359863,
    "stddev-process-time": 0.0,
    "store-to-binary": true,
    "sysinfo": {
        "sysname": "Linux",
        "release": "4.15.0-58-generic",
        "version": "#64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019",
        "machine": "x86_64",
        "nodename": "home"
    },
    "tag": [
        ""
    ],
    "time": "Sat Jun 13 07:26:13 2020\n",
    "total-time": 1139.1291618347169,
    "undirected": true,
    "userinfo": {
        "login": "muhammad"
    },
    "v": false,
    "validation": "each",
    "vertex-start-from-zero": true
}
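
For illustration, average insertion throughput can be derived from the process-times array alone, without touching avg-process-time. The following is a minimal sketch, not Gunrock code: the function name `insertion_throughput` is hypothetical, it assumes process-times are reported in milliseconds (the sample does not state units), and it assumes num-edges is the number of edges inserted per run.

```python
import json
from statistics import mean


def insertion_throughput(record, time_unit_seconds=1e-3):
    """Return (mean run time in seconds, edges per second) for one record.

    Assumptions (not confirmed by the thread): process-times are in
    milliseconds and num-edges is the number of edges inserted per run.
    """
    times = record["process-times"]
    if not times:
        raise ValueError("record has no process-times")
    mean_seconds = mean(times) * time_unit_seconds
    return mean_seconds, record["num-edges"] / mean_seconds


# A trimmed record with only the fields this sketch reads.
sample = json.loads("""
{
    "dataset": "ak2010",
    "num-edges": 217098,
    "process-times": [0.4641599953174591, 0.3834879994392395, 0.5951679944992065]
}
""")

mean_s, eps = insertion_throughput(sample)
print(f"{sample['dataset']}: {mean_s * 1e3:.4f} ms/run, {eps / 1e6:.1f} M edges/s")
```

Reading only process-times, num-edges, and dataset keeps the calculation independent of the summary fields that are not populated for this primitive.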

jowens commented 4 years ago

I can generate avg-process-time easily but I'd like @neoblizz to chime in simply because this is gonna be confusing for any future person who ever wants to benchmark/compare against our stuff. One option might be moving the stats-compute code out of the enactor (there's no proximate reason why it should be there) and into some sort of standalone postprocess call.
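
As a rough sketch of what such a standalone postprocess step could look like on the JSON side, the summary fields can be recomputed from process-times after the run. This is an assumption-laden illustration, not the Enactor's code: `summarize_process_times` is a hypothetical name, and the choice of population standard deviation is a guess, since the thread does not say which variant Gunrock uses.

```python
from statistics import mean, pstdev


def summarize_process_times(record):
    """Return a copy of record with summary fields derived from process-times.

    Hypothetical helper: the field names match the sample JSON, but the
    population standard deviation is an assumption.
    """
    times = record.get("process-times", [])
    if not times:
        return dict(record)  # nothing to summarize; leave fields untouched
    updated = dict(record)
    updated["avg-process-time"] = mean(times)
    updated["min-process-time"] = min(times)
    updated["max-process-time"] = max(times)
    updated["stddev-process-time"] = pstdev(times)
    return updated


# The summary fields start out as the unset values seen in the sample.
raw = {
    "avg-process-time": 4.4e-323,
    "min-process-time": 4.4e-323,
    "max-process-time": 4.4e-323,
    "stddev-process-time": 0.0,
    "process-times": [1.0, 2.0, 3.0],
}
fixed = summarize_process_times(raw)
print(fixed["avg-process-time"], fixed["min-process-time"], fixed["max-process-time"])
```

Because the helper needs only the list of times, it works for records that have no graph problem or Enactor behind them.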

jowens commented 4 years ago

Also, surprising amount of variability in process-times.

neoblizz commented 4 years ago

The question is: do you need the avg-process-time field to be correct, or can we use the process-times array? We will be computing average edge insertion/update throughput.

No, we don't need that field to be correct. I think as long as the methodology for our measurements is clearly defined, we are fine. Use what you care about.

In the refactored version, the JSON class is separate from the actual statistics you collect (so, a stats class and a JSON class), which means each algorithm, application, or test gets to define what stats it needs to show in its JSON output.
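
To make the separation concrete, here is a minimal sketch of the idea in Python. It is not the refactored Gunrock API: `BuildStats` and `JsonReport` are hypothetical names, and the field set is taken from the fields named earlier in this thread.

```python
import json
from dataclasses import dataclass, field


@dataclass
class BuildStats:
    """Stats that a graph-building benchmark chooses to report (hypothetical)."""
    dataset: str
    undirected: bool
    load_factor: float
    process_times: list = field(default_factory=list)

    def to_fields(self):
        # Map attribute names to the hyphenated keys used in the JSON output.
        return {
            "dataset": self.dataset,
            "undirected": self.undirected,
            "load-factor": self.load_factor,
            "process-times": list(self.process_times),
        }


class JsonReport:
    """Serializes whatever fields it is given; knows nothing about stats."""

    def __init__(self, engine, primitive):
        self.fields = {"engine": engine, "primitive": primitive}

    def add(self, stats):
        self.fields.update(stats.to_fields())

    def dumps(self):
        return json.dumps(self.fields, indent=4, sort_keys=True)


stats = BuildStats("ak2010", True, 0.2)
stats.process_times.extend([0.46, 0.45])

report = JsonReport("Gunrock", "DynamicGraphBuilding")
report.add(stats)
print(report.dumps())
```

With this split, a benchmark without a graph problem simply omits fields such as avg-process-time instead of emitting placeholder values.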

maawad commented 4 years ago

Okay, I can do that for now and avoid modifying that file :)

Also, surprising amount of variability in process-times.

Numbers are not exactly correct here. I was inserting edges twice. Will double check after fixing that.