Open wlngai opened 7 years ago
I just ran into this issue 5.5 years later :). If the program is interrupted while generating the cache file, it leaves a partial file behind, and the next execution assumes the cache is correct even though the row counts of the two files differ:
$ wc -l /data/gx/graphs/cache/datagen-7_5-fb.e
30759439 /data/gx/graphs/cache/datagen-7_5-fb.e
$ wc -l /data/gx/graphs/datagen-7_5-fb.e
34185747 /data/gx/graphs/datagen-7_5-fb.e
Graphalytics currently assumes that both the input graph and the cached graph are correct. This is mostly fine: if either is corrupt, the corresponding benchmark runs will fail validation. However, it is unclear to users why validation failed, because they also assume the input graph and cached graph are correct. These files can be corrupted accidentally, for example when the caching process is interrupted.
A checksum (e.g. SHA-1) should be computed and verified for these files to provide full validation.
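As a rough sketch of what this could look like (file names here are illustrative, not Graphalytics' actual cache layout): generate the cache into a temp file, record a checksum next to it, and only rename it into place once complete, so an interrupted run never leaves a cache that looks valid.

```shell
set -eu

src=graph.e            # hypothetical input edge list
cache=graph-cache.e    # hypothetical cache target

printf '1 2\n2 3\n' > "$src"   # stand-in for real graph data

# 1. Generate the cache into a temp file, so an interrupted
#    run leaves only the temp file, never a partial cache.
cp "$src" "$cache.tmp"

# 2. Store a checksum alongside the cache for later validation.
sha1sum "$cache.tmp" | awk '{print $1}' > "$cache.sha1"

# 3. Atomic rename: the cache file only appears once it is complete.
mv "$cache.tmp" "$cache"

# 4. On the next run, verify the checksum before trusting the cache.
if [ "$(sha1sum "$cache" | awk '{print $1}')" = "$(cat "$cache.sha1")" ]; then
    echo "cache OK"
else
    echo "cache corrupt, regenerating" >&2
fi
```

Even without the checksum, the temp-file-plus-rename step alone would prevent the partial-cache scenario above; the checksum additionally catches corruption of a fully written file.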