ldbc / ldbc_snb_interactive_v1_driver

Driver for the LDBC SNB Interactive workload
https://ldbcouncil.org/benchmarks/snb-interactive
Apache License 2.0

Operation counts not consistent across benchmarks #118

Open · xwkuang5 opened this issue 4 years ago

xwkuang5 commented 4 years ago

Hi,

I am reposting here an open issue from the ldbc_snb_implementations repo.

I am trying to use the Cypher benchmark to evaluate the performance of Neo4j under different configurations. I set operation_count=2500 and ran the interactive-benchmark.sh script multiple times. However, I got three different final operation counts (2473, 2532, 2584) across three runs. Is this the expected result?

Thanks for any help in advance!

Here is my configuration:

endpoint=bolt://localhost:7687
user=neo4j
password=admin
queryDir=queries/
printQueryNames=false
printQueryStrings=false
printQueryResults=false

status=1
thread_count=2
name=LDBC-SNB
results_log=true
time_unit=MILLISECONDS
time_compression_ratio=0.001
peer_identifiers=
workload_statistics=false
spinner_wait_duration=1
help=false
ignore_scheduled_start_times=true

workload=com.ldbc.driver.workloads.ldbc.snb.interactive.LdbcSnbInteractiveWorkload
db=com.ldbc.impls.workloads.ldbc.snb.cypher.interactive.CypherInteractiveDb
operation_count=2500
ldbc.snb.interactive.parameters_dir=../../ldbc_snb_datagen/substitution_parameters/
ldbc.snb.interactive.updates_dir=../../ldbc_snb_datagen/social_network/
ldbc.snb.interactive.short_read_dissipation=0.2
ldbc.snb.interactive.update_interleave=49274

warmup=100

## frequency of read queries (number of update queries per one read query)
ldbc.snb.interactive.LdbcQuery1_freq=26
ldbc.snb.interactive.LdbcQuery2_freq=37
ldbc.snb.interactive.LdbcQuery3_freq=123
ldbc.snb.interactive.LdbcQuery4_freq=36
ldbc.snb.interactive.LdbcQuery5_freq=78
ldbc.snb.interactive.LdbcQuery6_freq=434
ldbc.snb.interactive.LdbcQuery7_freq=38
ldbc.snb.interactive.LdbcQuery8_freq=5
ldbc.snb.interactive.LdbcQuery9_freq=527
ldbc.snb.interactive.LdbcQuery10_freq=40
ldbc.snb.interactive.LdbcQuery11_freq=22
ldbc.snb.interactive.LdbcQuery12_freq=44
ldbc.snb.interactive.LdbcQuery13_freq=19
ldbc.snb.interactive.LdbcQuery14_freq=49

# *** For debugging purposes ***

ldbc.snb.interactive.LdbcQuery1_enable=true
ldbc.snb.interactive.LdbcQuery2_enable=true
ldbc.snb.interactive.LdbcQuery3_enable=true
ldbc.snb.interactive.LdbcQuery4_enable=true
ldbc.snb.interactive.LdbcQuery5_enable=true
ldbc.snb.interactive.LdbcQuery6_enable=true
ldbc.snb.interactive.LdbcQuery7_enable=true
ldbc.snb.interactive.LdbcQuery8_enable=true
ldbc.snb.interactive.LdbcQuery9_enable=true
ldbc.snb.interactive.LdbcQuery10_enable=true
ldbc.snb.interactive.LdbcQuery11_enable=true
ldbc.snb.interactive.LdbcQuery12_enable=true
ldbc.snb.interactive.LdbcQuery13_enable=true
ldbc.snb.interactive.LdbcQuery14_enable=true

ldbc.snb.interactive.LdbcShortQuery1PersonProfile_enable=true
ldbc.snb.interactive.LdbcShortQuery2PersonPosts_enable=true
ldbc.snb.interactive.LdbcShortQuery3PersonFriends_enable=true
ldbc.snb.interactive.LdbcShortQuery4MessageContent_enable=true
ldbc.snb.interactive.LdbcShortQuery5MessageCreator_enable=true
ldbc.snb.interactive.LdbcShortQuery6MessageForum_enable=true
ldbc.snb.interactive.LdbcShortQuery7MessageReplies_enable=true

ldbc.snb.interactive.LdbcUpdate1AddPerson_enable=true
ldbc.snb.interactive.LdbcUpdate2AddPostLike_enable=true
ldbc.snb.interactive.LdbcUpdate3AddCommentLike_enable=true
ldbc.snb.interactive.LdbcUpdate4AddForum_enable=true
ldbc.snb.interactive.LdbcUpdate5AddForumMembership_enable=true
ldbc.snb.interactive.LdbcUpdate6AddPost_enable=true
ldbc.snb.interactive.LdbcUpdate7AddComment_enable=true
ldbc.snb.interactive.LdbcUpdate8AddFriendship_enable=true
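
The comment on the `_freq` settings reads each value as the number of update operations per one instance of that complex read. Taking only that reading (an approximation: the driver's actual schedule also depends on `update_interleave` and timing), the expected number of each complex read for a given number of updates can be sketched as below. `parse_freqs` and `expected_complex_reads` are hypothetical helpers, not part of the driver.

```python
import re

# Matches lines such as "ldbc.snb.interactive.LdbcQuery8_freq=5".
FREQ_LINE = re.compile(r"ldbc\.snb\.interactive\.LdbcQuery(\d+)_freq=(\d+)")


def parse_freqs(properties_text: str) -> dict[int, int]:
    """Collect {query number: frequency} from the _freq lines of a config."""
    freqs = {}
    for line in properties_text.splitlines():
        match = FREQ_LINE.fullmatch(line.strip())
        if match:
            freqs[int(match.group(1))] = int(match.group(2))
    return freqs


def expected_complex_reads(freqs: dict[int, int], num_updates: int) -> dict[int, int]:
    """ASSUMPTION: one instance of query i per freqs[i] updates, rounded down."""
    return {query: num_updates // freq for query, freq in freqs.items()}


config = """
ldbc.snb.interactive.LdbcQuery1_freq=26
ldbc.snb.interactive.LdbcQuery8_freq=5
ldbc.snb.interactive.LdbcQuery9_freq=527
ldbc.snb.interactive.LdbcQuery1_enable=true
"""
print(expected_complex_reads(parse_freqs(config), 2500))  # {1: 96, 8: 500, 9: 4}
```

Under this reading, Query 8 (freq 5) runs roughly a hundred times more often than Query 9 (freq 527); the `_enable` lines and comments are ignored by the parser.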

xwkuang5 commented 4 years ago

If I understand short_read_dissipation correctly, it is the delta in the random walk model: a larger short_read_dissipation means a shorter walk. In the extreme case short_read_dissipation=1, there should be no short reads after a complex read. Is this why the final number of operations differs across runs?
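
To make the question concrete, here is a sketch of one possible random-walk model. It assumes (this is not taken from the driver source) that after a complex read the k-th short read is issued with probability 1 - k·δ, where δ = short_read_dissipation, and that the walk stops at the first refusal. With δ = 1 the first probability is already 0, matching the extreme case above. `expected_short_reads` is a hypothetical helper.

```python
def expected_short_reads(delta: float) -> float:
    """Expected short reads per complex read under an ASSUMED linear-dissipation walk.

    Step k continues with probability 1 - k * delta (stopping once that is <= 0);
    the walk ends at the first step that does not continue.
    """
    if not 0.0 < delta <= 1.0:
        raise ValueError("delta must be in (0, 1]")
    total = 0.0
    survival = 1.0  # probability that at least k short reads are issued
    k = 1
    while True:
        p_continue = 1.0 - k * delta
        if p_continue <= 0.0:
            break
        survival *= p_continue
        total += survival  # E[length] = sum over k of P(length >= k)
        k += 1
    return total


print(expected_short_reads(0.2))  # approximately 1.5104
print(expected_short_reads(1.0))  # 0.0
```

Under this model the mean is about 1.5 short reads per complex read for δ = 0.2, but each walk's length is random, so a run's total can land above or below the configured operation count.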

xwkuang5 commented 4 years ago

If the above is true, is there a way to set the random seed in the test driver to make sure that the workload of a particular benchmark can be replayed?
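
Whether the driver exposes a seed option is not established in this thread. As a sketch of the general idea only, the simulation below draws every walk from a `random.Random` instance built from an explicit seed, so the same seed always replays the same operation count while different seeds may not. It uses the same assumed model (the k-th short read is issued with probability 1 - k·δ); `simulate_run` and its `seed` argument are hypothetical, not driver settings.

```python
import random


def simulate_run(num_complex_reads: int, delta: float, seed: int) -> int:
    """Total operations (complex + short reads) for one simulated run.

    ASSUMPTION: the k-th short read after a complex read is issued with
    probability 1 - k * delta; this is an illustrative model only.
    """
    if not 0.0 < delta <= 1.0:
        raise ValueError("delta must be in (0, 1]")
    rng = random.Random(seed)  # private generator: the seed fully determines the run
    total = 0
    for _ in range(num_complex_reads):
        total += 1  # the complex read itself
        k = 1
        # random() is in [0, 1), so the loop stops once 1 - k * delta <= 0
        while rng.random() < 1.0 - k * delta:
            total += 1  # one more short read in the walk
            k += 1
    return total


first = simulate_run(500, 0.2, seed=42)
replay = simulate_run(500, 0.2, seed=42)
print(first == replay)  # True: same seed, same count
print(simulate_run(500, 0.2, seed=43))  # a different seed may give a different total
```

Using a dedicated `random.Random` instance rather than the module-level functions keeps the run reproducible even if other code draws from the global generator.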

jackwaudby commented 4 years ago

Hi @xwkuang5

Sorry for the delay in replying.

> I was getting three different final operations counts (2473, 2532, 2584) across 3 different runs. Is this the expected result?

I will discuss this with the task force when we next talk.

I've just run the Cypher implementation a few times with your configuration and can reproduce the issue. Which scale factor are you using to generate the data?

Best,

Jack

xwkuang5 commented 4 years ago

Hi Jack, thanks for your reply.

I believe it's SF1 (or SF3).