ldbc / ldbc_snb_interactive_v1_impls

Reference implementations for LDBC Social Network Benchmark's Interactive workload.
https://ldbcouncil.org/benchmarks/snb-interactive
Apache License 2.0

Does this project have a timeout setting #393

Closed Ask-sola closed 1 year ago

Ask-sola commented 1 year ago

I am running the Cypher implementation on SF100, and one of the queries has been running continuously for nearly an hour. A query that runs this long is of no use to me: I don't know how long it will take, and I cannot obtain the test results that I was able to get in the past. Does this implementation set a maximum query time that interrupts an overly long query?

szarnyasg commented 1 year ago

@Ask-sola the Cypher implementation is a reference implementation, i.e. it is not optimized for performance. We only used it for cross-validating other implementations on scale factors up to SF10. So it is quite possible that some of the Cypher implementation's queries, especially Q14 (which uses path-finding), will be very slow on larger data sets. For version 1, I recommend using one of the audited implementations. For version 2 (which has deletes and scales up to SF10,000+), we do not have a fast and scalable implementation yet.
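As a rough illustration of what a maximum query time could look like on the client side, here is a minimal Python sketch. It is not taken from this repository or its driver: `run_with_deadline`, `fast_query`, and `slow_query` are hypothetical names, and the sketch only stops waiting for the result. It does not cancel work that is already running inside the database, which would need a server-side setting.

```python
import concurrent.futures
import time


def run_with_deadline(query_fn, timeout_s):
    """Run query_fn in a worker thread and stop waiting after timeout_s seconds.

    Returns a (status, result) pair. This only abandons the wait on the
    client side; it does not interrupt the worker or the database.
    """
    executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = executor.submit(query_fn)
    try:
        return "ok", future.result(timeout=timeout_s)
    except concurrent.futures.TimeoutError:
        return "timeout", None
    finally:
        # Do not block on a worker thread that may still be running.
        executor.shutdown(wait=False)


# Hypothetical stand-ins for real benchmark queries.
def fast_query():
    return 42


def slow_query():
    time.sleep(0.5)
    return "late"


if __name__ == "__main__":
    print(run_with_deadline(fast_query, 1.0))
    print(run_with_deadline(slow_query, 0.05))
```

With a wrapper like this, a long-running query is reported as timed out and the rest of the run can continue, but the database may keep executing the abandoned query in the background.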

Ask-sola commented 1 year ago

Uh, what does version 1 mean? Am I right that version 1 corresponds to v1 dev, while version 2 corresponds to the master branch?

szarnyasg commented 1 year ago

That's right. v1 is stable and used in audits, and v2 is a work-in-progress with the queries/updates stabilized but the workload mix/parameters still under tuning.

Ask-sola commented 1 year ago

Is there a significant difference in the data or query statements between V1 and V2? I tested the Cypher implementation on SF100 and found that Query 1 is much faster than in some previously published results.

Ask-sola commented 1 year ago

Compare with the results at https://arxiv.org/pdf/1907.07405.pdf where Query 1 takes about 1000 s or more. When I ran the test, it took only 100 ms, which made me wonder whether it is even the same IC1.

szarnyasg commented 1 year ago

@Ask-sola

"Is there a significant difference in the data or query statements between V1 and V2?"

There is no significant difference in the specification of data/queries, but the implementations have been reworked considerably.

The preprint you cite has a few issues. These were documented in a report commissioned by LDBC.

Besides these issues, the preprint is more than 4 years old. During this time, the Neo4j system, the available hardware, and the SNB Interactive implementations have all improved.

So if you run the benchmark correctly and get correct results (e.g. cross-validation passes against the reference output), then the performance numbers you see are correct.

Ask-sola commented 1 year ago

Thank you very much for your answer. I have already tested using the V1 version. I would also like to ask about the concept of cross-validation. Taking Neo4j as an example, it seems that the validation parameters are themselves created with Neo4j (I suspect this because the inserts and deletes performed on Neo4j during validation changed the database; I have not read the source code, so I am not sure). Using these parameters to validate Neo4j itself does not seem useful; it is like a candidate writing their own exam paper. If I need to cross-validate a new database (call it T), should I create the validation parameters with the Neo4j implementation and use the resulting file to validate T? Or should I create an implementation subproject for T, generate the validation parameters in that subproject, and use them to validate T?

szarnyasg commented 1 year ago

Yes, you'll need two different systems for cross-validation. For example, you can generate the reference output with Neo4j (using the create-validation-parameters.sh script) and cross-validate database T (using the cross-validate.sh script).
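The idea behind cross-validation can be sketched in a few lines of Python. This is only an illustration and not the logic of the actual scripts: the function `cross_validate`, the query keys, and the result rows are all made up for the example. One system produces the reference output, and the other system's results are compared against it query by query.

```python
def cross_validate(reference, candidate):
    """Compare candidate results against reference results.

    Both arguments map a query identifier to its list of result rows.
    Returns a list of (query_id, expected, actual) tuples for every
    query whose candidate result differs from the reference.
    """
    mismatches = []
    for query_id, expected in reference.items():
        actual = candidate.get(query_id)  # None if the query is missing
        if actual != expected:
            mismatches.append((query_id, expected, actual))
    return mismatches


# Hypothetical reference output, e.g. produced by system A.
reference = {
    "IC1(person=1)": [["Alice", 1], ["Bob", 2]],
    "IC2(person=1)": [[10]],
}

# Hypothetical outputs of system T: one matching run, one with an error.
candidate_ok = {
    "IC1(person=1)": [["Alice", 1], ["Bob", 2]],
    "IC2(person=1)": [[10]],
}
candidate_bad = {
    "IC1(person=1)": [["Alice", 1], ["Bob", 2]],
    "IC2(person=1)": [[11]],
}

if __name__ == "__main__":
    print(cross_validate(reference, candidate_ok))
    print(cross_validate(reference, candidate_bad))
```

Because the reference comes from a different system than the one under test, an empty mismatch list means the two systems agree, which is what makes the check meaningful.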