ldbc / ldbc_finbench_docs

Specification of the LDBC Financial Benchmark
https://ldbcouncil.org/ldbc_finbench_docs/ldbc-finbench-specification.pdf
Apache License 2.0
17 stars 6 forks source link

Batched issues from Gabor #65

Closed qishipengqsp closed 1 year ago

qishipengqsp commented 1 year ago

General comments

Comments for specific queries/operations

complex-read / 2:

complex-read / 5:

complex-read / 7:

complex-read / 8:

complex-read / 10:

complex-read / 12:

simple-read / 2: in the pattern:

simple-read / 3:

simple-read / 6:

simple-read / 6:

write / 3, 8, 9:

simple-read / 6, 7, 8:

write / 8:

Typo

The word "gurantee" is misspelled in the patterns of read-write / 3 and write / 13 (maybe also in other places).

currentTime

Several queries use the "currentTime" parameter. Where does the value of this parameter come from: the data set, the driver (e.g. Java's System.currentTimeMillis()), or the DBMS (e.g. Cypher's date() function)? If it's a parameter, the patterns should indicate it as such with ${currentTime}.

Path queries with conditions over multiple edges

Queries that require "all timestamps in the transfer trace are in ascending order" or the "upstream" edge are difficult to explain in plain Cypher (or GQL or SQL/PGQ) because they require some memory.

One can think of this as an extra choke point which requires support for the category of queries "Regular expression with memory" as described in this paper: https://dl.acm.org/doi/abs/10.1145/2274576.2274585. Even if it's not defined as a choke point, it's worth including this reference in the specificiation.

More comments

qishipengqsp commented 1 year ago

General comments

  • start_time/startTime and end_time/endTime are used inconsistently both in the patterns and in the text

Done

  • the name for the ids for persons p1 (and p2) is inconsitent: "p1Id", "pid1" are used.
  • patterns: when having multiple nodes on one end of an edge, it is worth assigning different names to these nodes good examples:

    • complex-read / 6: has src1, ..., src3
    • complex-read / 11: has l1c1, l1c2, ..., l1c3, and so on

    bad examples:

    • simple-read / 2: has dst, dst, dst
    • simple-read / 3: has src, src, ...
    • complex-read / 7: has src/dst repeated
    • complex-read / 8: has dst1, dst1, ..., dst1; dst2, ...
    • complex-read / 9: has up/down repeated
    • complex-read / 10: has m repeated

Done

Comments for specific queries/operations

complex-read / 2:

  • "accounts" should be singular (account)

Done

complex-read / 5:

  • if every row in the result encodes a path, the name of the result attribute should be "path" and its description should be "a transfer trace"

Done

complex-read / 7:

  • numSrc, numDst: are these numbers counted with 'count distinct'?

Done

complex-read / 8:

  • the term "inflow" is not defined and it is only used for describing the results
  • the pattern should illustrate/describe that the traversal can end at any level (dst1, dst2, dst3)

Done

complex-read / 10:

  • what should be the result if there are no matches for edge1 and/or edge2?

Done

complex-read / 12:

  • nitpicking but if everything else query used the terms edge1, edge2, ...; why use g1, g2, ..., gN here?
  • a person can have multiple loans, maybe this should be reflected in the pattern as well?

Done

simple-read / 2: in the pattern:

  • the undirected "edge2" in the middle does not make sense to me
  • the second "edge1" should be called "edge2"
  • in the pattern, "sum(edge1.amount)" in the last row should be "sum(edge2.amount)"

Done

simple-read / 3:

  • what should be returned if there are no edges between src and dst and the result would be 0/0?

Done

simple-read / 6:

  • result description doesn't make sense. accounts on their own do not have srcs, their incoming transfer edges do.

Done

simple-read / 6:

  • the RESULT in the pattern should say "collect(x.id)" instead of "collect(x)"

Done

write / 3, 8, 9:

  • the edges (transfer, deposit, repay) edges are missing the plus sign indicating that they are newly created

Done

simple-read / 6, 7, 8:

  • the name of result fields that return the collection of multiple IDs should have a plural name, e.g. "accId" should be called "accIds"

Done

write / 8:

  • the node "Loan" should be named "loan: Loan" as its bound to a single node by the loanId parameter
  • "amount <- amount" in the pattern is inconsistent, other patterns use "amount <- amt"

Done

Typo

The word "gurantee" is misspelled in the patterns of read-write / 3 and write / 13 (maybe also in other places).

Done

qishipengqsp commented 1 year ago

TODO: re-generate and crop the diagrams.