Bug: Graph query does not work when selecting uncreated node

yeonsy commented 8 months ago

Describe the bug

Hi, I have a table that should be sparse (most nodes do not need to be created), and I want to retrieve the node id directly from the relation. But when I do a graph traversal, it returns zero results because the node does not exist.

For example:

CREATE word:quick;
RELATE word:quick->synonym->word:fast;
SELECT *, ->synonym.out FROM word:quick; // finds the synonym relation
SELECT *, <-synonym.in FROM word:fast; // looks like it should be equivalent of above, but returns no results

CREATE word:fast;
SELECT *, <-synonym.in FROM word:fast; // now it is working

It is confusing that the select works in one direction, but does not work in the inverse direction which is logically the same.

Some possible workarounds:

Fill in all the nodes of the table, this will work but will add a lot of unnecessary size.
SELECT * FROM synonym WHERE out = "word:fast". This works but gives up on the nice graph query syntax. Consider if you want to find synonym->synonym->synonym, multiple indirections will quickly become cumbersome.
word:fast<-synonym<-word. This query works and return word:quick as desired but now the other direction doesn't work. word:quick->synonym->word will return an empty array, and it's confusing that to get the node id in the forward direction I have to use SELECT ->synonym.out FROM word:quick while to get it in the reverse direction I have to use word:fast<-synonym<-word.

I think ideally the expected behavior would be:

When listing relations on a node that hasn't been created, fallback to looking up relations on the edges table

Steps to reproduce

See above

Expected behaviour

See above

SurrealDB version

1.3.0 for linux on x86_64

Contact Details

No response

Is there an existing issue for this?

[X] I have searched the existing issues

Code of Conduct

[X] I agree to follow this project's Code of Conduct

Dhghomon commented 8 months ago

Hi @yeonsy,

The two are slightly different. A way to understand this behaviour is by looking at the graph edge itself with SELECT * FROM synonym; which gives this:

[
    {
        "id": "synonym:mds39ahixqzcz590cwi6",
        "in": "word:quick",
        "out": "word:fast"
    }
]

The first query SELECT *, ->synonym.out FROM word:quick; starts from word:quick to see if it is an in inside synonym, which in this case is true, then ends with synonym.out which is word:fast. The record word:fast doesn't exist yet but the relation is set up to look for this ID so it shows up in the query. (In other words, the query starts from an existing record to an existing graph edge, and doesn't ever try to find the record at out)

The second query SELECT *, <-synonym.in FROM word:fast; starts from word:fast...but that's the end of the query as it is starting from a record that doesn't exist yet.

If you give SELECT ->synonym.out.id FROM word:quick; a try you can see something that follows the order of the first query but gives a similar result to the second because it actually tries to access the uncreated word:fast, and now returns null.

As an aside, thank you for raising the subject as I am putting together some material about just this sort of behaviour and I think this will make for a good example!

yeonsy commented 7 months ago

Thanks for the detailed explanation @Dhghomon. I just have one question, could you please explain how word:fast<-synonym<-word query is working in this situation? It's not clear to me how it differs from a SELECT query,

Other than that, is there a suggested method to use sparse graphs in SurrealDB? I would rather avoid creating a lot of empty records just to get graph traversals working.

surrealdb / surrealdb