tweag / nixpkgs-graph-explorer

Explore the nixpkgs dependency graph
MIT License

Update default Gremlin Query #78

Closed: jlesquembre closed this issue 1 year ago

jlesquembre commented 1 year ago

As @dorranh noticed, the Gremlin queries defined in the UI are not correct. There are two queries, defined here: https://github.com/tweag/nixpkgs-graph-explorer/blob/6d823bb1399b54bdb01283e7838653ebdaf479b7/web/src/api.ts#L59 and here: https://github.com/tweag/nixpkgs-graph-explorer/blob/6d823bb1399b54bdb01283e7838653ebdaf479b7/web/src/code-editor.ts#L41-L54

A better query could be:

g.V()
    .has(
        'package',
        'pname',
        'A'  // Package name goes here
    )
    .repeat(outE().otherV().simplePath())
    .until(
        outE().count().is(0)
    )
    .path()
    .by('pname')
    .by('label')

or, with a cap on the number of paths returned:

g.V()
    .has(
        'package',
        'pname',
        'A'  // Package name goes here
    )
    .repeat(outE().otherV().simplePath())
    .until(
        outE().count().is(0)
    )
    .path()
    .by('pname')
    .by('label')
    .limit(100)
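For reference, a minimal TypeScript sketch of how the UI could fill a package name into the limited query above. The helper names (`quoteGremlinString`, `packageGraphQuery`) are hypothetical, and it assumes the query is sent as a Groovy-style script string, so single quotes and backslashes in the name are escaped:

```typescript
// Hypothetical helpers (not part of the repository): build the limited
// package-graph query for a given package name.

/** Escape a value for use inside a single-quoted Gremlin (Groovy) string. */
function quoteGremlinString(value: string): string {
  // Escape backslashes first, then single quotes.
  return "'" + value.replace(/\\/g, "\\\\").replace(/'/g, "\\'") + "'";
}

/** Build the query, capping the number of returned paths with limit(). */
function packageGraphQuery(pname: string, limit = 100): string {
  if (!Number.isInteger(limit) || limit <= 0) {
    throw new RangeError(`limit must be a positive integer, got ${limit}`);
  }
  return [
    "g.V()",
    `    .has('package', 'pname', ${quoteGremlinString(pname)})`,
    "    .repeat(outE().otherV().simplePath())",
    "    .until(outE().count().is(0))",
    "    .path()",
    "    .by('pname')",
    "    .by('label')",
    `    .limit(${limit})`,
  ].join("\n");
}

// Example usage: print the query for a specific package.
console.log(packageGraphQuery("tensorflow", 100));
```

If the backend supports Gremlin Server parameter bindings, passing the package name as a binding instead of splicing it into the string would avoid the escaping altogether.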

dorranh commented 1 year ago

One idea would be to execute two queries when an item is selected in the table view:

  1. The actual query for getting the graph data and
  2. A query which checks the total size of the graph data

Query (1) could be capped at a preconfigured limit (e.g. 100 or 200), and query (2) could trigger a warning if the total it returns is greater than this limit.
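A rough TypeScript sketch of the two-query idea, under these assumptions: `runQuery` is a hypothetical function that sends a Gremlin string to the backend and resolves with the result list, the package name is passed in already quoted, and the size query reuses the same traversal but ends in `count()`. The names `fetchPackageGraph` and `limitWarning` are illustrative only:

```typescript
// Hypothetical sketch: fetch a capped set of paths (1) and, concurrently,
// count how many paths the uncapped traversal would produce (2).

type QueryRunner = (query: string) => Promise<unknown[]>;

interface PackageGraphResult {
  paths: unknown[];
  total: number;
  warning?: string;
}

// Shared traversal prefix; quotedPname is assumed to be already escaped/quoted.
function traversal(quotedPname: string): string {
  return (
    `g.V().has('package', 'pname', ${quotedPname})` +
    ".repeat(outE().otherV().simplePath())" +
    ".until(outE().count().is(0))"
  );
}

// Returns a warning message only when the total exceeds the limit.
function limitWarning(total: number, limit: number): string | undefined {
  return total > limit
    ? `Showing ${limit} of ${total} paths; the graph was truncated.`
    : undefined;
}

async function fetchPackageGraph(
  quotedPname: string,
  runQuery: QueryRunner,
  limit = 100,
): Promise<PackageGraphResult> {
  const base = traversal(quotedPname);
  // Run the capped data query and the size query in parallel.
  const [paths, countResult] = await Promise.all([
    runQuery(`${base}.path().by('pname').by('label').limit(${limit})`),
    runQuery(`${base}.count()`),
  ]);
  const total = Number(countResult[0] ?? 0);
  return { paths, total, warning: limitWarning(total, limit) };
}
```

Running both queries concurrently keeps the extra latency low, but the count query still has to walk the whole graph, so it can time out on large packages too.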

For ad hoc queries, I think the best option would be to still let overly complex queries time out, but to render a more user-friendly error message when this occurs.
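For the friendlier timeout message, a minimal sketch that maps a failed query to user-facing text. It assumes (hypothetically) that the backend forwards Gremlin Server's response status code, where 598 denotes a server-side timeout, and otherwise falls back to matching the error message; `QueryError` and `friendlyQueryError` are illustrative names:

```typescript
// Hypothetical sketch: turn a query failure into a user-friendly message.
// Assumes the error carries Gremlin Server's status code and/or a message.

interface QueryError {
  status?: number;
  message?: string;
}

// Gremlin Server's response status code for a server-side timeout.
const GREMLIN_SERVER_TIMEOUT = 598;

function friendlyQueryError(error: QueryError): string {
  const message = error.message ?? "";
  // Matches "timeout", "time out" and "timed out", case-insensitively.
  const looksLikeTimeout = /timed? ?out/i.test(message);
  if (error.status === GREMLIN_SERVER_TIMEOUT || looksLikeTimeout) {
    return (
      "This query took too long and was stopped. " +
      "Try adding a limit() step or restricting the traversal depth."
    );
  }
  return `The query failed: ${message || "unknown error"}`;
}
```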

dorranh commented 1 year ago

Another alternative would be to implement a custom API endpoint specifically for fetching package graphs. It could then handle both fetching the data and issuing warnings as required.
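A dedicated endpoint could also skip the separate size query: request one more path than the limit, and if that extra path comes back, the result is known to be truncated. Note that this reports only that the result was cut off, not the total size. A hypothetical TypeScript handler sketch, where `buildQuery` and `runQuery` stand in for whatever the backend actually uses:

```typescript
// Hypothetical sketch of a package-graph endpoint handler that detects
// truncation by asking for limit + 1 paths instead of running a count query.

type QueryRunner = (query: string) => Promise<unknown[]>;

interface PackageGraphResponse {
  paths: unknown[];
  truncated: boolean;
  warnings: string[];
}

async function handlePackageGraph(
  buildQuery: (limit: number) => string, // builds the traversal with a given limit()
  runQuery: QueryRunner,
  limit: number,
): Promise<PackageGraphResponse> {
  const rows = await runQuery(buildQuery(limit + 1));
  // If more than `limit` rows came back, there is at least one more path.
  const truncated = rows.length > limit;
  return {
    paths: rows.slice(0, limit),
    truncated,
    warnings: truncated ? [`Result truncated to the first ${limit} paths.`] : [],
  };
}
```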

GuillaumeDesforges commented 1 year ago

@dorranh, is this fixed now in v1.0?

dorranh commented 1 year ago

@GuillaumeDesforges, nope, not yet. By the way, here is an alternative to the suggestions above:

g.V()
    .has(
        'package',
        'pname',
        'tensorflow'  // Package name goes here
    )
    .repeat(outE().otherV().simplePath())
    .until(
        outE().count().is(0).or().loops().is(gte(2))  // The value in gte() limits the depth of the traversal
    )
    .path()
    .by('pname')
    .by('label')
    .limit(20)  // Limits the number of paths returned (a path alternates vertices and edges: [vertex, edge, vertex, ...])

This will attempt to fully explore the graph while allowing for a bit more configurability. Increasing the value in gte() performs more of a depth-first traversal, at the cost of potentially needing to limit the total number of paths returned, and you can do the opposite (lower the gte() value) to perform more of a breadth-first traversal.
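To make the trade-off concrete, here is a small in-memory TypeScript model of what the query computes. It assumes `loops()` counts completed hops, so `gte(2)` corresponds to `maxDepth = 2`, and it enumerates paths depth-first, which may not match the order in which Gremlin returns them. `packagePaths` and the toy graph are hypothetical:

```typescript
// Hypothetical in-memory model of the depth-limited query: shows how the
// gte() value (maxDepth) and limit() (maxPaths) interact.

type Edge = { label: string; target: string };
type Graph = Map<string, Edge[]>;

function packagePaths(graph: Graph, root: string, maxDepth: number, maxPaths: number): string[][] {
  const results: string[][] = [];
  // Each path alternates vertices and edge labels: [vertex, edge, vertex, ...]
  const walk = (vertex: string, path: string[], visited: Set<string>, depth: number): void => {
    for (const edge of graph.get(vertex) ?? []) {
      if (results.length >= maxPaths) return; // limit(maxPaths)
      if (visited.has(edge.target)) continue; // simplePath(): drop paths that revisit a vertex
      const next = [...path, edge.label, edge.target];
      const isLeaf = (graph.get(edge.target) ?? []).length === 0; // outE().count().is(0)
      if (isLeaf || depth + 1 >= maxDepth) {
        results.push(next); // until(...) is satisfied: emit this path
      } else {
        walk(edge.target, next, new Set(visited).add(edge.target), depth + 1);
      }
    }
  };
  walk(root, [root], new Set([root]), 0);
  return results;
}

// Toy dependency graph: A -> {B, C}, B -> D, C -> D, D -> E.
const toy: Graph = new Map([
  ["A", [{ label: "dep", target: "B" }, { label: "dep", target: "C" }]],
  ["B", [{ label: "dep", target: "D" }]],
  ["C", [{ label: "dep", target: "D" }]],
  ["D", [{ label: "dep", target: "E" }]],
]);
console.log(packagePaths(toy, "A", 2, 20));
```

On this toy graph, `maxDepth = 2` stops both paths at D, while a larger value follows them down to the leaf E. With real packages the number of paths can grow quickly with depth, which is where `limit()` comes in.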

This could then be integrated into the UI via a slider or something similar.