graphistry / pygraphistry

PyGraphistry is a Python library to quickly load, shape, embed, and explore big graphs with the GPU-accelerated Graphistry visual graph analyzer
BSD 3-Clause "New" or "Revised" License

[BUG] - GFQL #583

Closed: SinsBre closed this issue 1 week ago

SinsBre commented 3 weeks ago

Describe the bug

GFQL not filtering on degrees

To Reproduce

Code, including data, that can be run without editing:

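import graphistry
from graphistry import n, e_undirected, gt

# `nodes` and `edges` are the reporter's DataFrames (data sample available on request)
g = (
    graphistry.nodes(nodes).bind(node='Vertex')
    .edges(edges).bind(source='Vertex 1', destination='Vertex 2')
)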

g_gfql = (
    g.get_degrees()
    # Reset both indexes so row labels are contiguous before chaining
    .nodes(lambda g: g._nodes.reset_index(drop=True))
    .edges(lambda g: g._edges.reset_index(drop=True))
    .chain([
        n({'degree': gt(1)}),
        e_undirected(),
        n({'degree': gt(1)})
    ])
)

Expected behavior

Should only return nodes with degree greater than 1.

Actual behavior

The result is not filtered by degree.

Graphistry GPU server environment

AWS - 2.40.74

Additional context

Contact me for a data sample.

lmeyerov commented 1 week ago

Some minimal reproductions

Preamble

import graphistry
import pandas as pd
from graphistry import n, e_forward, e_reverse, e_undirected, is_in, gt

1.

edf = pd.DataFrame({
    's': ['a1', 'b3', 'b3'],
    'd': ['b3', 'b3', 'c1']
})
g = graphistry.edges(edf, 's', 'd').materialize_nodes().get_degrees()

where

   id  degree_in  degree_out  degree
0  a1          0           1       1
1  b3          2           2       4
2  c1          1           0       1
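
As a cross-check on the table above, the degrees can be recomputed from the edge list in plain Python. This is only a minimal sketch, not how `get_degrees()` is implemented, and the helper name `degree_table` is hypothetical. The self-loop (b3)->(b3) adds one to both the in-degree and the out-degree, which is why b3 totals 4.

```python
from collections import Counter

# Edge list from reproduction 1: (source, destination)
edges = [("a1", "b3"), ("b3", "b3"), ("b3", "c1")]

def degree_table(edges):
    """Return {node: (degree_in, degree_out, degree)} for a directed edge list."""
    deg_out = Counter(s for s, _ in edges)
    deg_in = Counter(d for _, d in edges)
    nodes = sorted(set(deg_in) | set(deg_out))
    # Counter returns 0 for nodes that never appear on one side
    return {v: (deg_in[v], deg_out[v], deg_in[v] + deg_out[v]) for v in nodes}

degrees = degree_table(edges)
for node, (d_in, d_out, d) in degrees.items():
    print(node, d_in, d_out, d)
```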

Then unexpected output:

g2 = (g.get_degrees()
  .chain([
      n({'degree': gt(1)}),
      e_undirected(),
      n({'degree': gt(1)})
  ])
)

=>

   id  degree  degree_in  degree_out
0  b3       4          2           2
1  a1       1          0           1

We don't expect node a1 or edge (a1)->(b3), since (a1 {degree: 1}) fails the gt(1) filter, yet the returned edges are:

    s   d
0  a1  b3
1  b3  b3
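
For comparison, the result the query is expected to produce can be sketched in plain Python. The sketch reads the chain as "keep undirected edges whose two endpoints both have degree > 1, plus the nodes on those edges". That reading is an assumption taken from the expected behavior described in this issue, not GFQL's implementation, and `expected_chain_result` is a hypothetical helper.

```python
from collections import Counter

# Edge list from reproduction 1: (source, destination)
edges = [("a1", "b3"), ("b3", "b3"), ("b3", "c1")]

def expected_chain_result(edges, min_degree=1):
    """Nodes and edges that should survive n(degree > min) - e_undirected - n(degree > min)."""
    # Total degree = in-degree + out-degree, so a self-loop counts twice
    degree = Counter()
    for s, d in edges:
        degree[s] += 1
        degree[d] += 1
    keep = {v for v, k in degree.items() if k > min_degree}
    # Undirected hop: direction is irrelevant, but both endpoints must pass the filter
    kept_edges = [(s, d) for s, d in edges if s in keep and d in keep]
    kept_nodes = sorted({v for edge in kept_edges for v in edge})
    return kept_nodes, kept_edges

nodes_out, edges_out = expected_chain_result(edges)
print(nodes_out)
print(edges_out)
```

On this dataset only b3 passes the degree filter, so under that reading the expected output is the single node b3 and the self-loop (b3)->(b3).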

lmeyerov commented 1 week ago

@SinsBre published as 0.34.5 -- lmk if still happening

I added a test case on a minimal dataset with the same basic query structure, among others: https://github.com/graphistry/pygraphistry/blob/ecea21df3fff4bd4fadcf7f51add1c45ac0df3d7/graphistry/tests/compute/test_chain.py#L383

The bigger dataset may have more interesting shapes going on, so I'm curious.