rapidsai / cuxfilter

GPU accelerated cross filtering with cuDF.
https://docs.rapids.ai/api/cuxfilter/stable/
Apache License 2.0
278 stars, 68 forks

Error when using a string column as node_id in cuxfilter graph #399

Closed abhijit156 closed 4 months ago

abhijit156 commented 2 years ago

I have successfully built a node and edge list and a cuGraph object, and visualized the network in an interactive dashboard. However, I would like the node labels to be strings rather than numerical node IDs. I tried this on my own data as well as on the Divvy Chicago bikeshare dataset, as demonstrated in the cuxfilter tutorial. This is the command I tried running on the bikeshare tutorial data:

`chart1 = cuxfilter.charts.graph(
    node_id='from_station_name',
    edge_source='src',
    edge_target='dst',
    node_aggregate_fn='count',
    node_pixel_shade_type='linear',
    node_point_size=35,  # node size is fixed
    edge_render_type='direct',  # direct, curved
    edge_transparency=0.7,  # 0.1 - 0.9
    tile_provider='CARTODBPOSITRON',
    # title='Graph for trip source_stations (color by count)
)`

Following this, when I run `d = cux_df.dashboard([chart1], layout=cuxfilter.layouts.single_feature, theme=cuxfilter.themes.rapids, title='Geospatial Trips')`,

I get `ValueError: Could not convert strings to float type due to presence of non-floating values.`

The only way to avoid this error seems to be providing a numerical column for node_id. This is especially confusing because the cuxfilter charts documentation shows node_id as having the data type str, with the default value 'vertex'.

Any help would be greatly appreciated!!

AjayThorve commented 2 years ago

Hey @abhijit156 thanks for using cuxfilter.

As of now, only numeric data types (int, float) are supported for the vertex column.

In the documentation, the data type str for vertex refers to the type of the argument the function accepts, which has to be the name of a column given as a str.

Adding string support should definitely be possible, but I don't think this needs to be a blocker for you. As a workaround, you could create another column and assign a unique integer to each of the string labels.
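A minimal sketch of that workaround in plain Python, so it runs without a GPU. The station names and the `encode_labels` helper are hypothetical, made up for illustration. With cudf, something like `Series.factorize()` may give the same kind of integer codes directly, but that is an assumption worth checking against your cudf version.

```python
# Hypothetical string labels standing in for a column such as 'from_station_name'.
station_names = ["Clark St", "State St", "Clark St", "Lake Shore Dr", "State St"]


def encode_labels(labels):
    """Assign each distinct label a unique integer, in order of first appearance."""
    label_to_id = {}
    for label in labels:
        if label not in label_to_id:
            label_to_id[label] = len(label_to_id)
    return label_to_id


label_to_id = encode_labels(station_names)

# Integer column that can be used as the numeric node id.
node_ids = [label_to_id[name] for name in station_names]

# Reverse lookup, to recover the string label from a numeric id later.
id_to_label = {node_id: name for name, node_id in label_to_id.items()}
```

The `node_ids` list can then be stored as a new numeric column next to the string column, and that numeric column name passed as `node_id`.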

abhijit156 commented 2 years ago

Thanks for the quick response, Ajay!

I do have unique numerical IDs assigned to each phrase already. My trouble is with using those phrases, instead of the numerical IDs, as node labels in the graph visualization (for example, showing them as info on hover).

Is there a workaround to this?

AjayThorve commented 2 years ago

I don't think cuxfilter natively supports either of those scenarios (node labels or hover info) for the graph viz. One thing you can do is use the table view chart, and then box select the node you want info on. The corresponding info (including your string labels) will be displayed in that chart.

Here is an example:

import cuxfilter
import cudf

edges = cudf.DataFrame({
    'source': [0, 0, 0, 0, 1, 1, 1, 0, 1, 2, 1, 1, 2, 0, 0],
    'target': [1, 2, 3, 1, 2, 3, 3, 2, 2, 3, 3, 3, 3, 3, 3]
})

nodes = cudf.DataFrame({
    'vertex': [0, 1, 2, 3],
    'vertexStr': ["a", "b","c", "d"],
    'x': [-3.3125157356262207,-1.8728941679000854, 0.9095478653907776, 1.9572150707244873],
    'y': [-1.6965408325195312, 2.470950126647949,-2.969928503036499,0.998791515827179]
})

cux_df = cuxfilter.DataFrame.load_graph((nodes, edges))

chart0 = cuxfilter.charts.datashader.graph(node_pixel_shade_type='linear', unselected_alpha=0.2)
chart1 = cuxfilter.charts.view_dataframe(['vertexStr','vertex'])

d = cux_df.dashboard([chart0, chart1], layout=cuxfilter.layouts.double_feature)
d.app()


github-actions[bot] commented 2 years ago

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.