ricklupton / ipysankeywidget

IPython / Jupyter Sankey diagram widget
MIT License

flow_selection with 'source.some_key' doesn't work because of pandas.eval() #41

Closed by mskoh52 5 years ago

mskoh52 commented 5 years ago

When a Dataset is created with dim_process (and probably also with dim_material and dim_time, although I didn't try those), using the flow_selection param later when creating a Bundle fails. The dataset is constructed by adding the prefixes source. and target. (note the dot) to the dimension columns, so pandas raises an error when floweaver.dataset.eval_selection evaluates a selection that uses one of these dotted names.
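A rough illustration of why the dot matters, using only Python's standard ast module. It assumes, as the visit_Attribute frame in the traceback below suggests, that pandas parses the selection string with Python's own parser; it is a sketch of the parsing step, not of floweaver or pandas internals.

```python
import ast

# The selection string from the Bundle in this report.
expr = 'source.foo == "fooB"'
tree = ast.parse(expr, mode='eval')
left = tree.body.left  # left-hand side of the comparison

# Python reads "source.foo" as attribute access on a name "source",
# not as a single column name.
print(type(left).__name__)       # Attribute
print(left.value.id, left.attr)  # source foo
print(type(left.ctx).__name__)   # Load

# With an underscore the same operand is a single plain name.
flat = ast.parse('source_foo == "fooB"', mode='eval').body.left
print(type(flat).__name__, flat.id)  # Name source_foo
```

So an expression evaluator that looks up column names sees a variable called source with an attribute foo, rather than a column called source.foo.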

For example, if I have created a Dataset like so:

import pandas as pd
from floweaver import *
flows = pd.DataFrame(
    [['a', 'b', 10],
     ['a', 'c', 20],
     ['b', 'b', 5],
     ['b', 'd', 5],
     ['c', 'd', 20]],
    columns=['source', 'target', 'value']
)
processes = pd.DataFrame(
    ['fooA', 'fooB', 'fooC', 'fooD'],
    columns=['foo'],
    index=['a', 'b', 'c', 'd']
)

dataset = Dataset(flows, dim_process=processes)

nodes = {
    'node1': ProcessGroup(['a']),
    'node2': ProcessGroup(['b', 'c']),
    'node3': ProcessGroup(['d']),
    'wp': Waypoint(direction='R')
}
ordering = [['node1'], ['node2', 'wp'], ['node3']]
bundles = [
    Bundle('node1', 'node2'),
    Bundle('node2', 'node2', flow_selection='source.foo == "fooB"', waypoints=['wp']),
    Bundle('node2', 'node3'),
]

sdd = SankeyDefinition(nodes, bundles, ordering)

weave(sdd, dataset).to_widget()

This fails with the following traceback:

AttributeError                            Traceback (most recent call last)
<ipython-input-6-04c4391e4f95> in <module>
     33 sdd = SankeyDefinition(nodes, bundles, ordering)
     34 
---> 35 weave(sdd, dataset).to_widget()

~/.pyenv/versions/3.6.7/envs/flotest/lib/python3.6/site-packages/floweaver/weave.py in weave(sankey_definition, dataset, measures, link_width, link_color, palette)
     43     # Get the flows selected by the bundles
     44     bundle_flows, unused_flows = dataset.apply_view(
---> 45         sankey_definition.nodes, bundles2, sankey_definition.flow_selection)
     46 
     47     # Calculate the results graph (actual Sankey data)

~/.pyenv/versions/3.6.7/envs/flotest/lib/python3.6/site-packages/floweaver/dataset.py in apply_view(self, process_groups, bundles, flow_selection)
     88 
     89     def apply_view(self, process_groups, bundles, flow_selection=None):
---> 90         return _apply_view(self, process_groups, bundles, flow_selection)
     91 
     92     def save(self, filename):

~/.pyenv/versions/3.6.7/envs/flotest/lib/python3.6/site-packages/floweaver/dataset.py in _apply_view(dataset, process_groups, bundles, flow_selection)
    191         target = process_groups[bundle.target]
    192         flows, internal_source, internal_target = \
--> 193             find_flows(table, source.selection, target.selection, bundle.flow_selection)
    194         assert len(used_edges.intersection(
    195             flows.index.values)) == 0, 'duplicate bundle'

~/.pyenv/versions/3.6.7/envs/flotest/lib/python3.6/site-packages/floweaver/dataset.py in find_flows(flows, source_query, target_query, flow_query, ignore_edges)
    136     """
    137     if flow_query is not None:
--> 138         flows = flows[eval_selection(flows, '', flow_query)]
    139 
    140     if source_query is None and target_query is None:

~/.pyenv/versions/3.6.7/envs/flotest/lib/python3.6/site-packages/floweaver/dataset.py in eval_selection(df, column, sel)
     38                        local_dict={},
     39                        global_dict={},
---> 40                        resolvers=(resolver, ))
     41     else:
     42         raise TypeError('Unknown selection type: %s' % type(sel))

~/.pyenv/versions/3.6.7/envs/flotest/lib/python3.6/site-packages/pandas/core/frame.py in eval(self, expr, inplace, **kwargs)
   3191             kwargs['target'] = self
   3192         kwargs['resolvers'] = kwargs.get('resolvers', ()) + tuple(resolvers)
-> 3193         return _eval(expr, inplace=inplace, **kwargs)
   3194 
   3195     def select_dtypes(self, include=None, exclude=None):

.....
{removed for brevity}
.....

~/.pyenv/versions/3.6.7/envs/flotest/lib/python3.6/site-packages/pandas/core/computation/expr.py in visit_Attribute(self, node, **kwargs)
    548 
    549         raise ValueError("Invalid Attribute context {name}"
--> 550                          .format(name=ctx.__name__))
    551 
    552     def visit_Call_35(self, node, side=None, **kwargs):

AttributeError: 'Load' object has no attribute '__name__'

Full traceback here: https://gist.github.com/mskoh52/7054199865c214ad1de8f0e4772582d4

I was able to work around this by editing dataset.py to replace the dots with underscores on lines 28 and 70-77, and then using underscores in the flow_selection string as well. I can submit a PR if desired, but I haven't looked extensively at the rest of the code, so I may be missing other places where the dot is important.
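The idea behind that workaround can be sketched outside floweaver with a toy table. The DataFrame below is a hypothetical stand-in for the merged flows table (its column names only mimic the source. prefix described above), and the rename step is an assumption about what the edit amounts to, not the actual change to dataset.py.

```python
import pandas as pd

# Hypothetical stand-in for the merged flows table; 'source.foo' mimics
# the dotted prefix that floweaver adds for dim_process columns.
table = pd.DataFrame({
    'source': ['a', 'b', 'b'],
    'target': ['b', 'b', 'd'],
    'source.foo': ['fooA', 'fooB', 'fooB'],
    'value': [10, 5, 5],
})

# Replace dots with underscores so every column is a plain identifier.
renamed = table.rename(columns=lambda name: name.replace('.', '_'))

# The selection string has to use the same underscore spelling.
mask = renamed.eval('source_foo == "fooB"')
selected = renamed[mask]
print(selected)
```

The trade-off is that existing selection strings written with the dotted spelling would need to change as well.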

mskoh52 commented 5 years ago

Sorry, I posted this in the wrong repo. It should be in the ricklupton/floweaver repo.