Closed Anna-Xiong closed 3 weeks ago
I like it! I think this should have the exact same API as the plot
function, and it does!
A couple of ideas for suggestions:
get_plot_dict
?Other than that, think this is great 💪 let's do it
Will do a final review once you finalised it, thanks!
Thank you!
- maybe tune the name so we know the return type:
get_plot_dict
?
I am leaning towards including output in the name, probably get_plot_output_dict if it is not too long?
- Warning is nice, but I think not even necessary, could even skip it
Here we are using INFO instead of WARNING, it would be nice to notify user they haven't turned on explain and won't be able to get business insights and code explanation.
Great review from Antony, and on a more thorough review I agree with some but not all points. Let me try to summarize:
I disagree that we should return a dataclass
for plot
- a core idea for simplicity is that plot
returns a figure, and in jupyter that figure is automatically displayed.
Yes, I 100% agree and I am annoyed I didn't think of it before given that this is exactly what is done in DS tools. So for the component collection, yes 100% dataclass.
Does not feel great, but I think here it makes sense, because .plot
is a very special function... It returns a fig object for simplicity (and that should remain as is, as in jupyter it shows the fig automatically and it is the core functionality), but in a jupyter environment, it also shows the insights and code snippet above the fig. That is potentially what is most confusing, but at the same time probably the best way to handle a complicated situation.
Thus, the second function with the same function signature was intended to return the same (not a subset) things, but in a way that is simple to further use (e.g. if you would like to just have the code string, or if you would like to return the insights separately in a chatbot).
Here is the 4 combinations, and what I think it should do:
In Jupter:
plot(...,explain=False)
-> Return fig object, that is automatically displayedplot(...,explain=True)
-> Return fig object, that is automatically displayed, with explanation displayed above, not altering the return typenot in jupyter:
plot(...,explain=False)
-> Return fig object, that can be displayed if the user chooses fig.show()
themselvesplot(...,explain=True)
-> Return fig object, that can be displayed if the user chooses fig.show()
themselves. Print the explanation to the terminal (this is new, and I think this would make .plot
more consistentIf the above is ensured, we just have to find a way to deliver all output to the user in an alternative way (which as said in 1 should definitely be a dataclass
. I think there is broadly three options:
.plot(...,return_components = True)
Yes, I think that would be wise, but I felt it was not the major focus for now
I have tried to show above that plot
is not a subset of get_plot_outputs
, but a sister function. That would suggest (I think) a variable return type. But it makes sense as you say to look into the future:
.plot
plot
interacts with other parts of VizroAI
, will it be a burden to sync with a potential different function (thinking of iteration, usage in .dashboard
, etc)Long story short - we simply cannot tell. Thus I prefer to have a single function with a variable return type that is clear to maintain: it returns either the fig with all other components shown alongside OR it returns a dataclass with all the components as attributes.
That is also easy to explain in docs, and easy to use. If plot
becomes involved in other parts of the program, at least we only have to think about plot
and not something else.
Great points @maxschulz-COL! You are absolutely right that it's good to automatically show the figure in a Jupyter notebook, which I hadn't thought of before. That would actually still be possible with my "ideal" solution above since you could delegate the relevant methods to the underlying figure:
@dataclass
class Plot:
# I'm just taking the same property names as you use now, even if they could maybe be improved a bit
business_insights: str
code_explanation: str
code_string: str
fig: go.Figure
# Whatever methods are needed
def _repr_html_(...):
return fig._repr_html_(...)
This is very similar to one of the prototypes I had for our _DashboardReadyFigure
which had similar requirements of needing to behave like a go.Figure
in Jupyter while requiring extra behaviour for use in a dashboard. There I went for inheritance over composition in the end. One of the reasons for this is it turned out that while it's possible to do the above it's actually a bit painful to implement. And given we have the constraint of needing to return a go.Figure
already given the change in vizro-ai 0.2.0, let's not do this.
We could actually also follow the subclassing approach like we do with _DashboardReadyFigure
and do class Plot(go.Figure)
and put a business_insights
property into that class to get the best of both worlds where you get something that behaves like go.Figure
but also exposes additional properties. This is pretty ugly and also a little awkward to implement so I don't think I'm a fan of it but might work ok here 🤔
Assuming we don't like that then I agree with your solution 1 here that we have a single plot
function but with an argument that alters the return type. But I find the existence of both explain
and return_components
a bit confusing and wonder if we could just use the explain
argument for both?
plot(..., explain=False)
: return fig object and don't show any explanation, just like it does now.plot(..., explain=True)
: return dataclass
(different from now). If in Jupyter then display explanation on screen (as done now) and also fig on screen (needed since we no longer directly return the figure object). If outside Jupyter then personally I don't care whether we show the explanation or not, since it's available in the return dataclass anyway.My assumption here is that in 99% of cases explain
is basically equivalent to return_components
. The only cases that this solution renders impossible would be:
explain=True
and return_components=False
: you want to see the explanation but don't want to use it further. With my above solution you set explain=True
and get extra components in your return object and you can just ignore those if you like so this seems fine since the graph is still displayedexplain=False
and return_components=True
: again just set explain=True
. The only difference here is the explanation gets shown (at least in Jupyter notebooks) when maybe you don't want that, but that seems ok to me also.Assuming we don't like that then I agree with your solution 1 here that we have a single
plot
function but with an argument that alters the return type. But I find the existence of bothexplain
andreturn_components
a bit confusing and wonder if we could just use theexplain
argument for both?
plot(..., explain=False)
: return fig object and don't show any explanation, just like it does now.plot(..., explain=True)
: returndataclass
(different from now). If in Jupyter then display explanation on screen (as done now) and also fig on screen (needed since we no longer directly return the figure object). If outside Jupyter then personally I don't care whether we show the explanation or not, since it's available in the return dataclass anyway.
Happy with this! Would this not be another breaking change, because now the return type of the plot function with explain=True
is actually go.Figure
and it would change to dataclass
?
Haha, I knew you would ask that 😀 And it's definitely the right question to ask.
Technically yes, it's a breaking change. But breaking changes are always a bit of a grey area, relative to how things are actually used and not clear cut. Within the context of how people use VizroAI, would anyone be doing explain=True
and using the go.Figure
directly? My feeling is probably no: if you want to use the figure then you would do explain=False
(the default, so undoubtedly the most common anyway). The result for Jupyter users would look the same because we'd still show the plot in the notebook. The problem would only come if you're actually consuming that plot in some way with explain=True
.
Given that we only just did the breaking release 0.2.0 and changed the return type there I would be inclined to do this more as a "new feature" rather than breaking change, but it's definitely debatable. I don't mind really - up to you guys to think about how many people this actually would break the workflow for.
Given that we only just did the breaking release 0.2.0 and changed the return type there I would be inclined to do this more as a "new feature" rather than breaking change, but it's definitely debatable. I don't mind really - up to you guys to think about how many people this actually would break the workflow for.
I am ok with it! Let's see what @Anna-Xiong thinks
Thanks for the thorough reviews and suggestions on enhancing the plot function of VizroAI. After considering all the inputs, I have updated the code accordingly and have a few point would like to contribute here:
Agreement on Single plot Call:
I agree with using a single plot
method that adjusts return type based on the provided arguments. The reason is the same as our first POC: given that even if we have two calls, there would be a duplicate of almost all arguments.
Support for Dataclass: Using a dataclass to return structured data is a great idea and easier to use than dictionary for sure. updated that already.
Concerns with only using explain:
However, I have reservations about using only explain
parameter to control all aspects of the output.
figure object
and code string
to integrate into chat application without the explanation or vice versa. In fact, that is one of the most common requirement we have seen and this is also what our chat application uses as well. explain
and return_plot_components
as below:
explain
only controls whether to run extra LLM to get text insights of business and code explanationreturn_plot_components
controls whether to Plot dataclass
or return only plotly fig object
I would be strongly leaning towards additional flags here that control the type of the output without merging them into a single parameter explain
.
If we are not ok for this adding extra argument change, I would think it is better to have a separate get_plot_components()
method that
VizroAI.get_plot_components()
could return the dataclass
so that we don't mix what explain
controls, VizroAI.plot()
a short cut , it can return the fig
from dataclass
returned from VizroAI.get_plot_components()
Decision on Pylint Rule Adjustment:
Lastly, as we expand our this VizroAI.plot()
function, the additional argument brings our total to six, surpassing the current pylint rule of five arguments per function. We need to decide if we should adjust our linting rules to accommodate this change.
Code Updates: I've made some updates to the existing prototype to reflect our discussions so far.
We care more about auto rendering in jupyter environment, so if users are using jupyter, the utils has been refactoring to take care of auto rendering of fig. No matter what users specify in plot()
call, the fig is always rendered in jupyter, and not rendered automatically in other environment, which they can add one line of code to show the fig(exact same as our current plot()
call, no breaking change).
plot(...,explain=False, return_plot_components= True)
-> Return dataclass Plot that consists of code_string
and fig
, the fig is automatically displayed if in jupyterplot(...,explain=False, return_plot_components= False)
-> Return fig
directly, the fig is automatically displayed if in jupyter.plot(...,explain=True, return_plot_components= True)
-> Return dataclass Plot
that consists of code_string
and fig
, business_insights
, and code_explanation
. The fig is automatically displayed and the Markdown is rendered for text information if in jupyterplot(...,explain=True, return_plot_components= False)
-> Return fig
directly, The fig
is automatically displayed and the Markdown is rendered for text information if in jupyterAlso return_plot_components
will be default to False, so we don't cause any breaking changes from the users' side.
Let's take a look at the code and share your thoughts on these points, especially regarding the flexibility of the output controls and the pylint rule adjustment.
Concerns with only using explain: However, I have reservations about using only
explain
parameter to control all aspects of the output.
- Our users might have requirements where they need the
figure object
andcode string
to integrate into chat application without the explanation or vice versa. In fact, that is one of the most common requirement we have seen and this is also what our chat application uses as well.- This also involves calling extra layer of LLM or not(cost and speed relevant). In short, I would summarize the responsibility of
explain
andreturn_plot_components
as below:
explain
only controls whether to run extra LLM to get text insights of business and code explanationreturn_plot_components
controls whether toPlot dataclass
or return onlyplotly fig object
I would be strongly leaning towards additional flags here that control the type of the output without merging them into a single parameter
explain
.
This is a very valid point, and I had not thought of that. I think this makes a strong case for actually adding an extra parameter. It also avoids the "breaking change".
If we are not ok for this adding extra argument change, I would think it is better to have a separate
get_plot_components()
method thatDecision on Pylint Rule Adjustment: Lastly, as we expand our this
VizroAI.plot()
function, the additional argument brings our total to six, surpassing the current pylint rule of five arguments per function. We need to decide if we should adjust our linting rules to accommodate this change.
I think it is ok to add an ignore here - I think that pylint rule is generally good to have, but can feel too strict at times.
Code Updates: I've made some updates to the existing prototype to reflect our discussions so far.
We care more about auto rendering in jupyter environment, so if users are using jupyter, the utils has been refactoring to take care of auto rendering of fig. No matter what users specify in
plot()
call, the fig is always rendered in jupyter, and not rendered automatically in other environment, which they can add one line of code to show the fig(exact same as our currentplot()
call, no breaking change).
plot(...,explain=False, return_plot_components= True)
-> Return dataclass Plot that consists ofcode_string
andfig
, the fig is automatically displayed if in jupyterplot(...,explain=False, return_plot_components= False)
-> Returnfig
directly, the fig is automatically displayed if in jupyter.plot(...,explain=True, return_plot_components= True)
-> Return dataclassPlot
that consists ofcode_string
andfig
,business_insights
, andcode_explanation
. The fig is automatically displayed and the Markdown is rendered for text information if in jupyterplot(...,explain=True, return_plot_components= False)
-> Returnfig
directly, Thefig
is automatically displayed and the Markdown is rendered for text information if in jupyter
Didn't have time to check the details - is the mechanism that render the fig
in the cases where return_plot_components= False
the original Jupyter one? Just curious on the details.
Also
return_plot_components
will be default to False, so we don't cause any breaking changes from the users' side.
Great!
Let's take a look at the code and share your thoughts on these points, especially regarding the flexibility of the output controls and the pylint rule adjustment.
Overall I think we should do it as you say, and you are right, it is one of the most requested features, and it separates responsibility of arguments.
@antonymilne @maxschulz-COL
Thank you for reviewing it again!
Didn't have time to check the details - is the mechanism that render the fig in the cases where return_plot_components= False the original Jupyter one? Just curious on the details.
Then most questions remaining here would be relevant to naming of arguments and dataclass.
return_elements
instead of return_details
would be better for the argument in plot()
call? @maxschulz-COL do you have any other suggestion on names?Just a final suggestion on the names: return_elements
is ok but I think I'd prefer return_full
overall if that makes sense for you two also? @maxschulz-COL @Anna-Xiong
Just a final suggestion on the names:
return_elements
is ok but I think I'd preferreturn_full
overall if that makes sense for you two also? @maxschulz-COL @Anna-Xiong
I am ok either way, but slightly towards return_elements
or return_full_elements
? Let's see what @maxschulz-COL thinks?
Just a final suggestion on the names:
return_elements
is ok but I think I'd preferreturn_full
overall if that makes sense for you two also? @maxschulz-COL @Anna-XiongI am ok either way, but slightly towards
return_elements
orreturn_full_elements
? Let's see what @maxschulz-COL thinks?
I prefer return_elements
- I find that is really what it is.
Fine by me, let's go for return_elements
then 🙂
Description
we will have a
dataclass
forPlot
components.here is our updated
plot()
call.explain
controls whether to run extra LLM to get text insights of business and code explanationreturn_plot_components
controls whether to Plot dataclass or return only plotly fig objectPending:
will be deleted afterwards.
[x] I acknowledge and agree that, by checking this box and clicking "Submit Pull Request":