Closed NeroOkwa closed 1 year ago
For this issue, it's worth noting that we do have this functionality already.
It's just that:
Note that plotting metrics against parameters and/or kedro runs is a big topic which has been considered by many different tools and also discussed by us before: https://github.com/quantumblacklabs/private-kedro/issues/1192 https://github.com/kedro-org/kedro/issues/1070 (copy of above issue to public repo but missing some posts)
Just don't want previous discussions or existing solutions from other products to be forgotten about here 🙂
We should be careful with our assumptions here. Some notes about that:
Bottom line is that the data aren't always going to be nice, not always between 0 and 1, or play nice together if a user is tracking multiple metrics on one plot.
Do reference @AntonyMilneQB's comment here for more context.
Let's pick @noklam's brain about this, too. He may have some great real-world experience with some other tools in this space that do similar things.
As I understand we are discussing comparison plots across runs here.
As is right now, the X-axis is timestamp, and that's impractical. Should have a way for it to be uniform so you don't have clusters of runs and then huge gaps in time.
This feature is available in almost every experiment tracking tool. It's usually applied to an X-axis within the same run, but I think it's mostly valid for cross-run comparison as well.
See similar things on Weights & Biases, which is really flexible and you can configure
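To make the "uniform X-axis" idea concrete, here is a minimal sketch that replaces raw timestamps with evenly spaced positions in chronological order. The function name and the sample timestamps are made up for illustration; this is not how Kedro-Viz or any other tool implements it.

```python
from datetime import datetime


def uniform_x_positions(run_timestamps):
    """Map each run timestamp to an evenly spaced position (0, 1, 2, ...).

    Runs keep their chronological order, but clusters and long gaps
    in wall-clock time no longer distort the axis.
    """
    ordered = sorted(run_timestamps)
    return {ts: position for position, ts in enumerate(ordered)}


# Hypothetical runs: three close together, then one months later.
runs = [
    datetime(2022, 3, 1, 9, 0),
    datetime(2022, 3, 1, 9, 5),
    datetime(2022, 3, 1, 9, 7),
    datetime(2022, 6, 20, 14, 30),
]
positions = uniform_x_positions(runs)
```

The timestamp can still be shown as a tick label or tooltip; only the spacing changes.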
Y-axis doesn't need to only be between 0 and 1. It can be arbitrarily high or low and it's very possible you'd want to plot multiple metrics on the same scale, one with a huge scale range and then another one that's very small. You could normalize the scales or use a parallel coordinates plot.
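As a rough illustration of the "normalize the scales" option, the sketch below rescales each metric independently to the range 0 to 1 (min-max normalisation). The metric names and values are invented, and this is only one of several possible normalisations.

```python
def min_max_normalise(values):
    """Rescale a list of numbers to [0, 1] using its own min and max."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All runs have the same value: place them in the middle of the axis.
        return [0.5 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical metrics from three runs, on very different scales.
metrics = {
    "accuracy": [0.81, 0.84, 0.79],
    "rmse": [1520.0, 1310.0, 1875.0],
}
normalised = {name: min_max_normalise(vals) for name, vals in metrics.items()}
```

After rescaling, both metrics can share one axis, at the cost of hiding the absolute values, which would then need to appear in tooltips or per-axis labels.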
I think it all makes sense, but some of the features would be difficult to implement, and the live plot is mentioned in this issue. The more raw data you keep, the more flexibly you can customize these plots later. Another limiting factor for the live plot is that we only save output at the end of a node execution. We need to keep data at a more granular level to support live plots and these chart customizations. It would be a huge change on the backend though, and it doesn't fit well with the node execution paradigm.
Side note:
AFAIK W&B is also running a GraphQL API with Vega (or Vega-Lite), which is based on d3.js. In Python there is altair, which supports Vega-Lite. This crazy example shows how customizable it could be, though it's not a common use case.
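For anyone unfamiliar with Vega-Lite, a chart is described by a JSON spec. The sketch below builds such a spec as a plain Python dict for a metric-vs-run line chart. The records and field names (`run`, `metric`, `value`) are invented for illustration, and nothing here reflects how W&B or Kedro-Viz actually build their charts.

```python
import json

# Hypothetical long-format records: one row per (run, metric) pair.
records = [
    {"run": "run_1", "metric": "accuracy", "value": 0.81},
    {"run": "run_2", "metric": "accuracy", "value": 0.84},
    {"run": "run_1", "metric": "f1", "value": 0.77},
    {"run": "run_2", "metric": "f1", "value": 0.80},
]

# A minimal Vega-Lite spec: runs on an ordinal X-axis (evenly spaced),
# metric values on Y, one coloured line per metric.
spec = {
    "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
    "data": {"values": records},
    "mark": {"type": "line", "point": True},
    "encoding": {
        "x": {"field": "run", "type": "ordinal"},
        "y": {"field": "value", "type": "quantitative"},
        "color": {"field": "metric", "type": "nominal"},
    },
}

spec_json = json.dumps(spec, indent=2)
```

The resulting JSON can be pasted into the online Vega editor to render it; altair generates the same kind of spec from Python objects.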
Just to clarify, I don't think live plotting of metric vs. epoch is in scope here at all (as @noklam says, we can't do anything like that without a lot more work on kedro core and it would be quite a paradigm shift). For now we're just concerned with comparing metrics saved as a dataset (so from a node output) in one kedro run vs. the same dataset(s) in another kedro run. What does work "live" here is that when you do kedro run, the newest datasets are available in Kedro-Viz straight away without refreshing or needing to restart the server thanks to the GraphQL subscription.
Hey everyone! I won't be in the Experiment Tracking review session tomorrow and I just have some thoughts on the current prototype design.
So from what I understand the original problem we're supposed to be solving is: "I'm choosing not to use Kedro-Viz Experiment Tracking because it doesn't allow me to visualise metrics over time."
I may be wrong but I assumed it would be as simple as saying, _"I've done 20 pipeline runs, I was tracking mean_absolute_percentage_error and I want to see how my mean_absolute_percentage_error changed over time by looking at a plot of the values against time on a chart."_ Is this view correct or incorrect?
The reason I ask this is because:
So at the end of the day, the question becomes which problem are we solving for our users to increase adoption of Kedro-Viz Experiment Tracking? Are our users choosing not to use Kedro-Viz Experiment Tracking because:
I'm inclined to think it's the first problem but I'm also happy to be proven wrong on this. So keeping in mind that I'm also making assumptions throughout this piece, I would propose the following structure for user testing, which would provide more insights into the impact of not delivering on either of those problem statements:
tl;dr: speaking as an ex-PAI user I think parallel coordinates plot is better than time series plot. They both solve very similar problems, but parallel coordinates plot seems more powerful. I don't see why we shouldn't support both but would definitely prioritise the parallel coordinates plot.
I may be wrong but I assumed it would be as simple as saying, _"I've done 20 pipeline runs, I was tracking mean_absolute_percentage_error and I want to see how my mean_absolute_percentage_error changed over time by looking at a plot of the values against time on a chart."_ Is this view correct or incorrect?
I think this is both correct and incorrect 😀 Basically I don't think the two different problems you're posing are all that different. At the end of the day, they both boil down to: for each (kedro run, metric name) point there is a metric value. How do I compare metric value across many (kedro run, metric name) points? Saying "I want to track a metric over time" doesn't necessarily mean "I want a plot of metric vs. time". The parallel coordinates plot still lets you compare between runs even if there's no time axis.
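To spell out the data model described here, a minimal sketch: each point is a (kedro run, metric name, metric value) triple, and pivoting by run gives one row per run, which is the shape both a time series and a parallel coordinates plot can be drawn from. The run ids, metric names, and values below are made up.

```python
from collections import defaultdict

# Hypothetical (kedro run, metric name, metric value) points.
points = [
    ("2022-03-01T09.00.00", "accuracy", 0.81),
    ("2022-03-01T09.00.00", "rmse", 1520.0),
    ("2022-03-01T09.05.00", "accuracy", 0.84),
    ("2022-03-01T09.05.00", "rmse", 1310.0),
]


def pivot_by_run(points):
    """Group points into one {metric name: value} row per run."""
    table = defaultdict(dict)
    for run_id, metric_name, value in points:
        table[run_id][metric_name] = value
    return dict(table)


rows = pivot_by_run(points)
```

In a parallel coordinates plot each row becomes one line and each metric name one vertical axis; in a time series plot each metric name becomes one line and each run one X position.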
See "How to visualise metrics dataset" in https://github.com/kedro-org/kedro/issues/1070#issuecomment-979132359 for my full comments. A time series plot of one metric is one thing you might want to look at, but in reality such a plot is very limited:
So, in theory it's possible to do the full metric value vs. (kedro run, metric name) comparison on a time series plot, but it's not ideal and certainly the way PAI did it was not good enough for what we're trying to do here.
The parallel coordinates plot is not so different from the above, it's just a different way of showing metric value vs. (kedro run, metric name). However, it seems to be generally more suitable than the time series plot since it doesn't suffer so much from the above problems. There are still some things we need to be careful of to make sure it works:
Crucially the first 2 of these were possible but the 3rd was missing in PAI (but should come naturally in kedro-viz because we already have the ability to choose which runs you're comparing). The other main problem with the PAI plot is that it's radial rather than parallel, which looks cool but is harder to use in reality.
Overall, I think there's a good reason that tools like neptune and wandb do parallel coordinates plots. It seems like the best way to compare metric value across many (kedro run, metric name). That's not to say that we shouldn't have a time series plot as well, but I think most people would end up using the parallel coordinates plot way more.
One final thought while it occurs to me: you can actually sort of retain the time ordering in the parallel coordinates plot if you colour the lines somehow, e.g. to show the oldest ones fainter than the most recent ones. Not super important because I don't think the time ordering is that important, but at least highlighting the most recent run might be nice.
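One possible way to do the "older runs fainter" colouring is to assign each line an opacity that increases linearly with recency. This is just a sketch; the function name, the default opacity range, and the run ids are all assumptions.

```python
def recency_opacity(run_ids_oldest_first, min_opacity=0.2, max_opacity=1.0):
    """Return {run id: opacity}, fading linearly from oldest to newest run."""
    n = len(run_ids_oldest_first)
    if n == 1:
        # A single run is always drawn at full opacity.
        return {run_ids_oldest_first[0]: max_opacity}
    step = (max_opacity - min_opacity) / (n - 1) if n else 0.0
    return {
        run_id: min_opacity + index * step
        for index, run_id in enumerate(run_ids_oldest_first)
    }


# Hypothetical run ids, ordered oldest to newest.
opacities = recency_opacity(["run_a", "run_b", "run_c"])
```

The newest run ends up fully opaque, which also covers the simpler "just highlight the most recent run" case.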
As I said in the meeting yesterday, my intuition and instinct around what a user may want for new features here isn't sharp. I defer to @AntonyMilneQB, @noklam, and others who have used things like this in the past while doing real DS/DE/ML work.
What I do think we need is consistency with our hierarchy of information and a viable amount of added value with whatever we develop next. A few things stood out to me during the meeting yesterday:
Parallel Coordinates plot should take precedence over a Time-series one. I'm excited to hear what our interviewees say when this is shown to them.
Lastly, calling out @noklam here. Please add some thoughts and comments if you have some. I think they're invaluable here!
While browsing the original issue I came across this from @mkretsch327 (ex-QB data scientist). Basically I think DS (me, Nok, Matt) like the parallel coordinates plot 👍
For a metrics-over-runs view, I've found a parallel coordinate-like plot (essentially a flattened version of the circular metrics plot from PerformanceAI) to be super-useful. A majority of the time I'm looking to see what runs resulted in metrics that are at the extremes of a range (high or low), and that chart ends up providing that information concisely, even for relatively large numbers of metrics.
I'm happy for this. I will say that we will prioritise one view to solve the original user problem that was raised. At this point it's either parallel coordinates or time-series, it won't be both because we have other problems to solve once this is completed. And I want to feel certain that if we acted on https://github.com/kedro-org/kedro-viz/issues/1000 that we would be doing the right thing.
Admittedly, I am a bit nervous about the parallel plot because we had feedback about the spider diagram when we were evaluating PAI. I highlighted the relevant insight in dark pink.
Let's see what users say.
I see how the spider diagram might be confusing for some (even though it is the same thing as parallel coordinates). It might look cool to some, but the fact that it was circular added too much [visual] complexity and made it harder to read. This is not an issue specific to this graphic but a universal visual design fact. When flattening "the same" into a horizontal alignment it becomes much more digestible.
I understand picking one or the other for now for the sake of practicality and moving forward iteratively, but I would not ignore one or the other since they are different ways of exploring the data from different angles.
Again, let's ask the right questions and listen to what users say over the sessions. Loads of great insights are coming.
The goal of this session was to evaluate the usability and value risk of the proposed feature on #1627 (tracking metrics over time) through a low-fidelity mockup and a high-fidelity prototype.
The research used a qualitative (interview 🎤 - 6 participants) and quantitative (polls 🗳️) approach across the QuantumBlack and open-source user bases.
Summary: 2/6 users currently use the Kedro experiment tracking feature. Users relied on experiment tracking to understand their experiments and to find the best one by iterating with different parameters to produce different metrics. This was done using MLflow, Weights & Biases, and Tableau.
Summary: 3/6 users know of this feature and have used it to plot their metrics. One user mentioned that its location is non-intuitive and difficult to find for non-users
Summary: 3/6 users start with a clear metric to track defined by the project, while others don’t and are more exploratory.
Summary: All 6 users prefer this new tab design
Summary: 2 users prefer time series plots, 2 prefer parallel coordinate plots, and 2 like and would use both plots for different use cases.
Summary: 4/6 users preferred comparison mode in the parallel coordinates plot over the time series plot. 1 user found comparison mode and the ‘metrics’ tab confusing.
Summary: The most common pain point identified by 4/6 users was the axis, or the ability to change the scales or customize the values to be in percentages for easy comparison.
Summary: There were general feature requests and requests specific to the plots. The most common general feature, identified by 3/6 users, was filtering, followed by the ability to change the axis or customize the metric values.
I'll close this 🥳 This theme is complete.
Description
Ability to plot experiment metrics derived from pipeline runs.
This is based on the second high priority issue resulting from the experiment tracking user research, which is:
Visualisation: Ability to show plots/comparison graphs/hyperparameters to evaluate metrics trade-offs
What is the problem?
Who are the users of this functionality?
Why do our users currently have this problem?
What is the impact of solving this problem?
What could we possibly do?