Closed. jmread closed this issue 6 years ago.

The running time should be included as an evaluation metric. It is already being logged for the overall process, but it would be nice to get a plot over time. It could also be interesting to split it into training time / testing time.
Hi @jmread, I intend to implement this capability and also the memory usage accounting soon.
@jacobmontiel and @jmread, could you give me some guidance or suggestions on how I can make these changes while maintaining the structure and standards of the framework?
Would adding new classes to the measure collection be the best option? In that case, these metrics would be common to all the prediction tasks. This could change how the framework distinguishes among the prediction tasks, unless the new metrics are added to every task's metric list in skmultiflow/utils/constants.py, for example when the evaluator computes set operations between the requested metrics and those lists.
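For illustration, here is a minimal sketch of the kind of check I mean; the list and metric names below are placeholders, not necessarily those defined in skmultiflow/utils/constants.py:

```python
# Hypothetical sketch: these list and metric names are assumptions for
# illustration, not necessarily the ones in skmultiflow/utils/constants.py.
CLASSIFICATION_METRICS = ['accuracy', 'kappa', 'running_time', 'model_size']
REGRESSION_METRICS = ['mean_square_error', 'mean_absolute_error',
                      'running_time', 'model_size']


def validate_metrics(requested_metrics, task_metrics):
    """Reject any requested metric that the task does not support."""
    unsupported = set(requested_metrics).difference(set(task_metrics))
    if unsupported:
        raise ValueError('Unsupported metric(s): {}'.format(unsupported))


# Adding 'running_time' and 'model_size' to every task-specific list keeps
# this set difference empty for all prediction tasks.
validate_metrics(['accuracy', 'running_time'], CLASSIFICATION_METRICS)
```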
Hi @smastelini
As you mention, the running time should apply to all evaluation tasks. On the other hand, I don't think we need a new class in the measure collection. Currently, we measure the time taken to evaluate the n models, but it is not incrementally tracked, which raises some interesting questions and concerns.
Now, measuring memory usage is going to be a more challenging task, as pointed out in #56, especially if we want to extend that capability to all models.
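As a rough illustration of one way to estimate a model's footprint (a sketch only, with a hypothetical helper name, not a proposal for the final implementation), the size could be accumulated by recursively walking the model's attributes with sys.getsizeof:

```python
import sys


def estimate_model_size(obj, seen=None):
    """Rough, recursive estimate of the memory held by a model (in bytes).

    Hypothetical helper for illustration only; it walks __dict__, lists,
    and dicts, and skips objects that were already counted.
    """
    if seen is None:
        seen = set()
    obj_id = id(obj)
    if obj_id in seen:
        return 0
    seen.add(obj_id)

    size = sys.getsizeof(obj)
    if isinstance(obj, dict):
        size += sum(estimate_model_size(k, seen) + estimate_model_size(v, seen)
                    for k, v in obj.items())
    elif isinstance(obj, (list, tuple, set)):
        size += sum(estimate_model_size(item, seen) for item in obj)
    elif hasattr(obj, '__dict__'):
        size += estimate_model_size(vars(obj), seen)
    return size
```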
Hello again @jacobmontiel
When you mention evaluating n models, are you referring to ensemble predictors or the comparison of different techniques?
In the case of ensembles, I think it might be interesting to report the total running time, excluding the individual predictors, at least initially.
Additionally, reporting the training and testing time separately (in addition to the total running time) could be very useful too.
I meant n models being compared (either single models or ensembles). Now, if we consider multiple time points to track, then it may be worth having a class or method for this specific purpose.
Hi @jacobmontiel,
I agree with you. If the running time and memory costs are going to be reported at the same stream "points" (or at the same frequency) as the other metrics, e.g., accuracy, then it may be worth creating classes like *TimeAndMemoryMeasurements for these tasks.
In fact, the major reason I see for doing this would be writing the measured values to the output log and plotting the measured results.
I prefer to have separate trackers for memory and time. I think time has to be tracked by default (as it currently is) and could be added to the plots in the form of a legend (text). On the other hand, memory could be optionally selected and plotted.
You are right. Measuring time adds little extra processing overhead to the framework, whereas computing the total memory used by the models is costly when done at possibly small time intervals. My bad!
Following your recommendations, would the time measurements be reported for each model (considering multiple time intervals) without needing to specify this metric in the Evaluator?
Thanks again for the fast responses @jacobmontiel!
@smastelini, I am currently working on a refactoring of the evaluator and visualizer which will impact this issue.
We will add a buffer object to share data between the evaluators and visualizer(s). This should also help reduce the amount of code by providing a generic data-sharing mechanism. I expect to have the change ready in the next couple of days.
Very nice, @jacobmontiel! Currently I have the time and memory measurements under "alpha" testing. I plan to integrate these new features as a PR soon.
I will wait for the mentioned refactoring to integrate my changes, as well as prepare tests and documentation for the new features. After that, I will proceed with the PR.
Hello @smastelini
The new buffer structure is in place. You will see changes in the evaluators and visualizer (for the better, I hope) which could be confusing at first. Let me know if you have questions.
Hi @jacobmontiel,
I've implemented the running time and memory accounting methods (considering the points discussed in #56) and their corresponding tests. Currently, I'm "struggling" to integrate those new metrics into the plots.
The running time is divided into training, testing, and total running time. Memory is represented by a single value per model. I think those metrics do not fit the (current, mean) paradigm like the other measures do, so they should be reported differently.
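To make that split concrete, here is a minimal sketch of the kind of per-model tracker I have in mind; the class and method names are placeholders, not the final API:

```python
import time


class RunningTimeTracker:
    """Accumulates training, testing, and total time for a single model.

    Illustrative sketch only; names and structure are placeholders.
    """
    def __init__(self):
        self.training_time = 0.0
        self.testing_time = 0.0
        self._start = None

    def start(self):
        self._start = time.perf_counter()

    def stop_training(self):
        self.training_time += time.perf_counter() - self._start

    def stop_testing(self):
        self.testing_time += time.perf_counter() - self._start

    @property
    def total_time(self):
        return self.training_time + self.testing_time


# Inside the prequential (test-then-train) loop, one tracker per model:
# tracker.start(); y_pred = model.predict(X);  tracker.stop_testing()
# tracker.start(); model.partial_fit(X, y);    tracker.stop_training()
```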
My question is about how to handle the _update_annotations method and possibly multiple legends. Could you give me some guidelines on how to implement those changes while maintaining the original intended structure of evaluation_visualizer.py?
Thanks in advance for your help!
Hello @smastelini,
I agree that time and memory are special cases with respect to the default current/mean measurements.
I have been thinking about this and, considering the current layout, I have the following proposal:
|             Figure Title            |
|------------------|------------------|
|    Subplot 1     |    Metrics 1     |
|------------------|------------------|
|    Subplot n     |    Metrics n     |
|------------------|------------------|
| Time:    Train     Test     Total   |
| Memory:  M1        M2        Mm     |
|-------------------------------------|
|            Figure Legend            |
This would only apply to the default visualizer; as new visualizers are added, they should provide their own layouts.
My only concern is if the user sets up the evaluator with multiple plots and multiple models (3+), which could result in visual clutter. But it should work for a "normal" setup. Comments?
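To give an idea of how the two text rows could be rendered (a matplotlib sketch only, not the actual evaluation_visualizer.py code), figure-level text below the subplots would avoid adding extra axes:

```python
import matplotlib.pyplot as plt

fig, axes = plt.subplots(2, 1, figsize=(8, 6))
fig.subplots_adjust(bottom=0.2)  # leave room for the text rows

# Placeholder values; in the evaluator these would come from the trackers.
train_t, test_t, total_t = 12.3, 4.5, 16.8
model_sizes = [1.2, 0.8, 2.4]  # MB, one entry per model

fig.text(0.05, 0.10,
         'Time:    Train {:.2f}s    Test {:.2f}s    Total {:.2f}s'.format(
             train_t, test_t, total_t))
fig.text(0.05, 0.05,
         'Memory:  ' + '    '.join('M{}: {:.1f} MB'.format(i + 1, s)
                                   for i, s in enumerate(model_sizes)))
plt.show()
```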
As a comment, the new data buffer is intended to be generic and support different types of data. For example, you can check the true_vs_predicted and data_points portions, which are cases that do not follow the current/mean paradigm.
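For reference, a minimal sketch of what I mean by a generic buffer; the class below is only illustrative and not the exact API of the new data buffer:

```python
class DataBuffer:
    """Generic illustration of a shared evaluator/visualizer buffer.

    Each metric id maps to a dict of named data series (e.g. 'current',
    'mean', or task-specific entries such as 'true' and 'predicted').
    """
    def __init__(self, metric_ids):
        self._data = {metric_id: {} for metric_id in metric_ids}

    def update(self, metric_id, series_id, value):
        self._data[metric_id].setdefault(series_id, []).append(value)

    def get(self, metric_id, series_id):
        return self._data[metric_id].get(series_id, [])


# Entries like 'true_vs_predicted' or 'data_points' can hold arbitrary
# series instead of the usual current/mean pair:
buffer = DataBuffer(['accuracy', 'true_vs_predicted'])
buffer.update('true_vs_predicted', 'true', 1)
buffer.update('true_vs_predicted', 'predicted', 0)
```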
Let me know what you think.
Hi @jacobmontiel,
I think that your solution might be the best one for generality. As the time and memory measurements are reported in the csv output, I agree that just showing the current measurements as text in the plots should be enough. Detailed analysis could be performed using the generated log.
I will adapt the visualizer to show textual information for those metrics following the proposed layout. As soon as those changes are done, I'll let you know!
Thanks for the suggestions.
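As an example of the kind of offline analysis I mean, reading the generated log back with pandas; the column names here are assumptions about the csv layout, not the exact headers the evaluator writes:

```python
import pandas as pd
import matplotlib.pyplot as plt

# The summary csv written by the evaluator; header rows starting with '#'
# are skipped. Column names below are assumptions for illustration.
log = pd.read_csv('summary.csv', comment='#')

plt.plot(log['id'], log['training_time_[M0]'], label='train time (M0)')
plt.plot(log['id'], log['testing_time_[M0]'], label='test time (M0)')
plt.xlabel('samples processed')
plt.ylabel('seconds')
plt.legend()
plt.show()
```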