probabl-ai / skore

Skore lets you "Own Your Data Science." It provides a user-friendly interface to track and visualize your modeling results, and perform evaluation of your machine learning models with scikit-learn.
https://probabl-ai.github.io/skore/
MIT License

Enhance primitive object display in the frontend #393

Open rouk1 opened 1 month ago

rouk1 commented 1 month ago

For now we use basic JSON serialization and code highlighting. Long lists and dicts should be trimmed to avoid painful scrolling.

> Could we store a plotted representation of the series? The markdown layout will be bad, especially for large series. [Pandas offers this](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.plot.html#pandas-series-plot) with a default matplotlib backend (which is supported by the frontend).

I think it's hard to find a generic plot that can represent all sorts of series while remaining easy to understand. Maybe we can just display a sample of the series when it's too long? As done by pandas in Python (0, 1, 2, [...], 3, 4).

@rouk1 is there the same problem with the list primitive type? @MarieS-WiMLDS @sylvaincom what do you think?

Keep in mind that, as with a Python list, users can put whatever they want in a series (str, number, a mix of str and number, etc.).
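To make the sampling idea concrete, here is a minimal sketch of trimming a long list or dict before display. It is illustration only: `sample_sequence`, `sample_mapping`, and their parameters are hypothetical names, not part of skore, and the output format is an assumption. Items are rendered with `repr`, so mixed content (str, number, etc.) is handled the same way.

```python
from itertools import islice


def sample_sequence(items, edge=3):
    """Render a sequence as text, keeping only the first and last `edge` items."""
    items = list(items)
    if len(items) <= 2 * edge:
        # Short enough: show everything.
        return "[" + ", ".join(repr(x) for x in items) + "]"
    head = ", ".join(repr(x) for x in items[:edge])
    tail = ", ".join(repr(x) for x in items[-edge:])
    return f"[{head}, ..., {tail}]"


def sample_mapping(mapping, max_items=3):
    """Render a dict as text, keeping only the first `max_items` entries."""
    shown = list(islice(mapping.items(), max_items))
    body = ", ".join(f"{key!r}: {value!r}" for key, value in shown)
    hidden = len(mapping) - len(shown)
    if hidden > 0:
        # Say how many entries were left out instead of dropping them silently.
        body += f", ... ({hidden} more)"
    return "{" + body + "}"


print(sample_sequence(range(10)))        # [0, 1, 2, ..., 7, 8, 9]
print(sample_sequence([1, "a", 2.5]))    # [1, 'a', 2.5]
print(sample_mapping({"a": 1, "b": 2, "c": 3, "d": 4}, max_items=2))
```

Whether the trimming would happen in the backend or in the frontend is left open here.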

Originally posted by @thomass-dev in https://github.com/probabl-ai/skore/issues/378#issuecomment-2371250928

thomass-dev commented 1 month ago

Do you need the backend to identify list-like objects in the API?

tuscland commented 1 month ago

We should not try to give a representation to objects that don't have one. It is a choice of the user to represent things, and MediaItem is made to work around this need.

In the future, we might want to offer the ability to create a visualization from data saved in the project.

tuscland commented 1 month ago

In the future, we should be able to create plots and charts from data stored in skore. For now, we need to pick a sensible default.

@MarieS-WiMLDS suggests limiting the number of series items. @sylvaincom suggests using the default plotting function of pandas, because it is convenient.

I would go with Sylvain's suggestion for a start.
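As a rough sketch of what relying on the default pandas plot could look like: the series is drawn with `Series.plot` on a matplotlib figure and saved as PNG bytes. The helper name `series_to_png` is hypothetical, and how skore would actually store or transmit the image is not decided in this thread.

```python
import io

import matplotlib

matplotlib.use("Agg")  # headless backend, no display required
import matplotlib.pyplot as plt
import pandas as pd


def series_to_png(series: pd.Series) -> bytes:
    """Draw the default pandas plot of `series` and return it as PNG bytes."""
    fig, ax = plt.subplots()
    try:
        series.plot(ax=ax)  # default line plot on the matplotlib backend
        buffer = io.BytesIO()
        fig.savefig(buffer, format="png")
        return buffer.getvalue()
    finally:
        # Always release the figure, even if plotting fails.
        plt.close(fig)


png_bytes = series_to_png(pd.Series(range(100)))
print(len(png_bytes) > 0)
```

Note that this default only works for series with numeric data. A series holding only strings has nothing to plot and should be expected to fail, which is the concern raised earlier about arbitrary series content.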

sylvaincom commented 1 month ago

For more information:

numpy

numpy.set_printoptions

Example:

```python
import numpy as np

# Summarize arrays with more than 5 elements instead of printing them in full.
np.set_printoptions(threshold=5)
np.arange(10)
```

returns `array([0, 1, 2, ..., 7, 8, 9])`

pandas

Options and settings

Example:

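```python
import pandas as pd

# Raise the row and column display limits, and cap column width at 40 characters.
pd.set_option("display.max_rows", 999)
pd.set_option("display.max_columns", 999)
pd.set_option("display.max_colwidth", 40)
```

thomass-dev commented 1 month ago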

So what is proposed here is to call repr on NumPy/pandas objects (and so get a string representation) before sending them to the frontend?
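For reference, a minimal sketch of what such a repr-based approach could look like for NumPy arrays, assuming the summarization threshold is applied only for the duration of the call. `summarized_repr` is a hypothetical helper, not an existing skore function. It uses the `np.printoptions` context manager so the user's global print options are left untouched.

```python
import numpy as np


def summarized_repr(array, threshold=5):
    """Return repr(array), summarizing arrays with more than `threshold` elements."""
    # The context manager restores the previous print options on exit.
    with np.printoptions(threshold=threshold):
        return repr(array)


text = summarized_repr(np.arange(10))
print(text)  # array([0, 1, 2, ..., 7, 8, 9])
```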

tuscland commented 1 month ago

It is just an inspiration to design our own display options. How it should be implemented is another matter.

tuscland commented 1 month ago

Let's focus on #394 before this one.

tuscland commented 1 month ago

Solution suggestion @anasstarfaoui @rouk1

Primitive list:

Series:

Primitive dictionary: