probabl-ai / skore

Skore lets you "Own Your Data Science." It provides a user-friendly interface to track and visualize your modeling results, and perform evaluation of your machine learning models with scikit-learn.
https://probabl-ai.github.io/skore/
MIT License

Numerical precision in skore-ui #394

Closed tuscland closed 1 month ago

tuscland commented 1 month ago

Numbers in DataFrame tables have too many digits. This results in a poor presentation.

I would not recommend relying on the browser locale to format numbers. It would make values difficult to compare for teams that work in international environments (some will see a comma as the decimal separator, and others will see a dot). The tool we are building is suited to data-savvy users.

So, we can offer a parameter to specify number formatting as project-level settings. The default format should allow for a precision of 3 digits.

Here is a suggested approach for formatting numbers:

const formatter = new Intl.NumberFormat('en-US', {
  minimumFractionDigits: 0,
  maximumFractionDigits: 3
});
function formatNumber(number) {
  return formatter.format(number);
}

Which would result in 3 project-level settings:

ui.number-format.locale = 'en-US'
ui.number-format.minimum-fraction-digits = 0
ui.number-format.maximum-fraction-digits = 3
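
For comparison on the Python side, here is a minimal sketch of how those three settings could drive formatting. The `SETTINGS` dict and the `format_number` helper are hypothetical names used for illustration, not part of skore. The sketch only approximates `Intl.NumberFormat` for `en-US`: the locale key is read but not acted on, and rounding of exact ties may differ from the JavaScript behavior.

```python
# Hypothetical project-level settings, named after the three keys proposed above.
SETTINGS = {
    "ui.number-format.locale": "en-US",  # only en-US conventions are handled here
    "ui.number-format.minimum-fraction-digits": 0,
    "ui.number-format.maximum-fraction-digits": 3,
}


def format_number(value, settings=SETTINGS):
    """Approximate Intl.NumberFormat('en-US', ...) for finite numbers."""
    min_digits = settings["ui.number-format.minimum-fraction-digits"]
    max_digits = settings["ui.number-format.maximum-fraction-digits"]
    # Round to the maximum number of fraction digits, using ',' to group thousands.
    text = f"{value:,.{max_digits}f}"
    if "." in text:
        integer_part, fraction = text.split(".")
        # Drop trailing zeros, but keep at least `min_digits` fraction digits.
        fraction = fraction.rstrip("0").ljust(min_digits, "0")
        text = f"{integer_part}.{fraction}" if fraction else integer_part
    return text


print(format_number(1.123456789))  # 1.123
print(format_number(1234.5))       # 1,234.5
print(format_number(2))            # 2
```

Because the separators are fixed rather than taken from the browser, every team member would see the same rendering of a given value.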

I'm open to improved or different ideas.

rouk1 commented 1 month ago

I suppose this should be coupled with a tooltip to see the value with its full precision?
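
A rough sketch of that pairing, assuming the backend hands the UI both strings. The function name `display_and_tooltip` is hypothetical and only illustrates the idea of a short cell value plus a full-precision tooltip.

```python
def display_and_tooltip(value, max_digits=3):
    """Return (short text for the table cell, full-precision text for the tooltip)."""
    short = f"{value:.{max_digits}f}"
    if "." in short:
        # Strip trailing zeros, then a dangling decimal point.
        short = short.rstrip("0").rstrip(".")
    # repr() of a Python float is the shortest string that round-trips to the same float.
    full = repr(value)
    return short, full


print(display_and_tooltip(0.1234567891234))  # ('0.123', '0.1234567891234')
print(display_and_tooltip(2.0))              # ('2', '2.0')
```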

tuscland commented 1 month ago

Sounds like a great idea 👍 @anasstarfaoui wdyt?

anasstarfaoui commented 1 month ago

Hey, everything sounds super nice! I just have one concern: what about that specific case where the user wants 3 digits for an element, and 2 for the other? I feel like making it global could be potentially restrictive and will not follow the pattern of flexibility that we tried to initiate in other parts of skore :)

What about maybe having preferences per element? So it would be fully modular.

I previously worked on a solution that tackled this pain point: [image]
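
To make the per-element idea concrete, here is a minimal sketch in which each element may override a project-wide default. The names `PROJECT_DEFAULTS`, `resolve_precision`, and `format_cell` are assumptions made for illustration and do not exist in skore. For simplicity, the sketch always prints a fixed number of fraction digits.

```python
# Hypothetical project-wide default, used when an element has no preference of its own.
PROJECT_DEFAULTS = {"maximum-fraction-digits": 3}


def resolve_precision(element_prefs=None, project_defaults=PROJECT_DEFAULTS):
    """Element-level preferences win over project-level defaults."""
    merged = {**project_defaults, **(element_prefs or {})}
    return merged["maximum-fraction-digits"]


def format_cell(value, element_prefs=None):
    """Format a value with a fixed number of fraction digits for one element."""
    digits = resolve_precision(element_prefs)
    return f"{value:.{digits}f}"


print(format_cell(3.14159))                                  # 3.142
print(format_cell(3.14159, {"maximum-fraction-digits": 2}))  # 3.14
```

With this layering, a global setting and per-element flexibility are not mutually exclusive: the global value is simply the fallback.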

tuscland commented 1 month ago

@MarieS-WiMLDS can you tell us how it works in popular tools? For example, when you display a DataFrame in Jupyter, what is the default behavior and can it be customized?

sylvaincom commented 1 month ago

In numpy, you can use numpy.set_printoptions:

import numpy as np
np.set_printoptions(precision=4)
np.array([1.123456789])

which returns: array([1.1235])

These are configuration parameters for all arrays, but you can individually round a single array (and not the others):

np.round(my_array, 2)

Note that numpy.set_printoptions can also summarize long arrays (related to https://github.com/probabl-ai/skore/issues/393):

np.set_printoptions(threshold=5)
np.arange(10)

which returns: array([0, 1, 2, ..., 7, 8, 9])

You have the equivalent in pandas, see Options and settings, for example:

import pandas as pd
pd.set_option("display.max_rows", 999)
pd.set_option("display.precision", 5)

These are configuration parameters for all dataframes, but you can individually round a single dataframe (and not the others):

my_df.round(2)

thomass-dev commented 1 month ago

I think it should be a parameter in the UX before we think about a programmatic way.

tuscland commented 1 month ago

@rouk1 there is an interesting discussion on mlflow, this comment.

Basically:

So I believe this issue should be more specific. Closing until we have more information.