probabl-ai / skore

Skore lets you "Own Your Data Science." It provides a user-friendly interface to track and visualize your modeling results, and perform evaluation of your machine learning models with scikit-learn.
https://probabl.ai
MIT License

Storing large data frames #330

Open sylvaincom opened 1 month ago

sylvaincom commented 1 month ago

Issue

Suppose that, from a notebook, I load a large DataFrame, say of shape (1M, 1k), and try to store it: it takes a long time.

Expectation

I would like the user to get a warning that their DataFrame is too big, or maybe to store only a subset of the big DataFrame (and warn that what the dashboard displays is truncated). Indeed, no one needs to visualize a DataFrame of shape (1M, 1k) in the dashboard.
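A minimal sketch of the second option, assuming pandas and a plain `UserWarning`. The helper name `truncate_for_display` and the limits `MAX_ROWS` and `MAX_COLS` are hypothetical, chosen for illustration; they are not part of skore's API.

```python
import warnings

import pandas as pd

# Hypothetical limits, not taken from skore: tune to what a dashboard can show.
MAX_ROWS = 1_000
MAX_COLS = 100


def truncate_for_display(
    df: pd.DataFrame, max_rows: int = MAX_ROWS, max_cols: int = MAX_COLS
) -> pd.DataFrame:
    """Return at most max_rows x max_cols of df, warning if anything is dropped."""
    n_rows, n_cols = df.shape
    if n_rows <= max_rows and n_cols <= max_cols:
        # Small enough: return the frame unchanged, without a warning.
        return df
    warnings.warn(
        f"DataFrame of shape {df.shape} is too large to display; "
        f"keeping the first {min(n_rows, max_rows)} rows and "
        f"{min(n_cols, max_cols)} columns.",
        UserWarning,
        stacklevel=2,
    )
    # Keep the top-left corner of the frame.
    return df.iloc[:max_rows, :max_cols]
```

Taking the first rows and columns is the simplest choice; a random sample (for example with `DataFrame.sample`) would give a more representative preview, at the cost of a non-deterministic display.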

tuscland commented 4 weeks ago

Thank you for this interesting feedback.

For a warning, I am not sure what the right behavior would be.

So I suggest picking sane defaults:

Then an enhancement that must be examined in a wider context:

tuscland commented 2 weeks ago

As suggested by @rouk1, we could think about using Perspective as a visualization tool, which would unlock many interesting use cases, including the ability to store large amounts of data in a skore database.

tuscland commented 2 days ago

Discussed with @fcharras, who suggested having a look at Delta Lake and Parquet files.