pickle renders the student and glovebox setups tightly coupled

gmanchon commented 2 years ago

we currently use the pickle format, which is language-specific and sensitive to package versions, to communicate challenge results between the student repo and glovebox

transmitting a serialised memory object over the network constrains our ability to process it: students need to match the glovebox setup precisely (python + package versions), and we have yet to find a way to prevent them from upgrading or downgrading their packages through pip install during the bootcamp

a solution to loosen the coupling could be to follow the path of the web and add support for standard data types such as json for the exchange between our apps
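
As a rough illustration of the JSON idea, here is a minimal sketch that assumes results only hold JSON-compatible values. `dump_result` and `load_result` are hypothetical names for this example, not part of nbresult.

```python
import json


def dump_result(name, **values):
    """Serialise a challenge result as JSON text (hypothetical helper)."""
    # json.dumps raises TypeError for values that are not JSON-compatible,
    # so a non-portable value fails when it is written, not when glovebox reads it.
    return json.dumps({"name": name, "values": values}, sort_keys=True)


def load_result(payload):
    """Parse the JSON text back into plain Python objects."""
    return json.loads(payload)


payload = dump_result("data", score=0.83, shape=(100, 3))
result = load_result(payload)
```

One design consequence: JSON has no tuple type, so a shape tuple comes back as a list and the tests would need to compare against lists.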

ssaunier commented 2 years ago

Hello @gmanchon 👋

Do you have a specific example / a repo reproducing a problem at hand? Looking at the code, we are dumping an instance of ChallengeResult which should store simple attributes. If attributes are only Python primitive types, then how could package versions interfere?

Maybe my reasoning is missing something, which is why a specific challenge where this problem actually arises would help me understand it better. One resolution could be to update the challenge's test result so that it stores Python primitive types (integers, floats, lists, dicts, etc.) rather than instances of classes from packages.
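
To make the "primitive types only" idea concrete, here is a minimal sketch of a converter. `to_primitive` is a hypothetical helper, not an existing nbresult function. It relies on pandas DataFrame and Series exposing `to_dict()` and on numpy arrays and scalars exposing `tolist()`; the standard library `array.array` stands in for a numpy array so the sketch runs without extra packages.

```python
from array import array

PRIMITIVES = (bool, int, float, str, type(None))


def to_primitive(value):
    """Reduce a value to plain Python types, or raise TypeError (hypothetical helper)."""
    # Exact type check: numpy.float64 subclasses float but is still a numpy object.
    if type(value) in PRIMITIVES:
        return value
    if isinstance(value, dict):
        return {str(key): to_primitive(item) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        return [to_primitive(item) for item in value]
    if hasattr(value, "to_dict"):  # e.g. pandas DataFrame or Series
        return to_primitive(value.to_dict())
    if hasattr(value, "tolist"):  # e.g. numpy arrays and scalars
        return to_primitive(value.tolist())
    raise TypeError(f"cannot reduce {type(value).__name__} to primitive types")


# array.array is only a stand-in for a numpy array here.
converted = to_primitive({"shape": (2, 3), "theta": array("d", [0.5, 1.5])})
```

Anything the helper does not recognise raises instead of being pickled as-is, which keeps unexpected package classes out of the result.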

Thanks!

gmanchon commented 2 years ago

Hello @ssaunier 👋

I was able to reproduce the issue with pandas 1.3.5 on student setup and pandas 1.2 on glovebox setup.

The challenge on which the issue was encountered stores a dataframe in the results.

This is not a practical issue as long as we stick to simple data types because pickle is unlikely to change its serialisation format for those. Yet if we stick to simple data types a more interoperable data format might be an option.

The setup is reproduced in this repo (no LW data inside).

It raises this error:

➜  nbresult_test git:(master) pytest tests/test_data.py                                     [🐍 nbr_glovebox 🍓 3.0.3]
================================================= test session starts =================================================
platform darwin -- Python 3.8.12, pytest-7.0.0, pluggy-1.0.0
rootdir: /Users/gmanchon/code/sandbox/nbresult_test
collected 1 item                                                                                                      

tests/test_data.py F                                                                                            [100%]

====================================================== FAILURES =======================================================
___________________________________________ TestData.test_data_is_above_82 ____________________________________________

self = <tests.test_data.TestData testMethod=test_data_is_above_82>

    def setUp(self):
        """Load the pickle file"""
        klass = self.__class__.__name__
        name = re.sub(r'(?<!^)(?=[A-Z])', '_', klass).lower()[len('test_'):]
        result_file = os.path.join(os.getcwd(), "tests", f"{name}.pickle")
>       self.result = pickle.load(open(result_file, 'rb'))
E       AttributeError: Can't get attribute 'new_block' on <module 'pandas.core.internals.blocks' from '/Users/gmanchon/.pyenv/versions/3.8.12/envs/nbr_glovebox/lib/python3.8/site-packages/pandas/core/internals/blocks.py'>

../../../.pyenv/versions/3.8.12/envs/nbr_glovebox/lib/python3.8/site-packages/nbresult/__init__.py:52: AttributeError
=============================================== short test summary info ===============================================
FAILED tests/test_data.py::TestData::test_data_is_above_82 - AttributeError: Can't get attribute 'new_block' on <mod...
================================================== 1 failed in 0.28s ==================================================
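
The AttributeError above is consistent with how pickle works: objects defined in a package are stored as a reference to an import path, which is looked up again at load time, so a name that exists in one package version and not in another cannot be resolved. The sketch below shows this with the standard library only; `referenced_globals` is a hypothetical helper, and `fractions.Fraction` stands in for a package class such as a DataFrame. Protocol 2 is used only because it stores each reference in a single `GLOBAL` opcode, which is easy to list.

```python
import pickle
import pickletools
from fractions import Fraction


def referenced_globals(obj):
    """List the "module name" references stored in a protocol 2 pickle of obj."""
    data = pickle.dumps(obj, protocol=2)
    return [
        arg
        for opcode, arg, _position in pickletools.genops(data)
        if opcode.name == "GLOBAL"
    ]


# Built-in types are encoded by dedicated opcodes and reference no import path.
primitive_refs = referenced_globals({"score": 0.83, "shape": [100, 3]})

# An instance of a class is stored as a reference to its module and name.
package_refs = referenced_globals({"ratio": Fraction(1, 3)})
```

Here `primitive_refs` is empty while `package_refs` contains the `fractions` / `Fraction` reference, which is the part that depends on what is installed on the loading side.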

ssaunier commented 2 years ago

Thanks for investigating 🙏 - What do you think, should we update the challenge's test to try and stick to simpler data?

gmanchon commented 2 years ago

Let's update the challenge test. We could also verify other challenges in the repo and/or add a GHA to validate the data types used when a new challenge is pushed... WDYT ?

ssaunier commented 2 years ago

Good idea 💡 ! Before the GHA maybe have a script to list challenges which are at risk, to allow time to fix them. Once they are all fixed => implement that script as a GHA to keep it green in the future!
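
Such an audit script could be sketched roughly as follows. This is a sketch under assumptions, not the actual tool: `audit_pickles`, `at_risk` and the `SAFE_TYPES` list are hypothetical, the stored result is assumed to keep its values as instance attributes, and `types.SimpleNamespace` stands in for `ChallengeResult` so the example is self-contained.

```python
import pickle
import tempfile
from collections import defaultdict
from pathlib import Path
from types import SimpleNamespace

# Type names considered safe to exchange (an assumption, adjust as needed).
SAFE_TYPES = {"str", "int", "float", "bool", "NoneType", "list", "tuple", "dict", "set"}


def audit_pickles(root):
    """Map each type name to the (pickle path, attribute name) pairs that use it."""
    usage = defaultdict(list)
    for path in sorted(Path(root).rglob("*.pickle")):
        with open(path, "rb") as file:
            result = pickle.load(file)  # only load pickles from a trusted repo
        for attribute, value in vars(result).items():
            location = path.relative_to(root).as_posix()
            usage[type(value).__name__].append((location, attribute))
    return dict(usage)


def at_risk(usage):
    """Keep only the type names that are not in SAFE_TYPES."""
    return {name: uses for name, uses in usage.items() if name not in SAFE_TYPES}


# Demo on a throwaway directory; SimpleNamespace stands in for ChallengeResult.
with tempfile.TemporaryDirectory() as root:
    tests_dir = Path(root) / "01-Demo" / "tests"
    tests_dir.mkdir(parents=True)
    demo = SimpleNamespace(name="demo", score=0.83, payload=bytearray(b"abc"))
    with open(tests_dir / "demo.pickle", "wb") as file:
        pickle.dump(demo, file)
    usage = audit_pickles(root)
    flagged = at_risk(usage)
```

Two limitations of this sketch: it only looks at the type of each top-level attribute, so a list holding package objects would pass unnoticed, and it has to load each pickle, so it must run in an environment where the pickles still load.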

gmanchon commented 2 years ago

The script is pending. I did not expect to see that much diversity in the content:

➜  data-solutions git:(master) nbr run .    [🐍 lewagon 🍓 3.0.3]
run check

Pickles containing:
- str: 155
- ndarray: 27
- dtype: 2
- tuple: 39
- int64: 18
- set: 1
- int: 31
- float64: 51
- Index: 4
- dict: 7
- list: 13
- Series: 2
- float: 23
- LinearRegression: 1
- DataFrame: 10
- bool_: 1
- RandomizedSearchCV: 1

Pickles containing str:
- 02-Data-Toolkit/01-Data-Analysis/01-Notebook/tests/import_hello.pickle: (name, sentence)
- 02-Data-Toolkit/01-Data-Analysis/02-Numpy/tests/numpy.pickle: (name)
- 02-Data-Toolkit/01-Data-Analysis/03-US-Oil-and-Gas-Production/tests/date.pickle: (name)
- 02-Data-Toolkit/01-Data-Analysis/03-US-Oil-and-Gas-Production/tests/filtered_gas.pickle: (name)
- 02-Data-Toolkit/01-Data-Analysis/03-US-Oil-and-Gas-Production/tests/full_gas.pickle: (name)
- 02-Data-Toolkit/01-Data-Analysis/03-US-Oil-and-Gas-Production/tests/merged_dataframes.pickle: (name)
- 02-Data-Toolkit/01-Data-Analysis/03-US-Oil-and-Gas-Production/tests/oil.pickle: (name)
- 02-Data-Toolkit/01-Data-Analysis/04-Multiple-Files-With-Pandas/tests/all_df.pickle: (name)
- 02-Data-Toolkit/01-Data-Analysis/04-Multiple-Files-With-Pandas/tests/inner_merge.pickle: (name)
- 02-Data-Toolkit/01-Data-Analysis/04-Multiple-Files-With-Pandas/tests/left_merge.pickle: (name)
- 02-Data-Toolkit/01-Data-Analysis/04-Multiple-Files-With-Pandas/tests/olympic_games.pickle: (name)
- 02-Data-Toolkit/01-Data-Analysis/04-Multiple-Files-With-Pandas/tests/olympic_games_combined.pickle: (name)
- 02-Data-Toolkit/01-Data-Analysis/04-Multiple-Files-With-Pandas/tests/olympic_games_event.pickle: (name)
- 02-Data-Toolkit/01-Data-Analysis/04-Multiple-Files-With-Pandas/tests/olympic_games_season.pickle: (name)
- 02-Data-Toolkit/01-Data-Analysis/04-Multiple-Files-With-Pandas/tests/outer_merge.pickle: (name)
- 02-Data-Toolkit/01-Data-Analysis/04-Multiple-Files-With-Pandas/tests/right_merge.pickle: (name)
- 02-Data-Toolkit/02-Data-Sourcing/00-Warmup/tests/warmup.pickle: (name)
- 02-Data-Toolkit/02-Data-Sourcing/01-Stock-Market-API/tests/apple.pickle: (name, index_name)
- 02-Data-Toolkit/02-Data-Sourcing/02-Scraping/tests/books.pickle: (name, title)
- 02-Data-Toolkit/02-Data-Sourcing/03-Text-Extraction-with-Regex/tests/patterns.pickle: (name, zipcode_re, date_re, quantity_re, amount_re, quantity_grp_re, amount_grp_re)
- 02-Data-Toolkit/02-Data-Sourcing/03-Text-Extraction-with-Regex/tests/receipts.pickle: (name)
- 02-Data-Toolkit/03-Data-Visualization/01-Matplotlib-Intro/tests/quiz.pickle: (name, answer1, answer2)
- 03-Maths/01-Algebra-Calculus/01-real-estate-estimator/tests/features.pickle: (name)
- 03-Maths/01-Algebra-Calculus/01-real-estate-estimator/tests/solution.pickle: (name)
- 03-Maths/01-Algebra-Calculus/01-real-estate-estimator/tests/target.pickle: (name)
- 03-Maths/01-Algebra-Calculus/02-real-estate-advanced-estimator/tests/flats.pickle: (name)
- 03-Maths/01-Algebra-Calculus/02-real-estate-advanced-estimator/tests/univariate.pickle: (name)
- 03-Maths/01-Algebra-Calculus/03-Scipy/tests/global_optimization.pickle: (name)
- 03-Maths/01-Algebra-Calculus/03-Scipy/tests/minimize2d.pickle: (name)
- 03-Maths/01-Algebra-Calculus/03-Scipy/tests/minimize_constraints.pickle: (name)
- 03-Maths/02-Statistics-Probabilities/01-Law-of-Large-Numbers/tests/expected_value_coins.pickle: (name)
- 03-Maths/02-Statistics-Probabilities/01-Law-of-Large-Numbers/tests/expected_value_dice.pickle: (name)
- 03-Maths/02-Statistics-Probabilities/02-Random-Variables/tests/factorial.pickle: (name)
- 03-Maths/02-Statistics-Probabilities/02-Toss-a-Coin/tests/factorial.pickle: (name)
- 03-Maths/02-Statistics-Probabilities/03-Central-Limit-Theorem/tests/distribution.pickle: (name, skewness)
- 03-Maths/02-Statistics-Probabilities/03-Central-Limit-Theorem/tests/probability.pickle: (name)
- 03-Maths/02-Statistics-Probabilities/03-Central-Limit-Theorem/tests/zscore.pickle: (name)
- 04-Decision-Science/01-Project-Setup/02-Data-Preparation/tests/get_data.pickle: (name)
- 04-Decision-Science/01-Project-Setup/03-Exploratory-Analysis/tests/exploratory.pickle: (name)
- 04-Decision-Science/01-Project-Setup/Optional-Metric-Design/tests/orders.pickle: (name)
- 04-Decision-Science/02-Statistical-Inference/01-Orders/tests/distance.pickle: (name)
- 04-Decision-Science/02-Statistical-Inference/01-Orders/tests/get_wait_time.pickle: (name)
- 04-Decision-Science/02-Statistical-Inference/01-Orders/tests/number_products.pickle: (name)
- 04-Decision-Science/02-Statistical-Inference/01-Orders/tests/number_sellers.pickle: (name)
- 04-Decision-Science/02-Statistical-Inference/01-Orders/tests/price.pickle: (name)
- 04-Decision-Science/02-Statistical-Inference/01-Orders/tests/review_score.pickle: (name)
- 04-Decision-Science/02-Statistical-Inference/01-Orders/tests/training.pickle: (name)
- 04-Decision-Science/02-Statistical-Inference/01-Orders/tests/wait_time.pickle: (name)
- 04-Decision-Science/03-Linear-Regression/02-Sellers/tests/seller.pickle: (name)
- 04-Decision-Science/03-Linear-Regression/Optional-Products/tests/products.pickle: (name)
- 04-Decision-Science/04-Logistic-Regression/01-Logit/tests/logit.pickle: (name)
- 05-ML/01-Fundamentals-of-Machine-Learning/01-Linear-Regression/tests/cv_results.pickle: (name)
- 05-ML/01-Fundamentals-of-Machine-Learning/01-Linear-Regression/tests/prediction.pickle: (name)
- 05-ML/01-Fundamentals-of-Machine-Learning/01-Linear-Regression/tests/trained_model.pickle: (name)
- 05-ML/01-Fundamentals-of-Machine-Learning/01-Linear-Regression/tests/variables.pickle: (name)
- 05-ML/01-Fundamentals-of-Machine-Learning/02-Learning-Curves/tests/added_features.pickle: (name)
- 05-ML/01-Fundamentals-of-Machine-Learning/02-Learning-Curves/tests/cv_score.pickle: (name)
- 05-ML/01-Fundamentals-of-Machine-Learning/02-Learning-Curves/tests/prediction.pickle: (name)
- 05-ML/01-Fundamentals-of-Machine-Learning/02-Learning-Curves/tests/reduced_dataset.pickle: (name)
- 05-ML/01-Fundamentals-of-Machine-Learning/03-Problems/tests/problem_1.pickle: (name)
- 05-ML/01-Fundamentals-of-Machine-Learning/03-Problems/tests/problem_2.pickle: (name)
- 05-ML/01-Fundamentals-of-Machine-Learning/04-Logistic-Regression/tests/accuracy.pickle: (name)
- 05-ML/01-Fundamentals-of-Machine-Learning/04-Logistic-Regression/tests/predictions.pickle: (name)
- 05-ML/01-Fundamentals-of-Machine-Learning/04-Logistic-Regression/tests/reduced_accuracy.pickle: (name)
- 05-ML/02-Prepare-the-dataset/01-Preprocessing-Workflow/tests/duplicates.pickle: (name)
- 05-ML/02-Prepare-the-dataset/01-Preprocessing-Workflow/tests/encoding.pickle: (name)
- 05-ML/02-Prepare-the-dataset/01-Preprocessing-Workflow/tests/missing_values.pickle: (name)
- 05-ML/02-Prepare-the-dataset/01-Preprocessing-Workflow/tests/scaling.pickle: (name)
- 05-ML/02-Prepare-the-dataset/02-Feature-Selection/tests/base_model.pickle: (name)
- 05-ML/02-Prepare-the-dataset/02-Feature-Selection/tests/correlation.pickle: (name)
- 05-ML/02-Prepare-the-dataset/02-Feature-Selection/tests/feature_permutation.pickle: (name, feature)
- 05-ML/02-Prepare-the-dataset/02-Feature-Selection/tests/reduced_complexity_model.pickle: (name)
- 05-ML/02-Prepare-the-dataset/03-Car-Prices/tests/base_model.pickle: (name)
- 05-ML/02-Prepare-the-dataset/03-Car-Prices/tests/collinearity.pickle: (name)
- 05-ML/02-Prepare-the-dataset/03-Car-Prices/tests/encoding.pickle: (name)
- 05-ML/02-Prepare-the-dataset/03-Car-Prices/tests/missing_values.pickle: (name)
- 05-ML/02-Prepare-the-dataset/03-Car-Prices/tests/scaling.pickle: (name)
- 05-ML/02-Prepare-the-dataset/03-Car-Prices/tests/strong_model.pickle: (name)
- 05-ML/03-Performance-metrics/01-KNN/tests/best_model.pickle: (name, model)
- 05-ML/03-Performance-metrics/01-KNN/tests/default_score.pickle: (name)
- 05-ML/03-Performance-metrics/01-KNN/tests/optimal_k.pickle: (name)
- 05-ML/03-Performance-metrics/01-KNN/tests/price_error.pickle: (name)
- 05-ML/03-Performance-metrics/01-KNN/tests/scale_sensitivity.pickle: (name)
- 05-ML/03-Performance-metrics/02-Electrocardiograms/tests/best_model.pickle: (name, model)
- 05-ML/03-Performance-metrics/02-Electrocardiograms/tests/class_balance.pickle: (name)
- 05-ML/03-Performance-metrics/02-Electrocardiograms/tests/logistic_regression_evaluation.pickle: (name)
- 05-ML/03-Performance-metrics/02-Electrocardiograms/tests/precision.pickle: (name)
- 05-ML/03-Performance-metrics/02-Electrocardiograms/tests/prediction.pickle: (name, prediction)
- 05-ML/03-Performance-metrics/03-Threshold-Adjustments/tests/base_precision.pickle: (name)
- 05-ML/03-Performance-metrics/03-Threshold-Adjustments/tests/decision_threshold.pickle: (name)
- 05-ML/03-Performance-metrics/03-Threshold-Adjustments/tests/recommendation.pickle: (name, recommendation)
- 05-ML/03-Performance-metrics/03-Threshold-Adjustments/tests/scaled_features.pickle: (name)
- 05-ML/04-Under-the-hood/01-Loss-Functions/tests/loss_functions.pickle: (name)
- 05-ML/04-Under-the-hood/02-Solvers/tests/new_data_prediction.pickle: (name)
- 05-ML/04-Under-the-hood/02-Solvers/tests/solvers.pickle: (name, fastest_solver)
- 05-ML/04-Under-the-hood/03-Batch-Gradient-Descent/tests/descent.pickle: (name)
- 05-ML/05-Model-Tuning/01-Workflow/tests/knn.pickle: (name)
- 05-ML/05-Model-Tuning/01-Workflow/tests/r2.pickle: (name)
- 05-ML/05-Model-Tuning/02-Regularization/tests/lasso.pickle: (name)
- 05-ML/05-Model-Tuning/02-Regularization/tests/ridge.pickle: (name)
- 05-ML/05-Model-Tuning/02-Regularization/tests/unregularized.pickle: (name)
- 05-ML/05-Model-Tuning/03-SVM/tests/generalization.pickle: (name)
- 05-ML/05-Model-Tuning/03-SVM/tests/linear_svm.pickle: (name)
- 05-ML/05-Model-Tuning/03-SVM/tests/poly_svm.pickle: (name)
- 05-ML/05-Model-Tuning/03-SVM/tests/rbf_svm.pickle: (name)
- 05-ML/05-Model-Tuning/03-SVM/tests/sigmoid_svm.pickle: (name)
- 05-ML/05-Model-Tuning/03-SVM/tests/svm_rbf.pickle: (name)
- 05-ML/06-Unsupervised-Learning/01-Image-Compression/tests/blobs.pickle: (name)
- 05-ML/06-Unsupervised-Learning/01-Image-Compression/tests/color_count.pickle: (name)
- 05-ML/06-Unsupervised-Learning/01-Image-Compression/tests/image_shape.pickle: (name)
- 05-ML/06-Unsupervised-Learning/01-Image-Compression/tests/imageshape.pickle: (name)
- 05-ML/06-Unsupervised-Learning/01-Image-Compression/tests/two_means.pickle: (name)
- 05-ML/06-Unsupervised-Learning/02-Face-Recognition/tests/classification.pickle: (name)
- 05-ML/06-Unsupervised-Learning/02-Face-Recognition/tests/components.pickle: (name)
- 05-ML/06-Unsupervised-Learning/02-Face-Recognition/tests/projection.pickle: (name)
- 05-ML/06-Unsupervised-Learning/03-Movie-Recommendation/tests/datasets.pickle: (name)
- 05-ML/06-Unsupervised-Learning/03-Movie-Recommendation/tests/feature_engineering.pickle: (name)
- 05-ML/06-Unsupervised-Learning/03-Movie-Recommendation/tests/metadata.pickle: (name)
- 05-ML/06-Unsupervised-Learning/03-Movie-Recommendation/tests/ratings.pickle: (name)
- 05-ML/06-Unsupervised-Learning/03-Movie-Recommendation/tests/recommender.pickle: (name, best_similarity)
- 05-ML/07-Ensemble-Methods/01-Houses-Kaggle-Competition/tests/features_overview.pickle: (name)
- 05-ML/07-Ensemble-Methods/01-Houses-Kaggle-Competition/tests/preproc_baseline.pickle: (name)
- 05-ML/07-Ensemble-Methods/01-Houses-Kaggle-Competition/tests/submission_baseline.pickle: (name, submission_dtypes)
- 05-ML/08-Workflow/01-Preprocessor-Tuning/tests/solution.pickle: (name)
- 05-ML/08-Workflow/02-Custom-Transformer/tests/pipe.pickle: (name)
- 05-ML/08-Workflow/02-Custom-Transformer/tests/prediction.pickle: (name)
- 05-ML/08-Workflow/03-Tuning-Pipeline/tests/solution.pickle: (name)
- 05-ML/08-Workflow/04-Pickle-Pipe/tests/solution.pickle: (name)
- 05-ML/08-Workflow/05-Hand-Made-Standardizer/tests/standardizer.pickle: (name)
- 06-Deep-Learning/01-Fundamentals-of-Deep-Learning/02-Your-first-Neural-Network/tests/deeper_model.pickle: (name)
- 06-Deep-Learning/01-Fundamentals-of-Deep-Learning/02-Your-first-Neural-Network/tests/first model.pickle: (name)
- 06-Deep-Learning/01-Fundamentals-of-Deep-Learning/02-Your-first-Neural-Network/tests/first_model.pickle: (name)
- 06-Deep-Learning/01-Fundamentals-of-Deep-Learning/03-Multiclass-classification/tests/baseline.pickle: (name)
- 06-Deep-Learning/02-Optimizer-loss-and-fitting/03-Finetune-your-Neural-Network/tests/solution.pickle: (name)
- 06-Deep-Learning/02-Optimizer-loss-and-fitting/04-Credit-Card-Challenge/tests/solution.pickle: (name)

Pickles containing ndarray:
- 02-Data-Toolkit/01-Data-Analysis/02-Numpy/tests/numpy.pickle: (ten, from_five, A, B, lin_twenty, C, E, F, reshaped_G, hi_sum)
- 03-Maths/01-Algebra-Calculus/01-real-estate-estimator/tests/features.pickle: (features)
- 03-Maths/01-Algebra-Calculus/01-real-estate-estimator/tests/solution.pickle: (theta)
- 03-Maths/01-Algebra-Calculus/01-real-estate-estimator/tests/target.pickle: (target)
- 03-Maths/01-Algebra-Calculus/02-real-estate-advanced-estimator/tests/univariate.pickle: (squared_errors)
- 03-Maths/01-Algebra-Calculus/03-Scipy/tests/global_optimization.pickle: (Xmin_shgo, Xmin_dual)
- 03-Maths/01-Algebra-Calculus/03-Scipy/tests/minimize_constraints.pickle: (X0, Xmin)
- 05-ML/01-Fundamentals-of-Machine-Learning/01-Linear-Regression/tests/trained_model.pickle: (slope)
- 05-ML/03-Performance-metrics/01-KNN/tests/scale_sensitivity.pickle: (rescaled_features)
- 05-ML/03-Performance-metrics/03-Threshold-Adjustments/tests/scaled_features.pickle: (scaled_features)
- 05-ML/06-Unsupervised-Learning/01-Image-Compression/tests/blobs.pickle: (lower_centroid)
- 05-ML/06-Unsupervised-Learning/01-Image-Compression/tests/two_means.pickle: (clusters)
- 05-ML/08-Workflow/02-Custom-Transformer/tests/prediction.pickle: (prediction)
- 05-ML/08-Workflow/04-Pickle-Pipe/tests/solution.pickle: (predicted_class)
- 05-ML/08-Workflow/05-Hand-Made-Standardizer/tests/standardizer.pickle: (X_train_transformed, X_test_transformed)

Pickles containing dtype:
- 02-Data-Toolkit/01-Data-Analysis/03-US-Oil-and-Gas-Production/tests/date.pickle: (month_type)
- 02-Data-Toolkit/02-Data-Sourcing/01-Stock-Market-API/tests/apple.pickle: (index_type)

Pickles containing tuple:
- 02-Data-Toolkit/01-Data-Analysis/03-US-Oil-and-Gas-Production/tests/filtered_gas.pickle: (yearly_gas)
- 02-Data-Toolkit/01-Data-Analysis/03-US-Oil-and-Gas-Production/tests/full_gas.pickle: (yearly_gas_shape)
- 02-Data-Toolkit/01-Data-Analysis/03-US-Oil-and-Gas-Production/tests/merged_dataframes.pickle: (merged_df_shape)
- 02-Data-Toolkit/01-Data-Analysis/03-US-Oil-and-Gas-Production/tests/oil.pickle: (filtered_oil_shape)
- 02-Data-Toolkit/01-Data-Analysis/04-Multiple-Files-With-Pandas/tests/all_df.pickle: (all_df_shape)
- 02-Data-Toolkit/01-Data-Analysis/04-Multiple-Files-With-Pandas/tests/inner_merge.pickle: (inner_merged_shape)
- 02-Data-Toolkit/01-Data-Analysis/04-Multiple-Files-With-Pandas/tests/left_merge.pickle: (left_merged_shape)
- 02-Data-Toolkit/01-Data-Analysis/04-Multiple-Files-With-Pandas/tests/olympic_games_combined.pickle: (top_combined_shape)
- 02-Data-Toolkit/01-Data-Analysis/04-Multiple-Files-With-Pandas/tests/olympic_games_event.pickle: (top_country_event_shape)
- 02-Data-Toolkit/01-Data-Analysis/04-Multiple-Files-With-Pandas/tests/olympic_games_season.pickle: (top_country_season_shape)
- 02-Data-Toolkit/01-Data-Analysis/04-Multiple-Files-With-Pandas/tests/outer_merge.pickle: (outer_merged_shape)
- 02-Data-Toolkit/01-Data-Analysis/04-Multiple-Files-With-Pandas/tests/right_merge.pickle: (right_merged_shape)
- 02-Data-Toolkit/02-Data-Sourcing/03-Text-Extraction-with-Regex/tests/receipts.pickle: (df_size)
- 03-Maths/01-Algebra-Calculus/02-real-estate-advanced-estimator/tests/flats.pickle: (shape)
- 03-Maths/01-Algebra-Calculus/03-Scipy/tests/minimize2d.pickle: (X0_shape, minimum_shape)
- 03-Maths/01-Algebra-Calculus/03-Scipy/tests/minimize_constraints.pickle: (bounds)
- 04-Decision-Science/02-Statistical-Inference/01-Orders/tests/get_wait_time.pickle: (shape)
- 04-Decision-Science/02-Statistical-Inference/01-Orders/tests/number_products.pickle: (shape)
- 04-Decision-Science/02-Statistical-Inference/01-Orders/tests/number_sellers.pickle: (shape)
- 04-Decision-Science/02-Statistical-Inference/01-Orders/tests/price.pickle: (shape)
- 04-Decision-Science/02-Statistical-Inference/01-Orders/tests/review_score.pickle: (shape)
- 04-Decision-Science/02-Statistical-Inference/01-Orders/tests/training.pickle: (shape)
- 04-Decision-Science/02-Statistical-Inference/01-Orders/tests/wait_time.pickle: (shape)
- 04-Decision-Science/03-Linear-Regression/02-Sellers/tests/seller.pickle: (shape)
- 04-Decision-Science/03-Linear-Regression/Optional-Products/tests/products.pickle: (shape)
- 05-ML/06-Unsupervised-Learning/01-Image-Compression/tests/blobs.pickle: (shape)
- 05-ML/06-Unsupervised-Learning/01-Image-Compression/tests/image_shape.pickle: (img_shape)
- 05-ML/06-Unsupervised-Learning/01-Image-Compression/tests/imageshape.pickle: (img_shape)
- 05-ML/06-Unsupervised-Learning/02-Face-Recognition/tests/projection.pickle: (shape)
- 05-ML/06-Unsupervised-Learning/03-Movie-Recommendation/tests/datasets.pickle: (movies_shape, tags_shape, ratings_shape)
- 05-ML/06-Unsupervised-Learning/03-Movie-Recommendation/tests/metadata.pickle: (counter_shape, latent_shape)
- 05-ML/06-Unsupervised-Learning/03-Movie-Recommendation/tests/ratings.pickle: (latent_shape)
- 05-ML/07-Ensemble-Methods/01-Houses-Kaggle-Competition/tests/preproc_baseline.pickle: (shape)
- 05-ML/07-Ensemble-Methods/01-Houses-Kaggle-Competition/tests/submission_baseline.pickle: (submission_shape)
- 05-ML/08-Workflow/02-Custom-Transformer/tests/pipe.pickle: (shape)

Pickles containing int64:
- 02-Data-Toolkit/01-Data-Analysis/03-US-Oil-and-Gas-Production/tests/full_gas.pickle: (index_year, us_total)
- 02-Data-Toolkit/01-Data-Analysis/03-US-Oil-and-Gas-Production/tests/merged_dataframes.pickle: (yearly_oil_2009)
- 02-Data-Toolkit/01-Data-Analysis/03-US-Oil-and-Gas-Production/tests/oil.pickle: (filtered_oil_index_year, us_total)
- 02-Data-Toolkit/01-Data-Analysis/04-Multiple-Files-With-Pandas/tests/olympic_games.pickle: (top_country_1, top_country_10)
- 02-Data-Toolkit/01-Data-Analysis/04-Multiple-Files-With-Pandas/tests/olympic_games_event.pickle: (top_country_1_event, top_country_10_event)
- 02-Data-Toolkit/01-Data-Analysis/04-Multiple-Files-With-Pandas/tests/olympic_games_season.pickle: (top_country_1_summer, top_country_10_winter)
- 02-Data-Toolkit/02-Data-Sourcing/02-Scraping/tests/books.pickle: (rating)
- 04-Decision-Science/01-Project-Setup/03-Exploratory-Analysis/tests/exploratory.pickle: (n)
- 05-ML/01-Fundamentals-of-Machine-Learning/04-Logistic-Regression/tests/predictions.pickle: (prediction)
- 05-ML/03-Performance-metrics/02-Electrocardiograms/tests/class_balance.pickle: (healthy, at_risk)
- 05-ML/04-Under-the-hood/02-Solvers/tests/new_data_prediction.pickle: (predicted_class)
- 05-ML/05-Model-Tuning/01-Workflow/tests/knn.pickle: (best_k)

Pickles containing set:
- 02-Data-Toolkit/01-Data-Analysis/04-Multiple-Files-With-Pandas/tests/all_df.pickle: (all_df_columns)

Pickles containing int:
- 02-Data-Toolkit/01-Data-Analysis/04-Multiple-Files-With-Pandas/tests/inner_merge.pickle: (inner_merged_nulls)
- 02-Data-Toolkit/01-Data-Analysis/04-Multiple-Files-With-Pandas/tests/left_merge.pickle: (left_merged_nulls)
- 02-Data-Toolkit/01-Data-Analysis/04-Multiple-Files-With-Pandas/tests/outer_merge.pickle: (outer_merged_nulls)
- 02-Data-Toolkit/01-Data-Analysis/04-Multiple-Files-With-Pandas/tests/right_merge.pickle: (right_merged_nulls)
- 03-Maths/02-Statistics-Probabilities/01-Law-of-Large-Numbers/tests/expected_value_coins.pickle: (expected_value_coins)
- 03-Maths/02-Statistics-Probabilities/02-Random-Variables/tests/factorial.pickle: (count_total_possibilities_10)
- 03-Maths/02-Statistics-Probabilities/02-Toss-a-Coin/tests/factorial.pickle: (count_total_possibilities_10)
- 03-Maths/02-Statistics-Probabilities/03-Central-Limit-Theorem/tests/probability.pickle: (n)
- 04-Decision-Science/01-Project-Setup/02-Data-Preparation/tests/get_data.pickle: (keys_len)
- 04-Decision-Science/01-Project-Setup/Optional-Metric-Design/tests/orders.pickle: (keys_len, reviews_number)
- 04-Decision-Science/03-Linear-Regression/Optional-Products/tests/products.pickle: (avg_review_score, avg_price, avg_quantity)
- 05-ML/01-Fundamentals-of-Machine-Learning/01-Linear-Regression/tests/variables.pickle: (variable_y)
- 05-ML/02-Prepare-the-dataset/01-Preprocessing-Workflow/tests/duplicates.pickle: (duplicates)
- 05-ML/02-Prepare-the-dataset/02-Feature-Selection/tests/correlation.pickle: (correlated_features)
- 05-ML/03-Performance-metrics/01-KNN/tests/optimal_k.pickle: (optimal_k)
- 05-ML/05-Model-Tuning/03-SVM/tests/generalization.pickle: (number_misclassified_test)
- 05-ML/05-Model-Tuning/03-SVM/tests/rbf_svm.pickle: (best_c, best_gamma)
- 05-ML/05-Model-Tuning/03-SVM/tests/svm_rbf.pickle: (best_c, best_gamma)
- 05-ML/06-Unsupervised-Learning/01-Image-Compression/tests/color_count.pickle: (color_count)
- 05-ML/06-Unsupervised-Learning/02-Face-Recognition/tests/classification.pickle: (best_pc)
- 05-ML/06-Unsupervised-Learning/02-Face-Recognition/tests/components.pickle: (min_pc)
- 05-ML/06-Unsupervised-Learning/03-Movie-Recommendation/tests/feature_engineering.pickle: (merged_df_rows)
- 05-ML/07-Ensemble-Methods/01-Houses-Kaggle-Competition/tests/features_overview.pickle: (n)
- 05-ML/08-Workflow/01-Preprocessor-Tuning/tests/solution.pickle: (n_best)
- 06-Deep-Learning/02-Optimizer-loss-and-fitting/04-Credit-Card-Challenge/tests/solution.pickle: (fraud_number, non_fraud_number)

Pickles containing float64:
- 02-Data-Toolkit/01-Data-Analysis/04-Multiple-Files-With-Pandas/tests/olympic_games_combined.pickle: (top_combined_1_event, top_combined_1_medal, top_combined_10_event, top_combined_10_medal)
- 02-Data-Toolkit/02-Data-Sourcing/02-Scraping/tests/books.pickle: (price)
- 03-Maths/01-Algebra-Calculus/02-real-estate-advanced-estimator/tests/univariate.pickle: (mse, theta1, theta0)
- 03-Maths/02-Statistics-Probabilities/03-Central-Limit-Theorem/tests/probability.pickle: (sigma_expected, proba)
- 03-Maths/02-Statistics-Probabilities/03-Central-Limit-Theorem/tests/zscore.pickle: (z, proba)
- 05-ML/01-Fundamentals-of-Machine-Learning/01-Linear-Regression/tests/cv_results.pickle: (min_score, max_score, mean_score)
- 05-ML/01-Fundamentals-of-Machine-Learning/01-Linear-Regression/tests/prediction.pickle: (prediction)
- 05-ML/01-Fundamentals-of-Machine-Learning/01-Linear-Regression/tests/trained_model.pickle: (intercept)
- 05-ML/01-Fundamentals-of-Machine-Learning/02-Learning-Curves/tests/added_features.pickle: (score)
- 05-ML/01-Fundamentals-of-Machine-Learning/02-Learning-Curves/tests/cv_score.pickle: (score)
- 05-ML/01-Fundamentals-of-Machine-Learning/02-Learning-Curves/tests/prediction.pickle: (prediction)
- 05-ML/01-Fundamentals-of-Machine-Learning/02-Learning-Curves/tests/reduced_dataset.pickle: (score)
- 05-ML/01-Fundamentals-of-Machine-Learning/03-Problems/tests/problem_1.pickle: (answer)
- 05-ML/01-Fundamentals-of-Machine-Learning/03-Problems/tests/problem_2.pickle: (answer)
- 05-ML/01-Fundamentals-of-Machine-Learning/04-Logistic-Regression/tests/accuracy.pickle: (accuracy)
- 05-ML/01-Fundamentals-of-Machine-Learning/04-Logistic-Regression/tests/predictions.pickle: (probability)
- 05-ML/01-Fundamentals-of-Machine-Learning/04-Logistic-Regression/tests/reduced_accuracy.pickle: (accuracy)
- 05-ML/02-Prepare-the-dataset/02-Feature-Selection/tests/base_model.pickle: (score)
- 05-ML/02-Prepare-the-dataset/02-Feature-Selection/tests/reduced_complexity_model.pickle: (model_score)
- 05-ML/02-Prepare-the-dataset/03-Car-Prices/tests/base_model.pickle: (score)
- 05-ML/02-Prepare-the-dataset/03-Car-Prices/tests/strong_model.pickle: (score)
- 05-ML/03-Performance-metrics/01-KNN/tests/default_score.pickle: (score)
- 05-ML/03-Performance-metrics/01-KNN/tests/price_error.pickle: (error)
- 05-ML/03-Performance-metrics/01-KNN/tests/scale_sensitivity.pickle: (base_score, rescaled_score)
- 05-ML/03-Performance-metrics/02-Electrocardiograms/tests/logistic_regression_evaluation.pickle: (accuracy, recall, precision, f1)
- 05-ML/03-Performance-metrics/03-Threshold-Adjustments/tests/base_precision.pickle: (score)
- 05-ML/04-Under-the-hood/01-Loss-Functions/tests/loss_functions.pickle: (r2, r2_mae, max_error, max_error_mae)
- 05-ML/04-Under-the-hood/02-Solvers/tests/new_data_prediction.pickle: (predicted_proba_of_class)
- 05-ML/04-Under-the-hood/03-Batch-Gradient-Descent/tests/descent.pickle: (a_100, b_100)
- 05-ML/05-Model-Tuning/01-Workflow/tests/knn.pickle: (best_score)
- 05-ML/05-Model-Tuning/01-Workflow/tests/r2.pickle: (r2_test)
- 05-ML/05-Model-Tuning/03-SVM/tests/sigmoid_svm.pickle: (sigmoid_svm_cv_accuracy)
- 05-ML/07-Ensemble-Methods/01-Houses-Kaggle-Competition/tests/submission_baseline.pickle: (score_baseline)
- 05-ML/08-Workflow/01-Preprocessor-Tuning/tests/solution.pickle: (cv_score)

Pickles containing Index:
- 02-Data-Toolkit/02-Data-Sourcing/00-Warmup/tests/warmup.pickle: (df_columns)
- 02-Data-Toolkit/02-Data-Sourcing/01-Stock-Market-API/tests/apple.pickle: (columns)
- 02-Data-Toolkit/02-Data-Sourcing/02-Scraping/tests/books.pickle: (columns)
- 03-Maths/01-Algebra-Calculus/02-real-estate-advanced-estimator/tests/flats.pickle: (columns)

Pickles containing dict:
- 02-Data-Toolkit/02-Data-Sourcing/02-Scraping/tests/books.pickle: (books_dict)
- 02-Data-Toolkit/02-Data-Sourcing/03-Text-Extraction-with-Regex/tests/receipts.pickle: (receipts)
- 03-Maths/02-Statistics-Probabilities/02-Random-Variables/tests/factorial.pickle: (probability_1, probability_100)
- 03-Maths/02-Statistics-Probabilities/02-Toss-a-Coin/tests/factorial.pickle: (probability_1, probability_100)
- 05-ML/01-Fundamentals-of-Machine-Learning/01-Linear-Regression/tests/cv_results.pickle: (cv_result)

Pickles containing list:
- 02-Data-Toolkit/02-Data-Sourcing/03-Text-Extraction-with-Regex/tests/receipts.pickle: (raw)
- 03-Maths/01-Algebra-Calculus/03-Scipy/tests/global_optimization.pickle: (bounds)
- 04-Decision-Science/01-Project-Setup/02-Data-Preparation/tests/get_data.pickle: (keys, columns)
- 04-Decision-Science/01-Project-Setup/Optional-Metric-Design/tests/orders.pickle: (key_names)
- 04-Decision-Science/02-Statistical-Inference/01-Orders/tests/training.pickle: (columns)
- 04-Decision-Science/04-Logistic-Regression/01-Logit/tests/logit.pickle: (answers)
- 05-ML/02-Prepare-the-dataset/01-Preprocessing-Workflow/tests/encoding.pickle: (new_features)
- 05-ML/05-Model-Tuning/02-Regularization/tests/lasso.pickle: (zero_impact_features)
- 05-ML/05-Model-Tuning/02-Regularization/tests/ridge.pickle: (top_2)
- 05-ML/05-Model-Tuning/02-Regularization/tests/unregularized.pickle: (top_1_feature)
- 05-ML/05-Model-Tuning/03-SVM/tests/poly_svm.pickle: (poly_svm_performance)
- 05-ML/07-Ensemble-Methods/01-Houses-Kaggle-Competition/tests/submission_baseline.pickle: (submission_columns)

Pickles containing Series:
- 02-Data-Toolkit/02-Data-Sourcing/03-Text-Extraction-with-Regex/tests/receipts.pickle: (receipt)
- 05-ML/06-Unsupervised-Learning/03-Movie-Recommendation/tests/datasets.pickle: (genres_cleaned)

Pickles containing float:
- 03-Maths/02-Statistics-Probabilities/01-Law-of-Large-Numbers/tests/expected_value_dice.pickle: (expected_value)
- 03-Maths/02-Statistics-Probabilities/02-Random-Variables/tests/factorial.pickle: (count_possibilities_11, count_possibilities_43)
- 03-Maths/02-Statistics-Probabilities/02-Toss-a-Coin/tests/factorial.pickle: (count_possibilities_11, count_possibilities_43)
- 03-Maths/02-Statistics-Probabilities/03-Central-Limit-Theorem/tests/distribution.pickle: (mu, sigma)
- 03-Maths/02-Statistics-Probabilities/03-Central-Limit-Theorem/tests/probability.pickle: (mu_expected, mu, sigma)
- 04-Decision-Science/02-Statistical-Inference/01-Orders/tests/distance.pickle: (mean)
- 04-Decision-Science/03-Linear-Regression/02-Sellers/tests/seller.pickle: (median)
- 05-ML/01-Fundamentals-of-Machine-Learning/01-Linear-Regression/tests/variables.pickle: (variable_X)
- 05-ML/03-Performance-metrics/02-Electrocardiograms/tests/precision.pickle: (precision)
- 05-ML/03-Performance-metrics/03-Threshold-Adjustments/tests/decision_threshold.pickle: (threshold)
- 05-ML/05-Model-Tuning/03-SVM/tests/linear_svm.pickle: (linear_svm_score)
- 06-Deep-Learning/01-Fundamentals-of-Deep-Learning/02-Your-first-Neural-Network/tests/deeper_model.pickle: (accuracy)
- 06-Deep-Learning/01-Fundamentals-of-Deep-Learning/02-Your-first-Neural-Network/tests/first model.pickle: (accuracy)
- 06-Deep-Learning/01-Fundamentals-of-Deep-Learning/02-Your-first-Neural-Network/tests/first_model.pickle: (accuracy)
- 06-Deep-Learning/01-Fundamentals-of-Deep-Learning/03-Multiclass-classification/tests/baseline.pickle: (accuracy)
- 06-Deep-Learning/02-Optimizer-loss-and-fitting/03-Finetune-your-Neural-Network/tests/solution.pickle: (mae_test)
- 06-Deep-Learning/02-Optimizer-loss-and-fitting/04-Credit-Card-Challenge/tests/solution.pickle: (precision, recall)

Pickles containing LinearRegression:
- 05-ML/01-Fundamentals-of-Machine-Learning/01-Linear-Regression/tests/trained_model.pickle: (model)

Pickles containing DataFrame:
- 05-ML/01-Fundamentals-of-Machine-Learning/02-Learning-Curves/tests/reduced_dataset.pickle: (dataset)
- 05-ML/02-Prepare-the-dataset/01-Preprocessing-Workflow/tests/duplicates.pickle: (dataset)
- 05-ML/02-Prepare-the-dataset/01-Preprocessing-Workflow/tests/encoding.pickle: (dataset)
- 05-ML/02-Prepare-the-dataset/01-Preprocessing-Workflow/tests/missing_values.pickle: (dataset)
- 05-ML/02-Prepare-the-dataset/01-Preprocessing-Workflow/tests/scaling.pickle: (dataset)
- 05-ML/02-Prepare-the-dataset/03-Car-Prices/tests/collinearity.pickle: (dataset)
- 05-ML/02-Prepare-the-dataset/03-Car-Prices/tests/encoding.pickle: (dataset)
- 05-ML/02-Prepare-the-dataset/03-Car-Prices/tests/missing_values.pickle: (dataset)
- 05-ML/02-Prepare-the-dataset/03-Car-Prices/tests/scaling.pickle: (dataset)
- 05-ML/06-Unsupervised-Learning/03-Movie-Recommendation/tests/feature_engineering.pickle: (metadata)

Pickles containing bool_:
- 05-ML/06-Unsupervised-Learning/03-Movie-Recommendation/tests/feature_engineering.pickle: (unique_movies)

Pickles containing RandomizedSearchCV:
- 05-ML/08-Workflow/03-Tuning-Pipeline/tests/solution.pickle: (search)

gmanchon commented 2 years ago

@krokrob submits that it might be convenient for the content writers to test the pickled results using data science dedicated data types (pandas)

with this in mind, a possible approach is to add steps around the data serialisation:

- encode from package specific (pandas) data types to python standard types when saving the result
- if we want to be extra safe, replace pickle with json for storage/transmission (need to evaluate the impact)
- decode when loading the result
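
The encode / json / decode steps above could look roughly like the sketch below. This is only an illustration, not the nbresult implementation: the `encode` and `decode` helpers, the `__type__` marker key and the sample `result` dict are all hypothetical names introduced here, and the round trip goes through `to_json(orient="split")`, so dtypes are re-inferred on load rather than preserved exactly.

```python
import json

import numpy as np
import pandas as pd


def encode(value):
    """Hypothetical helper: turn pandas / numpy values into JSON-friendly Python types."""
    if isinstance(value, pd.DataFrame):
        # to_json + json.loads yields plain dicts/lists with "columns", "index", "data"
        return {"__type__": "DataFrame", **json.loads(value.to_json(orient="split"))}
    if isinstance(value, pd.Series):
        # for a Series the split layout holds "name", "index", "data"
        return {"__type__": "Series", **json.loads(value.to_json(orient="split"))}
    if isinstance(value, pd.Index):
        return {"__type__": "Index", "data": value.tolist()}
    if isinstance(value, np.ndarray):
        return {"__type__": "ndarray", "data": value.tolist()}
    if isinstance(value, np.generic):
        # numpy scalars (np.float64, np.bool_, ...) become Python scalars
        return value.item()
    if isinstance(value, dict):
        return {key: encode(item) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        return [encode(item) for item in value]
    return value


def decode(value):
    """Hypothetical helper: rebuild pandas / numpy values from the encoded form."""
    if isinstance(value, dict):
        kind = value.get("__type__")
        payload = {key: item for key, item in value.items() if key != "__type__"}
        if kind == "DataFrame":
            return pd.DataFrame(**payload)
        if kind == "Series":
            return pd.Series(**payload)
        if kind == "Index":
            return pd.Index(payload["data"])
        if kind == "ndarray":
            return np.array(payload["data"])
        return {key: decode(item) for key, item in value.items()}
    if isinstance(value, list):
        return [decode(item) for item in value]
    return value


# made-up challenge result mixing a dataframe, a list and a numpy scalar
result = {
    "dataset": pd.DataFrame({"a": [1, 2], "b": ["x", "y"]}),
    "columns": ["a", "b"],
    "accuracy": np.float64(0.5),
}

payload = json.dumps(encode(result))    # plain text, safe to store or transmit
restored = decode(json.loads(payload))  # pandas objects rebuilt on the receiving side
```

With this shape the text sent over the network contains only json primitives, so the receiving side needs a pandas that can build a dataframe from lists, not the exact version that produced the result. Estimators such as LinearRegression or RandomizedSearchCV are not handled by this sketch.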

if we allow the content writers to use numpy and pandas data types (ndarray, series, index, dataframe), all the existing data-solutions pickles will be covered except for the ones storing a LinearRegression or a RandomizedSearchCV, which we should update

ssaunier commented 1 year ago

One year later, what's the appetite for this one @krokrob @gmanchon?

gmanchon commented 1 year ago

Oh I'm sure this is not a priority, let's close this 👌

lewagon / nbresult

pickle renders the student and glovebox setups tightly coupled #4