nityansuman / lazypredict-nightly

Lazy Predict 2.0 helps you benchmark models with little code and see which ones work better without any hyperparameter tuning.
MIT License

README: use another dataset #4

Open BradKML opened 1 month ago

BradKML commented 1 month ago

I got this error with the newest version of scikit-learn:

ImportError: 
`load_boston` has been removed from scikit-learn since version 1.2.

The Boston housing prices dataset has an ethical problem: as
...
[2] Harrison Jr, David, and Daniel L. Rubinfeld.
"Hedonic housing prices and the demand for clean air."
Journal of environmental economics and management 5.1 (1978): 81-102.
<https://www.researchgate.net/publication/4974606_Hedonic_housing_prices_and_the_demand_for_clean_air>
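
One possible replacement for the README example, sketched here as an assumption rather than the maintainer's chosen fix: swap `load_boston` for `load_diabetes`, which ships with scikit-learn and needs no download. The 90/10 split and the `random_state` value are illustrative choices, not taken from the README.

```python
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split

# load_diabetes is bundled with scikit-learn (442 samples, 10 features),
# so it works offline and on versions >= 1.2 where load_boston is gone.
X, y = load_diabetes(return_X_y=True)

# Illustrative 90/10 split; the seed is arbitrary.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.1, random_state=42
)

print(X_train.shape, X_test.shape)  # (397, 10) (45, 10)
```

The resulting arrays can then be passed to whatever benchmark call the README uses in place of the Boston data.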

BradKML commented 1 month ago

Testing with the diabetes dataset via datasets.load_diabetes() runs to completion, but a stream of LightGBM warnings pops up:

100%|██████████| 42/42 [00:04<00:00,  8.45it/s]
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.000276 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 640
[LightGBM] [Info] Number of data points in the train set: 397, number of used features: 10
[LightGBM] [Info] Start training from score 151.722922
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
...

With datasets.fetch_california_housing(), this happens instead; LightGBM seems to really want force_col_wise=true:

100%|██████████| 42/42 [04:59<00:00,  7.14s/it]
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.001478 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 1838
[LightGBM] [Info] Number of data points in the train set: 18576, number of used features: 8
[LightGBM] [Info] Start training from score 2.063611