Closed masmangan closed 2 years ago
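This is an automatic vacation reply from QQ Mail. Hello, I am currently on holiday and cannot reply to your email in person. I will reply as soon as possible once my holiday is over.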
Thank you for the notice, @masmangan. We'll switch to the California (CA) housing example, following scikit-learn. If you wish to contribute the change as a pull request, let me know; otherwise it'll be done in the coming weeks.
@pdebuyl I expect to have at least a partial solution for this issue by next week.
Original Boston example is here: https://colab.research.google.com/drive/1PMos6Zy97IIil8X0de4zinFpuDjRskAd?usp=sharing
A working draft of the new CA example is here: https://colab.research.google.com/drive/1KQcOTjKJRUnzAlNpBgTdJMBZHULiVeso?usp=sharing
Not done yet: a) Plotting the histogram is not working. b) The features differ, so the labels need review.
Also, the CA RMS (0.5) is lower than the Boston RMS (2.5).
Thanks for the work!
This call produced the plot in my test (with your notebook):
plt.scatter(data.data[feature_name], data.target)
The RMS is lower, but so is the average :-) So the ratio RMS / mean is probably higher in CA, judging by the plots.
The data is less "clean-looking" on the CA side, so the source might not have been processed in the same way. The "MedInc" plot is all over the place, with plenty of near-zero income entries.
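To make the RMS / mean comparison above concrete, here is a minimal sketch in plain Python. The helper names `rms_error` and `relative_rms_error` are hypothetical (they are not taken from the notebooks or from scikit-learn), and the sketch assumes the targets and predictions are given as equal-length sequences of numbers.

```python
import math


def rms_error(y_true, y_pred):
    """Root-mean-square difference between targets and predictions."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def relative_rms_error(y_true, y_pred):
    """RMS error divided by the mean target, so datasets on different scales can be compared."""
    mean_target = sum(y_true) / len(y_true)
    if mean_target == 0:
        raise ValueError("mean of y_true is zero; the ratio is undefined")
    return rms_error(y_true, y_pred) / mean_target
```

Calling `relative_rms_error` on the held-out targets and predictions of each notebook would give two dimensionless numbers that can be compared directly, unlike the raw 0.5 and 2.5 values.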
Great! Plot is working now! Thanks!
@pdebuyl This is only a partial solution. Hope it helps!
It does, thank you. I am testing the code now. I had to update scikit-learn and fix some related deprecations. I'll get it fully fixed later, though (in the coming weeks, over the summer).
fixed by #502
Boston housing is deprecated and will be removed from scikit-learn datasets. This would break the example from "3.6. scikit-learn: machine learning in Python", "Supervised Learning: Regression of Housing Data".
Reason: "DEPRECATED: load_boston is deprecated in 1.0 and will be removed in 1.2." "The Boston housing prices dataset has an ethical problem. You can refer to the documentation of this function for further details." "The scikit-learn maintainers therefore strongly discourage the use of this dataset unless the purpose of the code is to study and educate about ethical issues in data science and machine learning." Source: https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_boston.html
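For code that has to run on both older and newer scikit-learn releases, a small version check is one option. The sketch below relies only on the "deprecated in 1.0, removed in 1.2" wording quoted above. The helper name `load_boston_removed` is hypothetical and not part of scikit-learn.

```python
import re


def load_boston_removed(sklearn_version):
    """Return True if this scikit-learn version no longer ships load_boston.

    Assumes the removal happened in 1.2, as stated in the deprecation notice.
    """
    match = re.match(r"(\d+)\.(\d+)", sklearn_version)
    if match is None:
        raise ValueError(f"unrecognised version string: {sklearn_version!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    # Tuple comparison handles versions such as 1.10 correctly.
    return (major, minor) >= (1, 2)
```

In practice the argument would be `sklearn.__version__`, and an example could switch to `sklearn.datasets.fetch_california_housing` when the function returns True.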
Discussion related to this issue on scikit-learn issues:
https://github.com/scikit-learn/scikit-learn/issues/16155
https://github.com/scikit-learn/scikit-learn/pull/20729
This issue affects the following files:
https://github.com/scipy-lectures/scipy-lecture-notes/blob/master/packages/scikit-learn/examples/plot_boston_prediction.py
https://github.com/scipy-lectures/scipy-lecture-notes/blob/master/packages/scikit-learn/index.rst
Recommendation:
a) Develop a plot_california_prediction.py or a plot_ames_prediction.py based on plot_boston_prediction.py
b) Update index.rst in order to present plots from this new example
c) Keep and update plot_boston_prediction.py to fetch original data from CMU or remove plot_boston_prediction.py
Alternative: Update index.rst in order to address ethical issues regarding Boston housing dataset.