rasbt / python-machine-learning-book

The "Python Machine Learning (1st edition)" book code repository and info resource
MIT License

the Breast Cancer Wisconsin dataset is not available #2

Closed: nomuramasahir0 closed this issue 8 years ago

nomuramasahir0 commented 8 years ago

In chapter 6, the Breast Cancer Wisconsin dataset is not available now. Maybe it is a broken link.

currently

df = pd.read_csv('https://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin/wdbc.data', header=None)

should be

df = pd.read_csv('http://mlr.cs.umass.edu/ml/machine-learning-databases/breast-cancer-wisconsin/wdbc.data', header=None)

I'm sorry if I'm wrong.

cligini commented 8 years ago

You are definitely right, Masahiro Nomura: the top-level site does not appear to respond to pings at all, and neither the encrypted (HTTPS) nor the unencrypted (HTTP) channel is responding. More digging could be done, though; it may simply be that the server does not answer ping requests.
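One way to repeat this kind of check from Python is to request the same file over both HTTPS and HTTP. This is only an illustrative sketch: `scheme_variants` and `probe` are hypothetical helper names, and it assumes a HEAD request is an acceptable probe (some servers reject HEAD even when a normal GET works).

```python
from urllib.parse import urlsplit
from urllib.request import Request, urlopen

UCI_URL = ('https://archive.ics.uci.edu/ml/machine-learning-databases/'
           'breast-cancer-wisconsin/wdbc.data')


def scheme_variants(url):
    """Return the same URL over HTTPS (encrypted) and HTTP (unencrypted)."""
    parts = urlsplit(url)
    return [parts._replace(scheme=s).geturl() for s in ('https', 'http')]


def probe(url, timeout=10, opener=urlopen):
    """Return the HTTP status code, or a description of the error.

    `opener` defaults to urllib's urlopen; it is a parameter only so the
    function can be exercised without network access (hypothetical design).
    """
    try:
        # HEAD asks for the headers only, so the file body is not downloaded.
        with opener(Request(url, method='HEAD'), timeout=timeout) as resp:
            return resp.status
    except OSError as exc:  # URLError, HTTPError and socket timeouts are OSErrors
        return 'unreachable: %r' % (exc,)
```

For example, `for u in scheme_variants(UCI_URL): print(u, probe(u))` reports one result per protocol. Note that this only tests HTTP(S) reachability; ping uses ICMP, which many servers block regardless of whether their web server is up.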

(Reply sent by email to Masahiro Nomura's post of Mon, Jan 25, 2016, 11:42 AM.)

rasbt commented 8 years ago

Thanks, @gash717! As @cligini mentioned, there seems to be a (hopefully temporary) issue with UCI's ML repository. For cases like this, I put the datasets into the https://github.com/rasbt/python-machine-learning-book/tree/master/code/datasets directory here on GitHub.

So, to load the dataset, you could do:

df = pd.read_csv('https://raw.githubusercontent.com/rasbt/python-machine-learning-book/master/code/datasets/wdbc/wdbc.data', header=None)
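Rather than editing the URL by hand whenever one host goes down, the loading step could try the original UCI address first and fall back to the GitHub copy. This is a minimal sketch, not code from the book: `read_csv_with_fallback` is a hypothetical helper, and its `reader` parameter exists only so it can be tested offline.

```python
import pandas as pd

# Original UCI location and the GitHub mirror mentioned in this thread.
UCI_URL = ('https://archive.ics.uci.edu/ml/machine-learning-databases/'
           'breast-cancer-wisconsin/wdbc.data')
GITHUB_URL = ('https://raw.githubusercontent.com/rasbt/python-machine-learning-book/'
              'master/code/datasets/wdbc/wdbc.data')


def read_csv_with_fallback(urls, reader=pd.read_csv, **kwargs):
    """Return the DataFrame from the first URL in `urls` that can be read.

    Extra keyword arguments (e.g. header=None) are passed on to `reader`.
    """
    errors = []
    for url in urls:
        try:
            return reader(url, **kwargs)
        except OSError as exc:  # URLError/HTTPError (e.g. 404) are OSErrors
            errors.append((url, exc))
    raise OSError('Could not load data from any URL: %r' % (errors,))
```

Usage: `df = read_csv_with_fallback([UCI_URL, GITHUB_URL], header=None)`. Listing the original source first keeps UCI as the primary location and only uses the mirror when UCI is unreachable.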

I will add a note in the notebooks to mention it to the users/readers.

Thanks!

rasbt commented 8 years ago

Okay, I added notes to the respective IPython notebooks so that people can fetch the datasets from this GitHub repo until the issues with the UCI dataset repository are resolved.

The links to the datasets are:

Thanks again for pointing it out!

rasbt commented 8 years ago

Just checked back on this issue, and it seems that UCI's Machine Learning Repository is back online. So the datasets should be accessible via their original links again (e.g., https://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin/wdbc.data). However, it's definitely not a bad thing to have the alternative links in the IPython notebooks now ;)
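With two possible sources for the same file, a quick sanity check on the loaded table can catch a bad download, such as an HTML error page parsed as CSV. The sketch below assumes the layout documented for wdbc.data when read with `header=None`: 569 rows and 32 columns (a sample ID, the diagnosis `M` or `B`, and 30 real-valued features). `looks_like_wdbc` is a hypothetical helper name.

```python
import pandas as pd


def looks_like_wdbc(df):
    """Heuristic check that `df` matches the layout of wdbc.data.

    Assumes the file was read with header=None, so column 0 is the ID,
    column 1 the diagnosis ('M' or 'B') and columns 2-31 the features.
    """
    if df.shape != (569, 32):
        return False
    if not set(df[1].unique()) <= {'M', 'B'}:
        return False
    features = df.loc[:, 2:]  # label-based slice: columns 2 through 31
    return all(pd.api.types.is_numeric_dtype(features[c]) for c in features.columns)
```

For example, `assert looks_like_wdbc(df)` directly after `pd.read_csv(..., header=None)` fails fast if either source returns something unexpected.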

nomuramasahir0 commented 8 years ago

I was able to confirm it now. Thank you very much :-D