googlecolab / colabtools

Python libraries for Google Colaboratory
Apache License 2.0

Colab: "Training a Text Classifier Using Embeddings" 403 loading training dataset #4630

Closed. ScottS2017 closed this issue 1 week ago

ScottS2017 commented 2 weeks ago

Describe the current behavior

Colab: Training a Text Classifier Using Embeddings

Cell 5 is:

newsgroups_train = fetch_20newsgroups(subset='train')
newsgroups_test = fetch_20newsgroups(subset='test')

newsgroups_train.target_names

The result of running the above is:

HTTPError                                 Traceback (most recent call last)
<ipython-input-6-f3476b42207e> in <cell line: 1>()
----> 1 newsgroups_train = fetch_20newsgroups(subset='train')
      2 newsgroups_test = fetch_20newsgroups(subset='test')
      3 
      4 # View list of class names for dataset
      5 newsgroups_train.target_names

9 frames

...

HTTPError: HTTP Error 403: Forbidden

Describe the expected behavior

Load the data without error.

What web browser you are using

Chrome: 125.0.6422.142 (Official Build) (64-bit)

mayankmalik-colab commented 2 weeks ago

I don't think this is a Colab-specific issue. fetch_20newsgroups is part of scikit-learn (from sklearn.datasets import fetch_20newsgroups) and can be run independently of Colab. I can't reproduce this error; it works fine for me.

However, I can see that the issue has been discussed on other GitHub repos like this one and that it happens sporadically. The 403s come from the servers hosting the dataset, not from Colab. You may have better luck raising it on the scikit-learn GitHub repo.
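Since the 403s are sporadic, one possible workaround is to retry the download a few times before giving up. The following is only a minimal sketch, not a confirmed fix: fetch_with_retry and load_newsgroups are hypothetical helper names, the attempt count and delays are arbitrary, and it assumes the error surfaces as urllib.error.HTTPError, which the "HTTP Error 403: Forbidden" message in the traceback suggests.

```python
import time
import urllib.error


def fetch_with_retry(fetch, attempts=4, base_delay=2.0, sleep=time.sleep):
    """Call fetch() and retry on HTTP 403 with exponential backoff.

    Hypothetical helper for illustration; not part of scikit-learn or Colab.
    """
    for attempt in range(attempts):
        try:
            return fetch()
        except urllib.error.HTTPError as err:
            # Retry only the sporadic 403. Any other HTTP error, or a 403
            # on the final attempt, is re-raised to the caller unchanged.
            if err.code != 403 or attempt == attempts - 1:
                raise
            # Wait 2s, 4s, 8s, ... with the default base_delay.
            sleep(base_delay * 2 ** attempt)


def load_newsgroups():
    """Load the two splits from the notebook cell, retrying each download."""
    # Imported here so the retry helper itself has no scikit-learn dependency.
    from sklearn.datasets import fetch_20newsgroups

    train = fetch_with_retry(lambda: fetch_20newsgroups(subset="train"))
    test = fetch_with_retry(lambda: fetch_20newsgroups(subset="test"))
    return train, test
```

In the notebook, the cell would then become newsgroups_train, newsgroups_test = load_newsgroups(). If the server keeps returning 403 on every attempt, the original HTTPError is still raised, so a persistent block is not hidden.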

ScottS2017 commented 1 week ago

Got it, thanks!