hupili / python-for-data-and-media-communication-gitbook

An open source book on Python tailored for communication students with zero background

Network issue - HTTPS certificate #112

Open ChicoXYC opened 5 years ago

ChicoXYC commented 5 years ago

The first two issues depend on the user's operating system. The third is an encoding problem.

load csv from GitHub

Two Mac users cannot load the CSV with the following code, while two Windows users can.

import pandas as pd
df = pd.read_csv('https://raw.githubusercontent.com/hupili/python-for-data-and-media-communication/master/text-analysis/regular_reader_tweets.csv')
print('The length of df is {}'.format(len(df)))
df.head()

Error message:

<urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:833)>

Python version: both 3.6.5
Requests version: both 2.19.1
Wi-Fi environment: both CVA 808

import stopwords from nltk

Three Windows users can set up the stopwords with nltk, while two Mac users cannot.

import nltk
from nltk.corpus import stopwords
nltk.download('stopwords')
stopwords = stopwords.words('english')

Error message:

[nltk_data] Error loading stopwords: <urlopen error [SSL:
[nltk_data]     CERTIFICATE_VERIFY_FAILED] certificate verify failed
[nltk_data]     (_ssl.c:833)>

nltk version: both 3.4

reading files listed by os.listdir fails with an encoding error

You can refer here.

hupili commented 5 years ago

CERTIFICATE_VERIFY_FAILED is a common bug signature in CVA. Please run more tests and pin down the scope of the problem. Once confirmed, please add it to the FAQ.
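One way to narrow the scope is to run the same small probe on each machine and network and compare the results. The following is a minimal sketch using only the standard library; the helper names `default_ca_paths`, `is_cert_failure`, and `probe` are hypothetical and not part of this repository, and the probe only classifies the outcome rather than fixing anything.

```python
import ssl
import urllib.error
import urllib.request


def default_ca_paths():
    """Return where this Python/OpenSSL build looks for trusted CA certificates."""
    paths = ssl.get_default_verify_paths()
    return {
        "cafile": paths.cafile,
        "capath": paths.capath,
        "openssl_cafile": paths.openssl_cafile,
        "openssl_capath": paths.openssl_capath,
    }


def is_cert_failure(exc):
    """True if exc is, or wraps, an SSL certificate verification failure."""
    # urllib wraps the underlying error in URLError and keeps it in .reason
    reason = getattr(exc, "reason", exc)
    return (
        isinstance(reason, ssl.SSLError)
        and "CERTIFICATE_VERIFY_FAILED" in str(reason)
    )


def probe(url, timeout=10):
    """Try one HTTPS request and classify the outcome (hypothetical helper)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return "ok"
    except OSError as exc:  # URLError is a subclass of OSError
        if is_cert_failure(exc):
            return "cert-failure"
        return "other-error: {!r}".format(exc)
```

Calling `probe('https://raw.githubusercontent.com/')` on a Mac and a Windows machine, both on the CVA Wi-Fi and on another network, and comparing the output of `default_ca_paths()` on each, should help show whether the network or the local certificate store is the deciding factor.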

The third problem has nothing to do with os.listdir. It is caused by opening a file in text mode (the default) when the file's encoding differs from the system default encoding. You need to produce a minimal example, i.e. one where opening that single file leads to the error (even if you don't use listdir).
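A minimal sketch of such an example is shown here. It does not use the file from the original report: it writes a hypothetical UTF-8 sample to a temporary file, and it passes `encoding="ascii"` as a stand-in for a system default encoding that does not match the file. The helper `read_text` is also hypothetical.

```python
import locale
import os
import tempfile


def read_text(path, encoding=None):
    """Read a file in text mode; encoding=None falls back to the platform default."""
    with open(path, encoding=encoding) as f:
        return f.read()


# Hypothetical sample data: UTF-8 bytes that are not valid ASCII
sample = "caf\u00e9 \u6570\u636e\n"
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "wb") as f:
    f.write(sample.encode("utf-8"))

# The encoding that text mode uses when none is given explicitly
print("Default text-mode encoding:", locale.getpreferredencoding(False))

# Mismatched encoding: decoding fails on this single file, no listdir involved
try:
    read_text(path, encoding="ascii")
    mismatch_error = None
except UnicodeDecodeError as exc:
    mismatch_error = exc

# Matching encoding: the same file reads back correctly
text = read_text(path, encoding="utf-8")
os.remove(path)
```

Passing `encoding` explicitly to `open` makes the result independent of the system default, which is why the same script can behave differently across machines when the argument is left out.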

ChicoXYC commented 5 years ago

After tests by two more Windows users and one more Mac user, the results are still the same.

hupili commented 5 years ago

Which issue are you referring to? There are three issues in the OP.

ChicoXYC commented 5 years ago

the first two issues.

hupili commented 5 years ago

@ChicoXYC what is the conclusion? Is CVA the only factor leading to the error?

ChicoXYC commented 5 years ago

The first two issues are related to the CVA network and to protocol differences between the operating systems, which may need further discussion if encountered again. The os.listdir issue is related to encoding and decoding.