hupili / python-for-data-and-media-communication-gitbook

An open source book on Python tailored for communication students with zero background

Feedbacks for Note10 #97

Closed SerenaQYHuang closed 5 years ago

SerenaQYHuang commented 5 years ago

For the following code

string = "../../../blog/20160730-mediawiki-wiki-knowledge-management-system/"
s = string.split('/blog')
#s
s1 = s[-1]
#s1
s2='{0}{1}'.format('http://initiumlab.com',s1)
#s2

We expected "http://initiumlab.com/blog/20160730-mediawiki-wiki-knowledge-management-system/", but the code above actually produces 'http://initiumlab.com/20160730-mediawiki-wiki-knowledge-management-system/'. The '/blog' separator is dropped by split.

"/blog/" should also be added. @ChicoXYC

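One possible fix, as a minimal sketch: `split('/blog')` discards the separator itself, so it can be put back when the URL is rebuilt. The names `tail` and `url` are illustrative and not taken from the notes.

```python
string = "../../../blog/20160730-mediawiki-wiki-knowledge-management-system/"

# split('/blog') removes the separator, so the last piece starts right after it
tail = string.split('/blog')[-1]

# add '/blog' back explicitly when rebuilding the absolute URL
url = '{0}/blog{1}'.format('http://initiumlab.com', tail)
print(url)
```
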
SerenaQYHuang commented 5 years ago
#check certain user name in the text
def check_name(x):
    return 'ten_gop' in str(x).lower()
df['text'].apply(check_name).value_counts()

The output for this piece of code is like the following: image

SerenaQYHuang commented 5 years ago
s_user = df['user_key'].value_counts()
df_users = s_user.reset_index()
#df_users

# count_retweeted_number of all the users with apply function
def count_retweeted_number(name):
    def check_name(x):
        return name in str(x).lower()
    return df['text'].apply(check_name).value_counts().get(True, 0)

df_users['count'] = df_users['index'].apply(count_retweeted_number)
df_users.sort_values(by='count', ascending=False)

Output: image

SerenaQYHuang commented 5 years ago

image

This is weird. Why doesn't it work?

SerenaQYHuang commented 5 years ago

image

SerenaQYHuang commented 5 years ago
processed_word_list = []
# assume you already have a list of words
for word in words:
    word = word.lower() # in case they are not all lower cased
    if word not in stopwords:
        processed_word_list.append(word)

image

ChicoXYC commented 5 years ago

@SerenaQYHuang The errors in several of the examples you listed above appear because they were not real, runnable examples. They only illustrated the syntax without loading any data. I've already modified them into real cases.

ChicoXYC commented 5 years ago
#check certain user name in the text
def check_name(x):
    return 'ten_gop' in str(x).lower()
df['text'].apply(check_name).value_counts()

The output for this piece of code is like the following: image

As for this error, it works in my environment. We can discuss more tomorrow.

hupili commented 5 years ago

Run type(str) to see what you get. @SerenaQYHuang @ChicoXYC, you may get different results.
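A minimal sketch of what this check can reveal, assuming some earlier code assigned a string to the name `str` (the assignment inside `demo_shadowing` is hypothetical and only simulates that situation):

```python
import builtins

def demo_shadowing():
    # hypothetical earlier assignment that reuses the name `str`
    str = "../../../blog/"
    kind = type(str).__name__        # the name now refers to a string object
    try:
        str(123)                     # calling a string object fails
        error = None
    except TypeError as err:
        error = type(err).__name__
    return kind, error

shadowed_kind, shadowed_error = demo_shadowing()
builtin_kind = type(builtins.str).__name__   # the untouched built-in is a class

print(builtin_kind)     # type
print(shadowed_kind)    # str
print(shadowed_error)   # TypeError
```

If `type(str)` reports `str` rather than `type`, the name no longer points at the built-in, and calls such as `str(x)` will fail.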

SerenaQYHuang commented 5 years ago

input: type(str) output: str

image

@ChicoXYC @hupili

SerenaQYHuang commented 5 years ago
image

Then I tried adding an r before the path, like this: words = read_txt(r"C:\Users\Administrator\Dropbox\Media data analytics\BigDataAnalytics\AppleDaily.txt")

and it turned out to be: image

But I copied the path from Windows Task Manager.

hupili commented 5 years ago

Is this comment resolved? It looks like a mistaken assignment to the name of the built-in str function.

More here: https://github.com/hupili/python-for-data-and-media-communication-gitbook/blob/master/python-language-basics.md#name-clash-with-reserved-word

ChicoXYC commented 5 years ago

@hupili I guess this is the reason: in a previous example, I used str as a variable name. I've changed it to another name.
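A sketch of the renamed version, assuming the clashing variable held the blog path. The name `url_path` and the sample `texts` list are made up for illustration, and a plain list stands in for df['text'] so the snippet runs without pandas.

```python
# use a descriptive name instead of reusing the built-in name `str`
url_path = "../../../blog/20160730-mediawiki-wiki-knowledge-management-system/"

def check_name(x):
    # `str` is the built-in here, so non-string values are converted safely
    return 'ten_gop' in str(x).lower()

# made-up stand-ins for values in df['text'], including non-string entries
texts = ["RT @TEN_GOP: hello", None, 42]
flags = [check_name(t) for t in texts]
print(flags)
```

In a notebook where `str` has already been overwritten at the top level, running `del str` (or restarting the kernel) makes the built-in visible again.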