hupili / python-for-data-and-media-communication-gitbook

An open source book on Python tailored for communication students with zero background


How to slice two elements from the same string when one of them may be missing? #76

Closed 18426425 closed 5 years ago

18426425 commented 6 years ago

Troubleshooting

Describe your environment

Operating system:
Python version: python3
Hardware:
Internet access:
Jupyter notebook or not? [Y/N]: Y
Which chapter of book?:

Describe your question

When scraping http://maoyan.com/board/4?offset=0, I want to slice out two elements separately: "release time" and "country". However, some movies show their country and some don't. For instance, some look like this (上映时间 means "release time"):

上映时间：1993-01-01

while others look like this (美国 means "United States"):

上映时间：1994-10-14(美国)

So my question is: how do I slice out the two elements? By the way, they seem to be joined tightly in a single string.

hupili commented 6 years ago

String processing is needed. Try .split, in, endswith, etc.
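A minimal sketch of what those three tools report on the two sample strings from the question. The variable names are only for illustration, and the sketch assumes the parentheses are plain ASCII "(" and ")" as in the samples above.

```python
# The two sample strings quoted in the question.
samples = ["上映时间：1993-01-01", "上映时间：1994-10-14(美国)"]

for s in samples:
    has_country = "(" in s           # membership test: is there an opening parenthesis?
    closes_paren = s.endswith(")")   # suffix test: does the string end with ")"?
    fields = s.split("(")            # one field without a country, two with
    print(has_country, closes_paren, fields)
```

Any one of the three checks is enough to tell the two cases apart; split has the extra benefit of already separating the pieces.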

ChicoXYC commented 6 years ago

@18426425 In this case, I think the date format is fixed at 10 characters long. Therefore, we can use string slicing to separate the two parts.
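A rough sketch of the slicing idea, under two assumptions that are not guaranteed by the site: every string starts with the 5-character label "上映时间：" and every date is exactly 10 characters (YYYY-MM-DD). The function name slice_time_and_country is made up for this example.

```python
PREFIX = "上映时间："  # label in front of the date, 5 characters including the full-width colon
DATE_LEN = 10          # assumption: dates always look like YYYY-MM-DD

def slice_time_and_country(s):
    """Cut the string at fixed positions; country is '' when it is absent."""
    start = len(PREFIX)
    end = start + DATE_LEN
    date = s[start:end]
    rest = s[end:]         # either "" or something like "(美国)"
    country = rest[1:-1]   # drop the surrounding parentheses; "" stays ""
    return date, country

print(slice_time_and_country("上映时间：1993-01-01"))
print(slice_time_and_country("上映时间：1994-10-14(美国)"))
```

Fixed positions are simple, but they break as soon as a date is shorter (for example, only a year), so it is worth checking the data first.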

hupili commented 6 years ago

@18426425, after the Friday night debugging session, is the problem solved? If so, please link to your solution before we close the issue so others can refer to it. You can put the URL of the notebook here, or paste the key code directly in fenced code blocks.

18426425 commented 6 years ago

Yes, the problem has been solved with Pili's concise method. Here is his code (starting from line 34): http://localhost:8888/notebooks/assignment1%20test.ipynb

hupili commented 6 years ago

Amber, we cannot visit that URL. It points to your own computer (localhost).

Can you put the notebook on GitHub and paste the GitHub URL here for others' reference?

18426425 commented 6 years ago

Here is the GitHub URL:

https://github.com/18426425/myexercise/blob/master/Assignment1%20Maoyan%20Movietest.ipynb

I also tried another way, using "if...else": I counted the length of the shortest string and found that 15 is the boundary value between "with country" and "without country" (see line 67). But it failed again, and I still can't see what is wrong with it:

    def get_country(x):
        if len(time_countries) > 15:
            return x.split('：')[1].split('(')[1][:-1]
        else:
            return ''

    df['country'] = df['time_countries'].apply(get_country)
    df

hupili commented 6 years ago

Good try.

You can split the string first and then test the boundary condition: whether there are two fields or not.
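One possible shape for that approach, as a sketch only. The function name split_time_and_country is hypothetical, and it assumes the label ends with the full-width colon "：" and that the country, when present, is wrapped in ASCII parentheses.

```python
def split_time_and_country(s):
    """Split first, then check how many fields came back."""
    text = s.split("：")[-1]      # drop the "上映时间" label in front of the date
    parts = text.split("(")       # 1 field: date only; 2 fields: date and country
    date = parts[0]
    if len(parts) == 2:
        country = parts[1].rstrip(")")  # remove the closing parenthesis
    else:
        country = ""                     # no country given for this movie
    return date, country

print(split_time_and_country("上映时间：1993-01-01"))
print(split_time_and_country("上映时间：1994-10-14(美国)"))
```

Testing the number of fields avoids relying on a magic length such as 15, so it keeps working if the date format changes.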

ChicoXYC commented 6 years ago

@18426425 have you solved the problem?

18426425 commented 6 years ago

Yes, thanks a lot, I have solved the problem. "if...else" works, as long as I delete one line of the for loop:

    def get_country(x):
        if len(x) > 15:
            return x.split('(')[1].split(')')[0]
        else:
            return ''

    df['country'] = df['time_countries'].apply(get_country)
    df

And here is the full code: https://github.com/18426425/myexercise/blob/master/Assignment1%20Maoyan%20Movietest.ipynb

ChicoXYC commented 5 years ago

Closed, merged into notes, please refer here.