Closed 18426425 closed 5 years ago
String processing is needed. Try .split, in, .endswith, etc.
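As a rough sketch of that hint, the membership test and .endswith can tell the two kinds of string apart before any slicing is done. The sample values are copied from the question. The helper name has_country is made up for illustration, and the sketch assumes the parentheses are plain ASCII as shown in the samples.

```python
# Sample values copied from the question.
samples = ["上映时间:1993-01-01", "上映时间:1994-10-14(美国)"]

def has_country(text):
    # Hypothetical helper: a country, when present, is assumed to be
    # wrapped in parentheses at the very end of the string.
    return "(" in text and text.endswith(")")

for s in samples:
    print(s, has_country(s))
```

This only detects whether a country is present; pulling the value out still needs a split or a slice.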
@18426425 For this case, I think the format of the time is fixed, with a length of 10 characters. Therefore, we can use string slicing to separate them.
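A minimal sketch of the slicing idea, assuming the label is always "上映时间:" and the date is always 10 characters (YYYY-MM-DD), as in the two samples from the question. The names slice_fields, PREFIX_LEN and DATE_LEN are made up for illustration.

```python
PREFIX_LEN = len("上映时间:")  # length of the label, including the colon
DATE_LEN = 10                  # assumed fixed width: YYYY-MM-DD

def slice_fields(text):
    # The date sits right after the label and has a fixed width.
    date = text[PREFIX_LEN:PREFIX_LEN + DATE_LEN]
    # Whatever follows the date is either empty or "(country)".
    rest = text[PREFIX_LEN + DATE_LEN:]
    country = rest[1:-1] if rest else ""
    return date, country

print(slice_fields("上映时间:1993-01-01"))
print(slice_fields("上映时间:1994-10-14(美国)"))
```

If some entries turn out to have a shorter date, the fixed-width assumption breaks and a split-based approach is safer.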
@18426425, after Friday night's debugging, is the problem solved? If yes, please link to your solution before we close the issue so others can refer to it. You can put the URL of the notebook here, or you can paste the key code directly in fenced code blocks.
Yes, the problem has already been solved using Pili's concise method. Here is his code (starting from line 34): http://localhost:8888/notebooks/assignment1%20test.ipynb
Amber, we cannot visit that URL. It points to your own computer (localhost).
Can you put it on GitHub and paste the GitHub URL for others' reference?
https://github.com/18426425/myexercise/blob/master/Assignment1%20Maoyan%20Movietest.ipynb
I also tried to find another way using "if...else": I counted the length of the shortest string and found that 15 is the boundary value between "with country" and "without country" (see line 67). But it failed again, and I still don't see anything wrong with it...
    def get_country(x):
        if len(time_countries) > 15:
            return x.split(':')[1].split('(')[1][:-1]
        else:
            return ''

    df['country'] = df['time_countries'].apply(get_country)
    df
Good try.
You can try splitting the string first and then testing the boundary condition, i.e. whether there are two fields or not.
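One way to read that suggestion, as a sketch only: drop the label, split on the opening parenthesis, and check how many fields come back. The name split_fields is made up for illustration, and the colon and parentheses are assumed to be plain ASCII as in the samples from the question.

```python
def split_fields(text):
    # Drop the label: keep everything after the first colon.
    value = text.split(":", 1)[1]
    # Split on the opening parenthesis: one field means no country,
    # two fields mean date plus country.
    parts = value.split("(")
    date = parts[0]
    country = parts[1].rstrip(")") if len(parts) == 2 else ""
    return date, country

print(split_fields("上映时间:1993-01-01"))
print(split_fields("上映时间:1994-10-14(美国)"))
```

Testing the number of fields avoids relying on a magic length such as 15, so it keeps working if the date width ever changes.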
@18426425 have you solved the problem?
Yes, thanks a lot, I have already solved the problem. "if...else" works, but only after I deleted a line of the for loop:

    def get_country(x):
        if len(x) > 15:
            return x.split('(')[1].split(')')[0]
        else:
            return ''

    df['country'] = df['time_countries'].apply(get_country)
    df

And here is the whole code: https://github.com/18426425/myexercise/blob/master/Assignment1%20Maoyan%20Movietest.ipynb
Troubleshooting
Describe your environment
Describe your question
When scraping http://maoyan.com/board/4?offset=0, I want to slice out two elements separately: "release time" and "country". But some movies show their country and some don't. For instance, some look like this:
上映时间:1993-01-01
while others look like this:
上映时间:1994-10-14(美国)
So my question is: how do I slice out the "two" elements? By the way, they seem to be packed tightly into one string.