hupili / python-for-data-and-media-communication-gitbook

An open source book on Python tailored for communication students with zero background

How to scrape all urls in one article #162

Open MindyZHAOMinzhu opened 5 years ago

MindyZHAOMinzhu commented 5 years ago

My environment

My question

I am studying chapter 7 of the Python notebook. While learning how to scrape all URLs in one article, I could not understand the code in the notebook, and I found no further explanation of why this code solves the problem.

[Screenshot, 2019-07-28 2:41:01 PM]

Describe the efforts you have spent on this issue

I searched Google and tried to find useful videos online, but I do not know how to search for related answers. I also cannot find post__title-link in the Chrome developer tools, and I am confused about it.

ConnorLi96 commented 5 years ago

post__title-link is a class name on the website you are scraping. It appears as a value of the class attribute in the page's HTML, so you need the Chrome developer tools to inspect the page and find it.
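If the class name is hard to spot in the developer tools, one option is to list the class values that actually appear on the links of a page. The following is only a minimal sketch: the HTML string and the example.com URLs are made up for illustration and stand in for the real page, which is not shown in this thread. It assumes the bs4 package is installed.

```python
from collections import Counter

from bs4 import BeautifulSoup

# Hypothetical HTML standing in for the real page (an assumption, not the
# actual site from the notebook).
html = """
<div class="post">
  <a class="post__title-link" href="https://example.com/article-1">Article 1</a>
</div>
<div class="post">
  <a class="post__title-link" href="https://example.com/article-2">Article 2</a>
</div>
<a class="nav-link" href="https://example.com/about">About</a>
"""

soup = BeautifulSoup(html, "html.parser")

# Count every class value found on <a> tags. BeautifulSoup returns the
# class attribute as a list, because one tag can carry several classes.
class_counts = Counter(
    cls
    for a in soup.find_all("a")
    for cls in a.get("class", [])
)

for cls, count in class_counts.most_common():
    print(cls, count)
```

A class that repeats once per article is usually the one to pass to find_all.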

find_all is a method from the BeautifulSoup library (bs4), not from requests. When a module confuses you, the best approach is to search for a solution on Google, such as this link, because most problems you face have likely already been solved by other developers.

If you still cannot understand how to use the module, check the official documentation or open an issue on GitHub.
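To make the find_all call more concrete, here is a minimal sketch of collecting every article URL by class name. The HTML string and the example.com URLs are invented for illustration; in the notebook the HTML would come from the downloaded page instead. It assumes the bs4 package is installed.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML standing in for the downloaded page (an assumption,
# not the actual site from the notebook).
html = """
<div class="post">
  <a class="post__title-link" href="https://example.com/article-1">Article 1</a>
</div>
<div class="post">
  <a class="post__title-link" href="https://example.com/article-2">Article 2</a>
</div>
<a class="nav-link" href="https://example.com/about">About</a>
"""

soup = BeautifulSoup(html, "html.parser")

# find_all returns every <a> tag whose class attribute includes
# "post__title-link". The keyword is class_ (with an underscore)
# because class is a reserved word in Python.
links = soup.find_all("a", class_="post__title-link")

# Each matching tag behaves like a dictionary of its attributes,
# so the URL is stored under "href".
urls = [a["href"] for a in links]
print(urls)
```

The navigation link is skipped because its class does not match, which is why the class name is used as the filter.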

MindyZHAOMinzhu commented 5 years ago

Thank you very much! I get it now! At first, I could not find the right attribute in the HTML. It seems I was not looking at the right website! I will try to find the solution myself the next time I face a problem.