hupili / python-for-data-and-media-communication-gitbook

An open source book on Python tailored for communication students with zero background

selenium click button to select categories and search #125

Open ZhangNingNina opened 5 years ago

ZhangNingNina commented 5 years ago

Troubleshooting

Describe your environment

Describe your question

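How can I scrape the content of a hover popup? (It is a popup that appears only when the mouse moves over a certain position, and it disappears as soon as the mouse moves away.) The popup content is not in the page source shown by F12, so I don't know how to scrape it. Could someone help me? Thanks so much! @hupili @ChicoXYC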

The minimum code (snippet) to reproduce the issue

Describe the efforts you have spent on this issue

I searched on Baidu, but found almost only solutions for scraping the content of pop-up windows, and nothing similar to my case.

ChicoXYC commented 5 years ago

@ZhangNingNina can you give me the example code of what you want to scrape?

ZhangNingNina commented 5 years ago

> @ZhangNingNina can you give me the example code of what you want to scrape?

Sure. For example, take this website: https://www.zhipin.com/?sid=sem_pz_bdpc_dasou_title. How can I scrape the industry categories on the left side of the page in detail? It's hard to find them in the source code because the detailed information appears only when I hover the cursor over a category.

ChicoXYC commented 5 years ago

@ZhangNingNina hope this may solve your problem: https://github.com/ChicoXYC/exercise/blob/master/boss-%E7%9B%B4%E8%81%98/boss%E7%9B%B4%E8%81%98.ipynb

ZhangNingNina commented 5 years ago

> @ZhangNingNina hope this may solve your problem: https://github.com/ChicoXYC/exercise/blob/master/boss-%E7%9B%B4%E8%81%98/boss%E7%9B%B4%E8%81%98.ipynb

Thanks a lot!

iiiJenny commented 5 years ago

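Hi, this approach doesn't seem to apply. Our difficulty is that some information only shows up in an info box when the mouse moves over a certain point, for example: [image] However, there is no way to bring up this information by clicking, and we can't find it in the page source either. [image] @ChicoXYC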

ChicoXYC commented 5 years ago

@iiiJenny in that case, I think we can use another way to scrape.

[Two screenshots, 2018-12-02, 11:28 PM]

The sub-category URLs within one parent category increase by consecutive integers, so you can generate those URLs with string formatting.

hupili commented 5 years ago

Does this solve the hover issue: https://stackoverflow.com/a/8261754/2446356 ?
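For reference, here is a minimal sketch of the usual way to hover with Selenium, assuming the linked answer refers to `ActionChains.move_to_element`. The helper name `hover_and_read` and both CSS selector parameters are hypothetical; the real selectors would have to be read from the target page.

```python
def hover_and_read(driver, trigger_css, popup_css, timeout=10):
    """Hover over one element, wait for the popup it reveals, return its text.

    `trigger_css` and `popup_css` are placeholder CSS selectors (assumptions);
    inspect the real page to find the right ones.
    """
    # Imported inside the function so this sketch loads without Selenium installed.
    from selenium.webdriver.common.action_chains import ActionChains
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    # Move the (virtual) mouse onto the element that triggers the popup.
    trigger = driver.find_element(By.CSS_SELECTOR, trigger_css)
    ActionChains(driver).move_to_element(trigger).perform()

    # The popup may be rendered by JavaScript, so wait until it is visible.
    popup = WebDriverWait(driver, timeout).until(
        EC.visibility_of_element_located((By.CSS_SELECTOR, popup_css))
    )
    return popup.text


# Possible usage (needs a browser driver; selectors are made up):
#   from selenium import webdriver
#   driver = webdriver.Chrome()
#   driver.get("https://www.zhipin.com/")
#   print(hover_and_read(driver, ".category-item", ".category-popup"))
#   driver.quit()
```

The explicit wait matters because a popup of this kind is usually added to the page only after the hover event fires, which would also explain why it does not show up in the static source.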

ChicoXYC commented 5 years ago

@ZhangNingNina have you solved the problem? One solution is to generate the sub-category links by string formatting. As in the example above, the link for Java is https://www.zhipin.com/c101010100-p100101/ and the link for C++ is https://www.zhipin.com/c101010100-p100102/. Only the last number differs, which suggests that all the URLs can be generated the same way.
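The URL pattern above could be sketched like this. Only the first two URLs are taken from the example; the helper name `subcategory_urls` is made up, and the assumption that the codes keep increasing by one past 100102 has not been checked against the site.

```python
# Template built from the two example links; only the trailing code changes.
URL_TEMPLATE = "https://www.zhipin.com/c101010100-p{code}/"


def subcategory_urls(first_code, count):
    """Return `count` sub-category URLs starting at `first_code`.

    Assumes the codes within one parent category are consecutive integers.
    """
    return [URL_TEMPLATE.format(code=first_code + i) for i in range(count)]


urls = subcategory_urls(100101, 3)
for url in urls:
    print(url)
```

Each generated URL can then be fetched directly, with no hover needed.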

Also please let me know whether the above method @hupili gave worked or not. Thanks @ZhangNingNina

ZhangNingNina commented 5 years ago

Sorry for my late reply. We've tried it and found this method worked. Thank you so much!!!