imwowzer / AN-for-python

Notes taken while learning Python

Scraping résumés with Python, part 1 #7

Open imwowzer opened 5 years ago

imwowzer commented 5 years ago

init

imwowzer commented 5 years ago

Code for scraping résumés from the HR edition of Liepin (猎聘网)

# -*- coding:utf-8 -*-
import time
from lxml import etree
from selenium import webdriver
from selenium.webdriver.common.keys import Keys

brower = webdriver.Chrome()  # needs the chromedriver executable on PATH
brower.get("https://passport.liepin.com/e/account")
# print(brower.page_source)

brower.save_screenshot("liepinhr.png")
time.sleep(30)  # wait 30 seconds before reading the page
user_name = brower.page_source
selector = etree.HTML(user_name)
user_name1 = selector.xpath('')[0].text  # the XPath expression is empty in the source
print(user_name1)
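---------------------
Author: python_9k
Source: CSDN
Original: https://blog.csdn.net/python_9k/article/details/78907906
Copyright notice: this is an original article by the blogger; please include a link to the post when reposting.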

Error 1

  File "jianli.py", line 15
    user_name1 = selector.xpath(")[0].text
                                                               ^
SyntaxError: EOL while scanning string literal

Fix: change xpath(") to xpath("") so that the string literal is closed.
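One way to confirm this kind of fix without launching the browser is to compile only the offending line. The following is a minimal sketch; the helper name `syntax_ok` is made up for illustration, and the exact SyntaxError wording depends on the Python version (3.7 reports "EOL while scanning string literal", newer releases word it differently).

```python
def syntax_ok(source):
    """Return True if `source` compiles, False if it raises SyntaxError."""
    try:
        # compile() only parses the code; it does not run it,
        # so undefined names such as `selector` are not a problem here.
        compile(source, "<snippet>", "exec")
        return True
    except SyntaxError as exc:
        print("SyntaxError:", exc.msg)
        return False


broken = 'user_name1 = selector.xpath(")[0].text'
fixed = 'user_name1 = selector.xpath("")[0].text'

print(syntax_ok(broken))
print(syntax_ok(fixed))
```

Passing this check only means the line parses; it says nothing about whether the XPath expression itself is valid.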

Error 2

Traceback (most recent call last):
  File "jianli.py", line 5, in <module>
    from selenium.webdriver.common.keys import keys
ImportError: cannot import name 'keys' from 'selenium.webdriver.common.keys' (/usr/local/python3.7/lib/python3.7/site-packages/selenium/webdriver/common/keys.py)

Inspecting keys.py shows that the class is named Keys, with a capital K, so the import must be written as Keys.
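Instead of opening the module file, the correctly cased name can also be looked up from Python. This is a rough sketch: `find_names` is a hypothetical helper, and it is demonstrated on the standard-library `collections` module so that it runs without selenium installed.

```python
import importlib


def find_names(module_name, wanted):
    """List the names in a module that equal `wanted` when case is ignored."""
    module = importlib.import_module(module_name)
    return [name for name in dir(module) if name.lower() == wanted.lower()]


# Python names are case-sensitive, so "ordereddict" cannot be imported,
# but a case-insensitive search shows the real spelling.
print(find_names("collections", "ordereddict"))
```

With selenium installed, calling `find_names("selenium.webdriver.common.keys", "keys")` should list `Keys` in the same way.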

imwowzer commented 5 years ago

Error 3

Traceback (most recent call last):
  File "/usr/local/python3.7/lib/python3.7/site-packages/selenium/webdriver/common/service.py", line 76, in start
    stdin=PIPE)
  File "/usr/local/python3.7/lib/python3.7/subprocess.py", line 756, in __init__
    restore_signals, start_new_session)
  File "/usr/local/python3.7/lib/python3.7/subprocess.py", line 1499, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'chromedriver': 'chromedriver'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "jianli.py", line 7, in <module>
    brower = webdriver.Chrome()
  File "/usr/local/python3.7/lib/python3.7/site-packages/selenium/webdriver/chrome/webdriver.py", line 73, in __init__
    self.service.start()
  File "/usr/local/python3.7/lib/python3.7/site-packages/selenium/webdriver/common/service.py", line 83, in start
    os.path.basename(self.path), self.start_error_message)
selenium.common.exceptions.WebDriverException: Message: 'chromedriver' executable needs to be in PATH. Please see https://sites.google.com/a/chromium.org/chromedriver/home

Solution

https://www.cnblogs.com/technologylife/p/5829944.html

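  1. Download chromedriver.exe.
  2. On Windows, create a folder named chromedriver, put the extracted chromedriver.exe in it, and add that folder to the PATH environment variable. On Linux, placing the downloaded file in /usr/bin is enough.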
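To check whether the steps above took effect before starting selenium, the PATH lookup can be reproduced with the standard library. A minimal sketch, assuming the executable is called `chromedriver`; `find_driver` is a name chosen here for illustration.

```python
import shutil


def find_driver(name="chromedriver"):
    """Return the full path of `name` if it is found on PATH, otherwise None."""
    # On Windows, shutil.which also tries the PATHEXT extensions such as .exe.
    return shutil.which(name)


path = find_driver()
if path is None:
    print("chromedriver is not on PATH; webdriver.Chrome() would fail as in Error 3")
else:
    print("chromedriver found at", path)
```

If this prints a path but selenium still fails, open a new terminal first, since a changed PATH only reaches newly started processes.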
imwowzer commented 5 years ago

Error 4, after uncommenting print(brower.page_source)

Traceback (most recent call last):
  File "D:\python3test\test.py", line 9, in <module>
    print(brower.page_source)
UnicodeEncodeError: 'gbk' codec can't encode character '\xa0' in position 18297: illegal multibyte sequence

Solution

https://www.cnblogs.com/Skyda/p/9179963.html

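On Windows 7, print() writes to a console whose default encoding is gbk, and gbk cannot represent every character; here it fails on '\xa0' (a non-breaking space).
The problem generally only shows up in cmd. The fix in cmd is to change the encoding of standard output: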

import os, sys, io
sys.stdout = io.TextIOWrapper(sys.stdout.buffer, encoding='gb18030')
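The difference between the two codecs can be reproduced without a browser or a Windows console. The sketch below is only an illustration: the sample string and the `can_encode` helper are made up, and the wrapper is applied to an in-memory buffer rather than to the real sys.stdout.

```python
import io


def can_encode(text, encoding):
    """Return True if `text` can be encoded with `encoding`."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


sample = "price:\xa0100"  # contains the non-breaking space from the traceback

print(can_encode(sample, "gbk"))      # gbk has no mapping for '\xa0'
print(can_encode(sample, "gb18030"))  # gb18030 covers all of Unicode

# Same wrapping idea as the fix, shown on an in-memory byte buffer.
buffer = io.BytesIO()
out = io.TextIOWrapper(buffer, encoding="gb18030")
out.write(sample)
out.flush()
print(buffer.getvalue().decode("gb18030") == sample)
```

gb18030 is a superset of gbk, so text that printed correctly before the change keeps the same bytes.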

imwowzer commented 5 years ago

Error 5

Traceback (most recent call last):
  File "D:\python3test\test.py", line 23, in <module>
    user_name1 = selector.xpath("")[0].text
  File "src\lxml\etree.pyx", line 1586, in lxml.etree._Element.xpath
  File "src\lxml\xpath.pxi", line 307, in lxml.etree.XPathElementEvaluator.__call__
  File "src\lxml\xpath.pxi", line 227, in lxml.etree._XPathEvaluatorBase._handle_result
lxml.etree.XPathEvalError: Invalid expression
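The traceback points at the empty string passed to xpath(): an empty expression is not valid XPath. The expression that matches the user name on the real page is not given here, so the sketch below uses a made-up HTML fragment and a made-up expression purely to show the difference; `first_text` is a hypothetical helper.

```python
from lxml import etree

# Hypothetical fragment standing in for brower.page_source.
SAMPLE_HTML = '<html><body><span class="user-name">demo user</span></body></html>'


def first_text(html, xpath_expr):
    """Return the text of the first element matching xpath_expr, or None."""
    selector = etree.HTML(html)
    try:
        matches = selector.xpath(xpath_expr)
    except etree.XPathEvalError as exc:
        # Raised for an empty or malformed expression, as in the traceback.
        print("XPathEvalError:", exc)
        return None
    if not matches:
        # A valid expression that matches nothing returns an empty list,
        # so indexing with [0] directly would raise IndexError.
        return None
    return matches[0].text


print(first_text(SAMPLE_HTML, ""))
print(first_text(SAMPLE_HTML, '//span[@class="user-name"]'))
```

For the Liepin page, the expression has to be worked out from the actual markup, for example with the browser's developer tools.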