Page handling seems to be broken

wdyggh commented 5 years ago

Running the tool fails with this error:

Traceback (most recent call last):
  File "D:\study_software\Anaconda3\envs\pdf\lib\threading.py", line 916, in _bootstrap_inner
    self.run()
  File "D:\study_software\Anaconda3\envs\pdf\lib\threading.py", line 864, in run
    self._target(*self._args, **self._kwargs)
  File "main.py", line 29, in opt
    proc.getAllBlogContent()
  File "E:\CSDN_Blog2PDF\csdnToPdf.py", line 165, in getAllBlogContent
    next=pagelist.findAll('li')
AttributeError: 'NoneType' object has no attribute 'findAll'

wdyggh commented 5 years ago

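I haven't filed many issues, so please excuse me if the format is off. About the findAll error above: after checking the BeautifulSoup documentation, I found that the old findAll name is still accepted and mapped to find_all automatically, so the method name itself is not the problem. The real cause is that pagelist is None.
The findAll failure goes away once the userSoup handling at https://github.com/leyuwei/CSDN_Blog2PDF/blob/master/csdnToPdf.py#L77 is wrapped in a try block; after that change the blog can be crawled and converted to PDF.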
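
For anyone hitting the same traceback, here is a minimal sketch of an alternative to the try block: check for None before asking for the li elements. The helper name list_items is made up for illustration and is not part of csdnToPdf.py; it only assumes that pagelist is whatever the earlier lookup returned, which is None when the pagination element is missing from the page.

```python
def list_items(pagelist):
    """Return the <li> elements under pagelist, or [] if the lookup found nothing.

    pagelist is assumed to be the result of an earlier BeautifulSoup lookup
    such as soup.find(...), which returns None when no element matches.
    """
    if pagelist is None:
        # No pagination block on this page: treat it as "no further pages".
        return []
    return pagelist.find_all("li")
```

In getAllBlogContent this would replace the direct call, for example next_items = list_items(pagelist), so a blog without a pagination block is treated as a single page instead of crashing the worker thread.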

Even after adding its directory to the environment variables, wkhtmltopdf still could not be found, so I hard-coded the path in the code:

config = pdfkit.configuration(wkhtmltopdf="C:\\Program Files\\wkhtmltopdf\\bin\\wkhtmltopdf.exe")
options = {
    'custom-header': [('Origin', 'https://blog.csdn.net'),
                      ('Referer', 'https://blog.csdn.net'),
                      ('User-Agent', 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:30.0) Gecko/20100101 Firefox/30.0')],
    'cookie': http_cookie,
    'enable-local-file-access': '',
    'images': '',
}
pdfkit.from_file(destHtml, destPdf, configuration=config, options=options)
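
A slightly more portable variant of the hard-coded path, as a rough sketch only: look on PATH first and fall back to a list of known install locations. The function name find_wkhtmltopdf and the fallbacks parameter are hypothetical and do not exist in the repo or in pdfkit; only shutil.which and os.path.isfile from the standard library are used.

```python
import os
import shutil


def find_wkhtmltopdf(fallbacks=()):
    """Return a path to the wkhtmltopdf executable, or None if none is found.

    fallbacks is a hypothetical list of install locations to try when the
    executable is not on PATH (for example the Program Files path above).
    """
    found = shutil.which("wkhtmltopdf")  # honours PATH (and PATHEXT on Windows)
    if found:
        return found
    for candidate in fallbacks:
        if os.path.isfile(candidate):
            return candidate
    return None
```

The result can then be passed to pdfkit.configuration(wkhtmltopdf=path) when it is not None. If PATH was changed while the terminal or IDE was already open, the running process will not see the new value until it is restarted, which may be why the environment variable appeared to have no effect.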

leyuwei commented 5 years ago

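Thank you very much for the feedback. I am aware of this problem. I have some other things to take care of over the next few days, so I will update the repo as soon as I get the chance.

Thanks again! ^-^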

Yuwei

leyuwei / CSDN_Blog2PDF

Page handling seems to be broken #1