leyuwei / CSDN_Blog2PDF

Convert CSDN blogs to PDF while keeping all code, images, and formulas exactly as they were, which is convenient for printing and archiving.
Apache License 2.0

Page handling seems to have a bug #1

Open wdyggh opened 5 years ago

wdyggh commented 5 years ago

I get the following error:

Traceback (most recent call last):
  File "D:\study_software\Anaconda3\envs\pdf\lib\threading.py", line 916, in _bootstrap_inner
    self.run()
  File "D:\study_software\Anaconda3\envs\pdf\lib\threading.py", line 864, in run
    self._target(*self._args, **self._kwargs)
  File "main.py", line 29, in opt
    proc.getAllBlogContent()
  File "E:\CSDN_Blog2PDF\csdnToPdf.py", line 165, in getAllBlogContent
    next=pagelist.findAll('li')
AttributeError: 'NoneType' object has no attribute 'findAll'
wdyggh commented 5 years ago

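I haven't filed many issues and don't know the proper format, so please bear with me. About the findAll error above: after checking the BeautifulSoup documentation, I found that the name is converted automatically, so the method name itself is not the problem.
As for the case where findAll raises the error, after wrapping the userSoup handling at https://github.com/leyuwei/CSDN_Blog2PDF/blob/master/csdnToPdf.py#L77 in a try block, the crawl works and the PDF conversion succeeds.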
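
For anyone hitting the same traceback: BeautifulSoup's `find(...)` returns `None` when nothing matches, and that is what produces `'NoneType' object has no attribute 'findAll'`. As an alternative to a try block, here is a minimal sketch of an explicit guard. The helper name `list_items` is hypothetical and not part of the repo; it only assumes that `pagelist` is whatever `find(...)` returned.

```python
def list_items(pagelist):
    """Return the <li> children of a pagination element, or [] if it is missing.

    `pagelist` is expected to be the result of a BeautifulSoup `find(...)` call,
    which is None when the page has no matching element.
    """
    if pagelist is None:
        # No pagination element found: treat the blog as having a single page.
        return []
    return pagelist.find_all('li')
```

In `getAllBlogContent`, the failing line could then presumably become `next = list_items(pagelist)`, so a blog without a pagination block yields an empty list instead of raising.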

I added the path to my environment variables, but wkhtmltopdf still could not be found, so I hard-coded the path:

import pdfkit

# Point pdfkit at the wkhtmltopdf binary explicitly, since it is not picked up from PATH.
config = pdfkit.configuration(wkhtmltopdf="C:\\Program Files\\wkhtmltopdf\\bin\\wkhtmltopdf.exe")
options = {
    'custom-header': [('Origin', 'https://blog.csdn.net'), ('Referer', 'https://blog.csdn.net'), ('User-Agent', 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:30.0) Gecko/20100101 Firefox/30.0')],
    'cookie': http_cookie,
    'enable-local-file-access': '',
    'images': '',
}
pdfkit.from_file(destHtml, destPdf, configuration=config, options=options)
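
A slightly more portable variant of the hard-coded path might look like the sketch below: try `PATH` first and fall back to the install location only if that fails. The function name `resolve_wkhtmltopdf` and the default fallback path are assumptions for illustration, not something the repo provides.

```python
import os
import shutil


def resolve_wkhtmltopdf(fallback=r"C:\Program Files\wkhtmltopdf\bin\wkhtmltopdf.exe"):
    """Return a path to the wkhtmltopdf executable, or None if it cannot be found."""
    found = shutil.which("wkhtmltopdf")  # searches the directories listed in PATH
    if found:
        return found
    if os.path.isfile(fallback):
        # PATH lookup failed; use the assumed default install location.
        return fallback
    return None
```

The result can be passed to `pdfkit.configuration(wkhtmltopdf=path)` when it is not `None`; if it is `None`, reporting a clear "wkhtmltopdf not found" message is probably more helpful than letting pdfkit fail later.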
leyuwei commented 5 years ago

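Thank you very much for the feedback. I am aware of this issue. I have some other things to take care of over the next few days, so I will update the repo as soon as I get the chance.

Thanks again! ^-^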

Yuwei