wo / paperscraper

tracking and parsing new philosophy papers on the internet

blogpostparser crashes when parsing an empty post #93

Open wo opened 8 years ago

wo commented 8 years ago
2016-09-02 22:01:28 fetching blog post http://kazez.blogspot.com/2016/09/houston-we-have-cover.html
2016-09-02 22:01:29 no content found!
Traceback (most recent call last):
  File "bin/scraperdaemon.py", line 70, in <module>
    daemon.start()
  File "bin/scraperdaemon.py", line 38, in start
    self.run()
  File "bin/scraperdaemon.py", line 45, in run
    blogpostprocessor.run()
  File "/home/wo/opp-tools/bin/../opp/blogpostprocessor.py", line 26, in run
    process_blogpost(post)
  File "/home/wo/opp-tools/bin/../opp/blogpostprocessor.py", line 33, in process_blogpost
    blogpostparser.parse(doc)
  File "/home/wo/opp-tools/bin/../opp/docparser/blogpostparser.py", line 43, in parse
    doc.numwords = len(doc.content.split())
AttributeError: 'NoneType' object has no attribute 'split'
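
The traceback shows `doc.content` is `None` when no content is found, so `doc.content.split()` fails. One possible guard, as a minimal sketch: treat missing content as an empty string before counting words. `count_words` is a hypothetical helper, not something in opp-tools, and `SimpleNamespace` only stands in for the real document object here.

```python
from types import SimpleNamespace


def count_words(content):
    """Count whitespace-separated words, treating None as empty text.

    Hypothetical helper for illustration; not part of opp-tools.
    """
    # `content or ""` maps None (and "") to an empty string, so split() is safe.
    return len((content or "").split())


# Stand-in for a blog post whose content could not be extracted.
doc = SimpleNamespace(content=None)
doc.numwords = count_words(doc.content)
```

Whether an empty post should get `numwords = 0` or be skipped before parsing is a design choice this sketch does not settle.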