rivermont / spidy

The simple, easy to use command line web crawler.
GNU General Public License v3.0

String Index Error on perfectly normal URLs #54

Closed by rivermont 6 years ago

rivermont commented 6 years ago


Expected Behavior

No errors.

Actual Behavior

Seemingly at random, crawling a URL fails with a

string index out of range

error. There doesn't seem to be anything wrong with the URLs:

http://www.denverpost.com/breakingnews/ci_21119904
https://www.publicintegrity.org/2014/07/15/15037/decades-making-decline-irs-nonprofit-regulation
https://cdn.knightlab.com/libs/timeline3/latest/js/timeline-min.js
https://github.com/rivermont/spidy/
https://twitter.com/adamwhitcroft

Steps to Reproduce the Problem

  1. Run the crawler.
  2. Wait a few seconds.

What I've tried so far

Raising the error gave the traceback:

Exception in thread Thread-4:
Traceback (most recent call last):
  File "/usr/lib/python3.5/threading.py", line 914, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.5/threading.py", line 862, in run
    self._target(*self._args, **self._kwargs)
  File "crawler.py", line 260, in crawl_worker
    if link[0] == '/':
IndexError: string index out of range
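
For illustration, a minimal snippet outside the crawler reproduces the same IndexError; the name link here simply mirrors the variable in the traceback:

    # Indexing into an empty string raises IndexError, which is what
    # happens at crawler.py line 260 when link is an empty string.
    link = ''
    if link[0] == '/':  # IndexError: string index out of range
        print('relative link')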


Hrily commented 6 years ago

This happens because some of the crawled links are empty strings, so indexing link[0] fails before any comparison can happen.

I'll send a PR that adds a check for empty links.
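
A minimal sketch of that kind of guard, assuming a links iterable and a base_url string (both placeholder names, not necessarily spidy's); the actual PR may look different:

    def resolve_links(links, base_url):
        # Skip empty strings before touching link[0]; '' is falsy in Python.
        resolved = []
        for link in links:
            if not link:
                continue
            if link[0] == '/':  # safe now: link is guaranteed non-empty
                link = base_url + link
            resolved.append(link)
        return resolved

As a side effect, the "if not link" test would also skip a None value, should the HTML parser ever return one for a malformed attribute.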