Since I can't commit to your project, here are two fixes that I had to make in order to get the scraper to run:
In mirror_spider.py, line 50, there is no check whether the output path is valid. The URL can contain ? characters, which cause the script to crash.
Here's my solution. It's just a quick fix and may need extending for other characters and for Linux/Windows compatibility:
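The idea can be sketched roughly as follows. This is an illustration rather than the exact patch: the helper name url_to_relative_path, the index.html fallback, and the choice of "_" as the replacement character are all assumptions, and the set of replaced characters is the one Windows rejects in file names.

```python
import re
from urllib.parse import urlsplit

# Characters that are not allowed in Windows file names ('?' is the one that
# triggered the crash). Replacing them is harmless on Linux as well.
_UNSAFE = re.compile(r'[<>:"/\\|?*]')


def url_to_relative_path(url):
    """Hypothetical helper: map a URL to a relative path safe on both OSes."""
    parts = urlsplit(url)
    # Split the URL path into segments, dropping empty ones.
    segments = [s for s in parts.path.split('/') if s]
    # Assumption: directory-like URLs are stored as index.html.
    if not segments or parts.path.endswith('/'):
        segments.append('index.html')
    # Keep the query string in the file name so different queries
    # end up in different files.
    if parts.query:
        segments[-1] += '?' + parts.query
    # Sanitize the host and every segment separately, so the only '/'
    # left in the result are real directory separators.
    safe = [_UNSAFE.sub('_', s) for s in [parts.netloc] + segments]
    return '/'.join(safe)
```

Note that two different URLs can map to the same path after replacement (for example ?a and *a), so a more careful version might percent-encode instead of substituting.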
There is another bug in your other project, scrapy_wayback_machine, which is imported here and also causes a crash.
It's in __init__.py, line 91:
cdx_url = self.cdx_url_template.format(url=pathname2url(request.url))
At this point, request.url is something like http://website.com. But pathname2url looks for a colon : and requires that anything before it is only one letter (since it expects regular paths, like C:\mypath), so it rejects the "http" in front of the colon.
When I removed the call to pathname2url, it worked for me, but I don't know which other cases may break:
cdx_url = self.cdx_url_template.format(url=request.url)
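If the call to pathname2url was only there to percent-encode the URL before putting it into the query string, urllib.parse.quote might be a safer replacement, since it does no drive-letter check. A minimal sketch, assuming that was the intent; the template string and the build_cdx_url helper below are made up for illustration and are not the project's actual values:

```python
from urllib.parse import quote

# Hypothetical template, for illustration only (not the project's real one).
CDX_URL_TEMPLATE = 'https://example.org/cdx?url={url}'


def build_cdx_url(request_url, template=CDX_URL_TEMPLATE):
    """Percent-encode the full request URL and insert it into the template."""
    # safe='' also encodes ':' and '/', so the URL survives as a single
    # query parameter value.
    return template.format(url=quote(request_url, safe=''))
```

I have not checked whether the CDX server expects the URL encoded or raw, so this would need testing against the real endpoint.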