mozilla / spade

Automated scraping of markup and CSS from a list of relevant URLs, using a variety of user-agent strings. Provides reporting on usage of CSS properties and apparent user-agent sniffing.

Debugging spider #7

Closed samliu closed 12 years ago

samliu commented 12 years ago

So now it crawls from a text file, and bad parsing of CSS/HTML/JS doesn't trip it up. I ran it for about 20 minutes and it was doing well. I also made it store flat files using at most 100 characters from the URL, rather than the full URL, because I was getting disk errors about filenames being too long.
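The filename truncation could look something like the following sketch. This is not the actual spade code; the helper name, the 100-character cap constant, and the hash suffix for keeping truncated names unique are all assumptions for illustration:

```python
import hashlib
import re

MAX_NAME_LEN = 100  # hypothetical cap matching the 100-character limit above


def url_to_filename(url):
    """Turn a URL into a safe flat filename, truncated so it stays
    under OS filename-length limits (commonly 255 bytes)."""
    # Replace characters that are unsafe or meaningful in paths.
    safe = re.sub(r'[^A-Za-z0-9._-]', '_', url)
    if len(safe) <= MAX_NAME_LEN:
        return safe
    # Append a short hash of the full URL so two long URLs that share
    # a prefix don't truncate to the same filename.
    digest = hashlib.md5(url.encode('utf-8')).hexdigest()[:8]
    return safe[:MAX_NAME_LEN - 9] + '-' + digest
```

Truncating alone would collide for long URLs sharing a 100-character prefix, hence the hash suffix in this sketch.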

I left the code for parsing JS and saving it because it works -- we can remove it anytime pretty easily. I'm only wondering, since we have this functionality, whether it's considered valuable data; if in the future we want the JS, we have it now.

Also added docstrings to the spider.