kjam / wswp

Code for the second edition Web Scraping with Python book by Packt Publications

threaded_crawler_with_queue.py (the updated one) #12

Open alfredaita opened 5 years ago

alfredaita commented 5 years ago

When running the above script from Chapter04 (i.e. with the updates posted about 7 months before this post), proc.start() in def mp_threaded_crawler(…….): produces the following error: "can't pickle _thread.lock objects"

Windows 10 Pro, Python 3.6.6, GPU 960M

d0tN3t commented 5 years ago

I'm getting the same error as @alfredaita: "can't pickle _thread.lock objects". After a little troubleshooting, it seems to break around the RedisQueue class.

d0tN3t commented 5 years ago

So I was able to get it up and running on macOS Mojave with Python 3.7 and Redis 5.0.5. I'm not sure whether the bug is caused by the Windows-Redis combination or by Redis v2 on Windows.

[screenshot: Screen Shot 2019-07-11 at 5 31 08 PM]

d0tN3t commented 4 years ago

@alfredaita Figured it out by stumbling onto a solution here: https://www.codeproject.com/Questions/5205661/Cant-pickle-thread-lock-objects-error-when-object

"RedisCache()" is the only argument that isn't a string or int, so when I removed it everything else could be pickled. It only took me a year... go figure 😪
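
For anyone landing here with the same traceback, here is a minimal sketch of why dropping the cache argument helps. On Windows, multiprocessing starts children with the spawn method, so everything handed to the new process is pickled, and an object that keeps a threading.Lock inside it (as a Redis client presumably does) cannot be pickled. With the fork start method, which was the default on macOS for Python 3.7, the arguments are not pickled, which would explain why the same script ran there. FakeCache and the host/port settings are hypothetical stand-ins for illustration, not the book's code.

```python
import pickle
import threading


class FakeCache:
    """Hypothetical stand-in for a cache whose client keeps a lock."""

    def __init__(self, host="localhost", port=6379):
        self.host = host
        self.port = port
        self._lock = threading.Lock()  # this attribute is what breaks pickling


def is_picklable(obj):
    """Return True if obj survives pickle.dumps, which spawn relies on."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, TypeError, AttributeError):
        return False


# Passing a live cache object: fails as soon as the arguments are pickled.
kwargs_bad = {"max_threads": 5, "cache": FakeCache()}

# Passing only plain settings (assumed names) and building the cache
# inside the child process instead.
kwargs_ok = {"max_threads": 5, "cache_host": "localhost", "cache_port": 6379}


def worker(max_threads, cache_host, cache_port):
    """Runs in the child: the lock is created here, so it is never pickled."""
    cache = FakeCache(host=cache_host, port=cache_port)
    return cache, max_threads


if __name__ == "__main__":
    print("live cache picklable:", is_picklable(kwargs_bad))
    print("plain settings picklable:", is_picklable(kwargs_ok))
```

Removing the cache argument, as above in the thread, avoids the error; building the cache inside the worker is one way to keep caching while still passing only picklable values.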


HHB768 commented 4 years ago

Finally!!!! I deleted cache=RedisCache() as @d0tN3t said, and deleted the line 'element = [e for e in element if not self.already_seen(e)]' in the RedisQueue class. Don't ask me why, it just ... works!

d0tN3t commented 4 years ago

@HHB768 Very cool! I'm so glad the post was able to help you. I'm not sure why you would need to get rid of the 'element = [e for e in element if not self.already_seen(e)]' snippet. But hey, if it works then more power to you! 🙏

HHB768 commented 4 years ago

@d0tN3t Thx. If I keep that snippet, 'redis.exceptions.ResponseError: wrong number of arguments for 'lpush' command' occurs. I wonder if my VPNs cause this command to fail, since with some of them no error occurs but I get the wrong result. Maybe they block some addresses... I'm new to Redis... but if I get rid of that line, it just works...

d0tN3t commented 4 years ago

@HHB768 I remember getting that error as well. I honestly can't remember how I fixed it. I may have upgraded to Redis 5.0, which technically can't be done on Windows, but if you search for Ubuntu in the Microsoft Store you can install Linux, and it runs seamlessly alongside Windows. Then you can install Redis 5 and connect directly to it. It really does work seamlessly and it's now my default go-to.

That being said, the error you're receiving means the push got either 0 elements or more than one. Older Redis versions take only one element per push (Redis 2.4 and later accept several), so on those versions several values need to go in together as a single list, dictionary or tuple. If you look at my example in Redis Commander, my list contains 2 items (item 0, item 1), and each one is considered one item for my "key: value" pair. Because you're using json.dumps, JSON serializes it as 1 string, and when you pull it back out using json.loads it turns back into a dictionary.
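
A minimal sketch of the zero-element case, assuming the queue's push forwards the filtered list to lpush with *element: if every URL in the list has already been seen, the filter leaves an empty list, lpush is called with no values, and Redis answers with the "wrong number of arguments" error. FakeClient is a hypothetical in-memory stand-in (not redis-py) so the sketch runs without a server, and push_unseen is an illustrative name, not the book's method.

```python
class FakeClient:
    """Hypothetical in-memory stand-in for a Redis client (not redis-py)."""

    def __init__(self):
        self.lists = {}
        self.sets = {}

    def lpush(self, name, *values):
        # Mimics Redis rejecting LPUSH when no value is supplied.
        if not values:
            raise ValueError("wrong number of arguments for 'lpush' command")
        items = self.lists.setdefault(name, [])
        for value in values:
            items.insert(0, value)
        return len(items)

    def sadd(self, name, *values):
        if not values:
            raise ValueError("wrong number of arguments for 'sadd' command")
        self.sets.setdefault(name, set()).update(values)

    def sismember(self, name, value):
        return value in self.sets.get(name, set())


def push_unseen(client, queue_name, seen_name, elements):
    """Push only unseen elements; skip the call when nothing is left."""
    fresh = [e for e in elements if not client.sismember(seen_name, e)]
    if not fresh:
        return 0  # guard: never call lpush with zero values
    client.lpush(queue_name, *fresh)
    client.sadd(seen_name, *fresh)
    return len(fresh)


if __name__ == "__main__":
    client = FakeClient()
    urls = ["http://example.com/a", "http://example.com/b"]
    print(push_unseen(client, "queue:wswp", "seen:wswp", urls))
    print(push_unseen(client, "queue:wswp", "seen:wswp", urls))
```

With a guard like this the already-seen filter can stay in place, rather than being deleted, because the empty case never reaches lpush.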

[image: Redis Commander screenshot]

Also, you should get Redis Commander as a UI so you can visualize what's going on. It's a lifesaver and will help you become more familiar with the layout.

HHB768 commented 4 years ago

Clear instructions for newcomers! Thanks a lot! @d0tN3t