roycehaynes / scrapy-rabbitmq

A RabbitMQ Scheduler for Scrapy
MIT License

Connection parameters not working #5

Open · drprabhakar opened this issue 8 years ago

drprabhakar commented 8 years ago

I have given the following in my Scrapy settings.py file:

```python
RABBITMQ_CONNECTION_PARAMETERS = {'host': 'amqp://username:password@rabbitmqserver', 'port': 5672}
```

But I am getting the following error:

```
raise exceptions.AMQPConnectionError(error)
pika.exceptions.AMQPConnectionError: [Errno 11003] getaddrinfo failed
```

How can I use the RabbitMQ server with my credentials?

rdcprojects commented 8 years ago

I doubt these settings will work with this library. Try passing a pika.credentials.Credentials object instead; that is what connection.py expects.

drprabhakar commented 8 years ago

I am not sure how to pass that through the settings.py file. Could you please show how to supply a pika.credentials.Credentials object in settings.py?
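For reference, a minimal sketch of what such a settings.py entry could look like, assuming scrapy-rabbitmq forwards the RABBITMQ_CONNECTION_PARAMETERS dict as keyword arguments to pika.ConnectionParameters (as the hint about connection.py suggests). The original getaddrinfo failure is consistent with the full amqp:// URL being treated as a hostname, so the host should be a bare hostname with credentials passed separately:

```python
# settings.py -- a sketch under the assumptions above, not the
# library's documented API.
import pika

RABBITMQ_CONNECTION_PARAMETERS = {
    'host': 'rabbitmqserver',  # bare hostname, not an amqp:// URL
    'port': 5672,
    # PlainCredentials is the usual pika.credentials.Credentials subclass
    'credentials': pika.PlainCredentials('username', 'password'),
}
```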

drprabhakar commented 8 years ago

I have connected to my RabbitMQ server using a pika.credentials.Credentials object, but now I am receiving the following error:

```
pika.exceptions.ChannelClosed: (404, "NOT_FOUND - no queue 'multidomain:requests' in vhost '/'")
```

Any suggestions?

rdcprojects commented 8 years ago

Can you create the queue manually and give it a try?

drprabhakar commented 8 years ago

I created a queue named 'multidomain' manually in RabbitMQ and tried again, but I am getting the same error.

Or do you mean I should create the queue from the Scrapy spider?

rdcprojects commented 8 years ago

The queue is "multidomain:requests".
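For reference, a minimal pika sketch for declaring that queue by hand, reusing the host and credentials assumed earlier in this thread:

```python
import pika

connection = pika.BlockingConnection(pika.ConnectionParameters(
    host='rabbitmqserver',
    credentials=pika.PlainCredentials('username', 'password'),
))
channel = connection.channel()
# The declare arguments (durable, exclusive, etc.) must match whatever
# scrapy-rabbitmq's queue.py declares, or RabbitMQ will reject the
# redeclaration with PRECONDITION_FAILED.
channel.queue_declare(queue='multidomain:requests')
connection.close()
```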

drprabhakar commented 8 years ago

I tried with the queue name "multidomain:requests" and now I get the following error from scrapy_rabbitmq\queue.py:

```
    return response.message_count
exceptions.AttributeError: 'Method' object has no attribute 'message_count'
```

It seems that the scheduler is not working as expected. Is there a fix for this?
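The AttributeError points at a pika API detail: a blocking queue_declare returns a Method frame, and message_count lives on the frame's .method attribute (the Queue.DeclareOk object), not on the frame itself. A hedged sketch of the likely fix, assuming queue.py computes the queue length via a passive declare:

```python
# Sketch of a fix for scrapy_rabbitmq/queue.py (the method name and
# surrounding attributes are assumptions based on the traceback).
def __len__(self):
    response = self.channel.queue_declare(queue=self.key, passive=True)
    # The count is on the Queue.DeclareOk method, not on the frame.
    return response.method.message_count
```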

rdcprojects commented 8 years ago

Try my fork. I've fixed these issues.

drprabhakar commented 8 years ago

I tried your fork (rdcprojects/scrapy-rabbitmq) and ran my Scrapy spider. To test my script, I simply crawl one field from a URL and print it.

I am getting the following error: `cPickle.BadPickleGet: 116`

Is there anything I need to change in my Scrapy spider?

rdcprojects commented 8 years ago

Can you provide the full traceback?

drprabhakar commented 8 years ago

```
2015-10-21 15:21:50+0530 [multidomain] INFO: Spider opened
2015-10-21 15:21:50+0530 [multidomain] DEBUG: Resuming crawl (1 requests scheduled)
2015-10-21 15:21:50+0530 [multidomain] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2015-10-21 15:21:50+0530 [scrapy] DEBUG: Telnet console listening on 127.0.0.1:6023
2015-10-21 15:21:50+0530 [scrapy] DEBUG: Web service listening on 127.0.0.1:6080
2015-10-21 15:21:50+0530 [-] Unhandled Error
        Traceback (most recent call last):
          File "C:\Python27\lib\site-packages\scrapy\crawler.py", line 93, in start
            self.start_reactor()
          File "C:\Python27\lib\site-packages\scrapy\crawler.py", line 130, in start_reactor
            reactor.run(installSignalHandlers=False)  # blocking call
          File "C:\Python27\lib\site-packages\twisted\internet\base.py", line 1192, in run
            self.mainLoop()
          File "C:\Python27\lib\site-packages\twisted\internet\base.py", line 1201, in mainLoop
            self.runUntilCurrent()
        --- <exception caught here> ---
          File "C:\Python27\lib\site-packages\twisted\internet\base.py", line 824, in runUntilCurrent
            call.func(*call.args, **call.kw)
          File "C:\Python27\lib\site-packages\scrapy\utils\reactor.py", line 41, in __call__
            return self._func(*self._a, **self._kw)
          File "C:\Python27\lib\site-packages\scrapy\core\engine.py", line 107, in _next_request
            if not self._next_request_from_scheduler(spider):
          File "C:\Python27\lib\site-packages\scrapy\core\engine.py", line 134, in _next_request_from_scheduler
            request = slot.scheduler.next_request()
          File "C:\Python27\lib\site-packages\scrapy_rabbitmq\scheduler.py", line 73, in next_request
            request = self.queue.pop()
          File "C:\Python27\lib\site-packages\scrapy_rabbitmq\queue.py", line 70, in pop
            return self._decode_request(body)
          File "C:\Python27\lib\site-packages\scrapy_rabbitmq\queue.py", line 29, in _decode_request
            return request_from_dict(pickle.loads(encoded_request), self.spider)
        cPickle.BadPickleGet: 116
```
rdcprojects commented 8 years ago

I think we'll have to dig deeper into the library to make it work. You can get in touch with the folks on the IRC channel if you want to continue working on the library. Hope this helps!

drprabhakar commented 8 years ago

Thanks for the information. Please confirm whether the URLs in the RabbitMQ queue need to be in a specific format, i.e. should the message in the queue look like "http://www.domain.com/query", or ["http://www.domain.com/query"], or just http://www.domain.com/query?

I just want to rule out any issues on the RabbitMQ queue side.

rdcprojects commented 8 years ago

You can check the Scrapy documentation for how URLs are stored in the requests queue. There's some encoding/serialization involved; I'm not completely sure about the details.
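For illustration: per the traceback above, queue.py pops a message, unpickles it, and passes the result to request_from_dict. So anything pushed into the '<spider>:requests' queue by hand would have to be a pickled request dict; a bare URL string would produce exactly a BadPickleGet. A sketch of publishing a compatible message, assuming Scrapy's scrapy.utils.reqser helpers from that era and an already-open pika channel named `channel`:

```python
import pickle

from scrapy.http import Request
from scrapy.utils.reqser import request_to_dict

# 'spider' is the running spider instance; it and 'channel' are
# assumed to already exist for this sketch.
request = Request('http://www.domain.com/query')
body = pickle.dumps(request_to_dict(request, spider))
channel.basic_publish(exchange='',
                      routing_key='multidomain:requests',
                      body=body)
```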