jayzeng / scrapy-elasticsearch

A Scrapy pipeline which sends items to an Elasticsearch server

Elasticsearch pipeline not enabled - Scrapy 1.3.3 / ES 5.2 #57

Closed Beenhakker closed 7 years ago

Beenhakker commented 7 years ago

Hi,

I’m trying to integrate Elasticsearch with Scrapy. I’ve followed the steps from https://github.com/knockrentals/scrapy-elasticsearch, but it’s not loading the pipeline. I’m using Scrapy 1.3.3 with Elasticsearch 5.2.

Logging: INFO: Enabled item pipelines: []

My settings.py is as follows:

ITEM_PIPELINES = {
    'scrapyelasticsearch.scrapyelasticsearch.ElasticSearchPipeline': 500
}

ELASTICSEARCH_SERVERS = ['http://172.17.0.2:9200']
ELASTICSEARCH_INDEX = 'scrapy'
ELASTICSEARCH_INDEX_DATE_FORMAT = '%Y-%m'
ELASTICSEARCH_INDEX_DATE_FORMAT = '%A %d %B %Y'
ELASTICSEARCH_TYPE = 'items'
ELASTICSEARCH_UNIQ_KEY = 'url'  # Custom unique key

Am I missing something? Do you need to define the Pipeline in pipelines.py? The “dirbot” example didn’t.

jayzeng commented 7 years ago

It is clear the plugin didn't load (per INFO: Enabled item pipelines: []). Have you installed the plugin (https://github.com/knockrentals/scrapy-elasticsearch#install)?
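For anyone checking the same thing: besides looking at the pip output, it can help to confirm that the dotted path in ITEM_PIPELINES actually resolves to an importable class in the interpreter that runs Scrapy. Below is a minimal sketch using only the standard library. The helper names `resolve_dotted_path` and `check_pipeline_paths` are hypothetical, not part of Scrapy or this plugin, and a successful import only rules out one possible cause; it does not prove the pipeline will be enabled.

```python
import importlib


def resolve_dotted_path(path):
    """Import the module part of a dotted path and return the named attribute."""
    module_name, _, attr_name = path.rpartition(".")
    if not module_name:
        raise ValueError(f"not a dotted path: {path!r}")
    module = importlib.import_module(module_name)
    return getattr(module, attr_name)


def check_pipeline_paths(item_pipelines):
    """Map each pipeline path to None if it resolves, or to an error message."""
    results = {}
    for path in item_pipelines:
        try:
            resolve_dotted_path(path)
            results[path] = None
        except (ImportError, AttributeError, ValueError) as exc:
            results[path] = f"{type(exc).__name__}: {exc}"
    return results


if __name__ == "__main__":
    # The ITEM_PIPELINES value from the settings.py posted in this thread.
    ITEM_PIPELINES = {
        "scrapyelasticsearch.scrapyelasticsearch.ElasticSearchPipeline": 500,
    }
    for path, error in check_pipeline_paths(ITEM_PIPELINES).items():
        print(path, "->", error or "importable")
```

Run it with the same python3 that runs scrapy (inside the same container or virtualenv), otherwise the result says nothing about the environment Scrapy sees.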

Beenhakker commented 7 years ago

Hi Jay,

Yes, I followed those steps:

pip3 list

appdirs (1.4.3)
asn1crypto (0.22.0)
attrs (16.3.0)
Automat (0.5.0)
cffi (1.9.1)
colorlog (2.10.0)
constantly (15.1.0)
cryptography (1.8.1)
cssselect (1.0.1)
elasticsearch (5.2.0)
frontera (0.7.1)
idna (2.5)
incremental (16.10.1)
lxml (3.7.3)
packaging (16.8)
parsel (1.1.0)
pip (9.0.1)
pyasn1 (0.2.3)
pyasn1-modules (0.0.8)
pycparser (2.17)
PyDispatcher (2.0.5)
pyOpenSSL (16.2.0)
pyparsing (2.2.0)
python-json-logger (0.1.7)
queuelib (1.4.2)
Scrapy (1.3.3)
scrapyd (1.1.1)
ScrapyElasticSearch (0.8.9)
selenium (3.3.1)
service-identity (16.0.0)
setuptools (34.3.2)
six (1.10.0)
Twisted (17.1.0)
urllib3 (1.20)
w3lib (1.17.0)
wheel (0.29.0)
zope.interface (4.3.3)

jayzeng commented 7 years ago

Hmm, it is indeed installed. Your settings.py looks fine to me. Can you make sure settings.py is loaded?

Beenhakker commented 7 years ago

AFAIK settings.py is always loaded by Scrapy, and I checked it with:

root@5d52e8462fc2:/AH# scrapy settings --get=ITEM_PIPELINES
{"scrapyelasticsearch.scrapyelasticsearch.ElasticSearchPipeline": 500}

jayzeng commented 7 years ago

@BasEnem sorry for the late response. I really have no clue why the plugin doesn't load. If you figure it out, please do share here so others can benefit from it.

Beenhakker commented 7 years ago

Hi,

That's okay, I was busy myself anyway. The problem is solved, but I do not have a clear explanation for it.

I compared my settings line by line with the dirbot example settings, and the item pipeline still was not loaded. Then I turned it around: I used the dirbot example as the starting point and copied my items class and spider into the appropriate files, and now the pipeline loads.

Weird.