jayzeng / scrapy-elasticsearch

A Scrapy pipeline which sends items to an Elasticsearch server
327 stars, 88 forks

Add option to save documents by merging them with existing ones #82

Open: diegov opened this pull request 4 years ago

diegov commented 4 years ago

This adds a new boolean setting, `ELASTICSEARCH_MERGE`.

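When it is set to `True`, documents are saved with an update request instead of an index request. The item is sent twice in that request: as the partial `doc`, which is merged into the existing document, and as the `upsert` document, which is indexed as-is when no document with that ID exists yet.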
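To make the two request shapes concrete, here is a minimal sketch of the per-item action dict in the format accepted by `elasticsearch.helpers.bulk`. The function name `build_bulk_action` and its parameters are hypothetical and not taken from this PR's diff. The sketch also assumes that merging needs a stable document ID (for example, derived from the item's URL), because an update has to target a specific document.

```python
from typing import Any, Dict, Optional


def build_bulk_action(item: Dict[str, Any], index: str,
                      doc_id: Optional[str], merge: bool) -> Dict[str, Any]:
    """Hypothetical helper: build one action for elasticsearch.helpers.bulk."""
    if merge:
        # An update must address an existing (or future) document by ID.
        if doc_id is None:
            raise ValueError("merging requires a stable document id")
        # 'doc' is merged into an existing document; 'upsert' is indexed
        # as-is when no document with this id exists yet.
        return {
            "_op_type": "update",
            "_index": index,
            "_id": doc_id,
            "doc": dict(item),
            "upsert": dict(item),
        }
    # Plain index: the item replaces any existing document with the same id.
    action: Dict[str, Any] = {"_op_type": "index", "_index": index,
                              "_source": dict(item)}
    if doc_id is not None:
        action["_id"] = doc_id
    return action


item = {"url": "https://example.com/p/1", "title": "Widget"}
action = build_bulk_action(item, index="scrapy", doc_id=item["url"], merge=True)
# With a real client this would be sent as: helpers.bulk(es_client, [action])
```

Because the same item is sent as both `doc` and `upsert`, this should behave like the update API's `doc_as_upsert: true` option. Sending the two bodies separately keeps the option of using different upsert content later.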

This makes it easier to split crawling tasks that involve multiple requests for the same item, because there is no need to pass the item along in the request's `meta` dict for the next request. It also allows different crawlers to contribute to the same document in separate crawl runs.
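The effect on a split crawl can be illustrated with a simplified in-memory model of update-with-upsert. This is not Elasticsearch and not the plugin. It is a sketch that only performs a shallow, top-level merge. The item fields (`title`, `price`, `reviews`) are hypothetical.

```python
import copy
from typing import Any, Dict

Store = Dict[str, Dict[str, Any]]


def apply_update(store: Store, doc_id: str,
                 doc: Dict[str, Any], upsert: Dict[str, Any]) -> None:
    """Simplified model: insert the upsert body, or merge the partial doc."""
    if doc_id not in store:
        # No existing document: the upsert body becomes the document.
        store[doc_id] = copy.deepcopy(upsert)
    else:
        # Existing document: top-level fields in the partial doc overwrite
        # or extend it (shallow merge only in this sketch).
        store[doc_id].update(copy.deepcopy(doc))


store: Store = {}
# One callback (e.g. a listing page) yields the fields it knows about.
first = {"url": "https://example.com/p/1", "title": "Widget", "price": 10}
apply_update(store, first["url"], first, first)
# A later request, or a different crawler, adds fields for the same URL
# without having to carry the earlier item along.
second = {"url": "https://example.com/p/1", "reviews": 3}
apply_update(store, second["url"], second, second)
print(store["https://example.com/p/1"])
# {'url': 'https://example.com/p/1', 'title': 'Widget', 'price': 10, 'reviews': 3}
```

Note that when two contributions set the same field, the later write wins. Spiders that share documents should therefore agree on which fields each one owns.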