Closed ChengkaiYang2022 closed 5 years ago
Maybe we can simplify this problem by using a dict, like {'pA': ['Item1', 'Item2'], 'pB': ['Item1'], 'pC': ['Item1', 'Item2'], 'pD': ['Item1']}, in settings.py, and check this dict in all pipelines from pA to pD. But this is still a question when using Scrapy Cluster.
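The settings-dict idea above could be sketched like this. It is a minimal, self-contained sketch: the setting name `PIPELINE_ITEM_TYPES`, the `GatedPipeline` base class, and the `handle` method are all hypothetical names, and `Item1`/`Item2` are plain classes standing in for `scrapy.Item` subclasses from items.py.

```python
# In a real project these would be scrapy.Item subclasses in items.py.
class Item1:
    pass

class Item2:
    pass

# settings.py (hypothetical setting name)
PIPELINE_ITEM_TYPES = {
    'pA': ['Item1', 'Item2'],
    'pB': ['Item1'],
    'pC': ['Item1', 'Item2'],
    'pD': ['Item1'],
}

class GatedPipeline:
    """Shared base: skip items whose type is not listed for this pipeline."""
    name = None  # each subclass sets its own key into the dict

    def process_item(self, item, spider):
        if type(item).__name__ not in PIPELINE_ITEM_TYPES.get(self.name, []):
            return item  # not our item type; pass it to the next pipeline
        return self.handle(item, spider)

    def handle(self, item, spider):
        return item  # real per-pipeline work goes in subclasses

class PipelineB(GatedPipeline):
    name = 'pB'

    def handle(self, item, spider):
        item.touched_by_pb = True  # placeholder for real processing
        return item
```

This keeps the routing table in one place (settings.py) instead of repeating a type check inside each pipeline, which may matter more once the number of pipelines grows.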
Any help will be appreciated :)
This is mostly a Scrapy issue, not a Scrapy Cluster issue (since you will face the same problem in both projects). I am not a heavy user of Scrapy's Item Pipeline, as we mostly use it to transform items into JSON and move them out of Scrapy into a different pipeline framework (like Storm, Heron, NiFi, Flink, etc).
The way you are doing it is probably what I would do as well, but in reality I would move my more complex item-processing logic out of Scrapy.
Closing
So, here is the case: I have 4 pipelines, called pA, pB, pC, and pD, and two item types, called Item1 and Item2. Item1 should be processed by pA, pB, pC, and pD. Item2 should only be processed by pA and pC. Of course I have to set ITEM_PIPELINES = {'pA': 1, 'pB': 2, 'pC': 98, 'pD': 99}, but that way Item2 will also be processed by pB and pD, which is wrong. So in the process_item method of pB and pD, I check the type of the item (type(item).__name__): if it is Item1 it gets processed, and if it is Item2 it does not.
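The per-pipeline type check described above can be sketched as follows. This is a minimal sketch, not the actual project code: `Item1` and `Item2` are plain dict subclasses standing in for `scrapy.Item` classes so the example is self-contained, and the check uses `isinstance` rather than comparing `type(item).__name__` strings, which is the more idiomatic way to do the same thing.

```python
# Stand-ins for scrapy.Item subclasses (which also behave like dicts).
class Item1(dict):
    pass

class Item2(dict):
    pass

class PipelineB:
    """Only processes Item1; passes every other item type through untouched."""

    def process_item(self, item, spider):
        if isinstance(item, Item1):
            # Placeholder for pB's real work on Item1.
            item['processed_by'] = item.get('processed_by', []) + ['pB']
        # Always return the item so later pipelines (pC, pD) still receive it.
        return item
```

The key point is that `process_item` must still return Item2 unchanged rather than drop it, otherwise pC would never see it.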