vipulgupta2048 / scrape

Mission - scraping the planet, one website at a time
MIT License

[Major] Pipelines usage changed. #46

atb00ker closed 6 years ago

atb00ker commented 6 years ago

Problem: Scrapy's default pipeline class has a limitation: the pipeline instance cannot be accessed from inside the spider.

Solution: To tackle that problem, I have made a new class that acts as the pipeline class but stores its instance in the spider.
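A minimal sketch of the idea, not the actual code from this PR. `PostgresHelper` and `ExampleSpider` are hypothetical names, and the helper only keeps items in a list instead of writing to a database. The `process_item(item, self.name)` call shape is the one discussed in this thread.

```python
class PostgresHelper:
    """Hypothetical stand-in for the new pipeline-like class."""

    def __init__(self):
        # A real implementation would open a database connection here.
        self.saved = []

    def process_item(self, item, spider_name):
        # Record the item together with the name of the spider that produced it.
        self.saved.append((spider_name, dict(item)))
        return item


class ExampleSpider:
    """Stand-in for a scrapy.Spider subclass; Scrapy itself is not imported."""

    name = "example"

    def __init__(self):
        # The instance is stored on the spider, so callbacks can reach it.
        self.postgres = PostgresHelper()

    def parse(self, titles):
        # `titles` replaces a real response object to keep the sketch runnable.
        for title in titles:
            yield self.postgres.process_item({"title": title}, self.name)
```

Because the helper lives on the spider as `self.postgres`, any callback can call it directly, which a pipeline instantiated by Scrapy does not allow by default.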

Why am I tagged for review? Moving away from the default class changes how the pipelines work, so changes have been made to all the spiders. Everyone involved needs to understand these changes in order to use the APIs currently available and the APIs that will be added to this class in the future!

Recommended Action: Read your spider's code and the pipelines code.

atb00ker commented 6 years ago

@omi10859, @anujagrazzel: Have a look at this PR! :)

parthsharma2 commented 6 years ago

I think you need to add a yield before each self.postgres.process_item(item, self.name) in each spider.
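To illustrate the difference in plain Python (a sketch, with a hypothetical module-level `process_item` standing in for `self.postgres.process_item`): without `yield`, the return value of the call is dropped and the callback hands nothing back, while with `yield` each processed item is emitted to the caller.

```python
def process_item(item, spider_name):
    # Hypothetical stand-in: a real helper would also store the item.
    return item


def parse_without_yield(titles):
    for title in titles:
        # The returned item is discarded; the function itself returns None.
        process_item({"title": title}, "example")


def parse_with_yield(titles):
    for title in titles:
        # Each processed item is handed back to whoever iterates the callback.
        yield process_item({"title": title}, "example")
```

Whether the spiders need the items to flow onward like this depends on how the rest of the project consumes them, so treat this as an illustration of the suggestion rather than a requirement.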

atb00ker commented 6 years ago

Update: I am now using a spider variable to solve the instance issue, so no change is needed in the spiders; everything is handled by the pipelines. Action Needed: Please read the pipelines file. It looks good to me and all the spiders run, but a second pair of eyes on major changes is always welcome! :) @parthsharma2, @thisisayush, @vipulgupta2048, @anujagrazzel, @omi10859
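One way the spider-variable approach could look, as a sketch under assumptions rather than the code in this PR. It relies on the `open_spider`, `process_item` and `close_spider` hooks that Scrapy documents for item pipelines; the class name `PostgresPipeline`, the attribute name `postgres`, and the in-memory `saved` list are hypothetical. `StubSpider` and `run_demo` only simulate what the Scrapy engine would do.

```python
class StubSpider:
    """Stand-in for a scrapy.Spider subclass; Scrapy itself is not imported."""

    name = "example"


class PostgresPipeline:
    """Hypothetical pipeline that exposes itself through a spider variable."""

    def open_spider(self, spider):
        # A real pipeline would open its database connection here.
        self.saved = []
        # Store the pipeline instance on the spider; spiders need no changes.
        spider.postgres = self

    def process_item(self, item, spider):
        self.saved.append((spider.name, dict(item)))
        return item

    def close_spider(self, spider):
        # Drop the reference once the spider finishes.
        spider.postgres = None


def run_demo():
    # Drive the hooks by hand, in the order the engine would call them.
    spider = StubSpider()
    pipeline = PostgresPipeline()
    pipeline.open_spider(spider)
    for title in ["a", "b"]:
        pipeline.process_item({"title": title}, spider)
    reachable = spider.postgres is pipeline
    pipeline.close_spider(spider)
    return reachable, pipeline.saved
```

With this arrangement the spiders keep yielding items as usual, and the pipeline alone decides how the instance is shared.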