istresearch / scrapy-cluster

This Scrapy project uses Redis and Kafka to create a distributed on demand scraping cluster.
http://scrapy-cluster.readthedocs.io/
MIT License
1.17k stars · 323 forks

Future of the project #235

Open demisx opened 4 years ago

demisx commented 4 years ago

Hi. I've just come across this project and it is exactly what we need. However, I've noticed there haven't been any updates for a while now. Could you guys please share your vision for this project? Is it still being maintained? Thank you very much.

inthevortex commented 4 years ago

Hi, I am also looking at this kind of implementation. I would like to know whether the dev branch with Python 3 support will get merged into the main branch, and what the future of the project is.

joeharrison714 commented 3 years ago

I too am interested in this

inthevortex commented 3 years ago

@demisx @joeharrison714 How do you suppose we can go about it?

demisx commented 3 years ago

> @demisx @joeharrison714 How do you suppose we can go about it?

Unfortunately, I have neither the skills nor the time to participate in this project. I guess I'm hoping it's brought back to life at some point. I like the idea.

damienkilgannon commented 3 years ago

I am fairly sure this project is still alive. But, like every open-source project, it is reliant on the generosity of contributors to grow and develop.

If you have specific ideas on ways to improve and advance the project, it could be worth starting a discussion on them here; a release plan could then be created, and a shout-out for help on each of the defined items could be made.

psdon commented 3 years ago

This project needs active maintainers.

inthevortex commented 3 years ago

Yes, active maintainers are required. I am up for it myself, but I don't know whether I fully understand all the components yet.

madisonb commented 3 years ago

I hear you that the public-facing project has become dormant; I will work on improving this and bringing the project back up to speed. @damienkilgannon is correct in that most users of this project have moved toward closed-source implementations of spiders, but the open-source framework will always remain here.

If you have immediate needs I am always available via Gitter.

madisonb commented 3 years ago

@demisx @inthevortex to answer one of your initial questions directly - I am not convinced there has been enough work on scrapy cluster 1.3 to warrant a full release and merge dev into master. The changes made to the project are not substantial enough in my mind and are mostly focused around maintenance or forced upgrades (packages, python versions, etc).

The goal of the open source project was to provide a foundational framework for folks to build custom implementations and spiders off of. I think we continue to achieve that, given the 200+ forks of this repo and the small community that continues to use this project. The plugin systems for both the Kafka and Redis monitors work much the same way the Scrapy plugin system does, and those simple building blocks plus RESTful interactions help make this project scale.
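To illustrate the kind of plugin pattern described above, here is a minimal, self-contained sketch of regex-keyed plugin dispatch: each plugin declares a key pattern and a handler, and a monitor loop routes incoming Redis-style keys to the first plugin whose pattern matches. All class and method names here are hypothetical for illustration; they are not scrapy-cluster's actual API.

```python
import re

# Hypothetical sketch: plugins claim keys by regex, a monitor dispatches.
# Names are illustrative only, not scrapy-cluster's real classes.

class StopRequestPlugin:
    regex = r"stop:.*"  # keys this plugin claims

    def handle(self, key, value):
        return f"stopping crawl for {key.split(':', 1)[1]}"

class StatsRequestPlugin:
    regex = r"stats:.*"

    def handle(self, key, value):
        return f"collecting stats for {key.split(':', 1)[1]}"

class Monitor:
    def __init__(self, plugins):
        # precompile each plugin's key pattern once
        self._plugins = [(re.compile(p.regex), p) for p in plugins]

    def dispatch(self, key, value=None):
        # hand the key to the first plugin whose pattern matches it
        for pattern, plugin in self._plugins:
            if pattern.fullmatch(key):
                return plugin.handle(key, value)
        return None  # no plugin claimed the key

monitor = Monitor([StopRequestPlugin(), StatsRequestPlugin()])
print(monitor.dispatch("stop:spider1"))   # stopping crawl for spider1
print(monitor.dispatch("stats:spider2"))  # collecting stats for spider2
```

The appeal of this building-block style is that new behavior is added by registering another plugin, without touching the dispatch loop itself.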

The team behind the scenes of this project continues to use and work on it behind closed doors for custom implementations or other data collection needs. When we decide something is generic or worthwhile enough to open source, it is brought to the forefront here. But, most of the work is truly around implementation, not radically altering the project's core concepts.

Given that the focus is on usage over features, the nature of this project can appear static or dormant. However, the team continues to review and accept PRs from outside contributors that enhance its functionality. We certainly welcome additions from real-world use cases, and we continue to evaluate our own to see if they are generic enough to be applied to the masses.

The dev branch has been cleaned up substantially, with recent (passing!) CI builds, updated README and rtfd docs, and streamlined docker support.

If this comment satisfies your initial question, please close this ticket.