istresearch / scrapy-cluster

This Scrapy project uses Redis and Kafka to create a distributed on demand scraping cluster.
http://scrapy-cluster.readthedocs.io/
MIT License
1.17k stars 323 forks

Upgrade the project to python 3.10 #267

Open borisjota opened 11 months ago

borisjota commented 11 months ago

Upgrade the project to python 3.10

madisonb commented 11 months ago

@borisjota Thank you for this! I'm going to ping the team to review this. A couple of questions from me:

  1. Why define a new decode_dict() when we still have Scrapy's to_dict() method?
  2. I see CircleCI has blocked the test pipeline from running. Can you confirm that all the tests pass in your local environment?
  3. Why only upgrade to Scrapy 2.6.2 and not 2.10?

Thanks again for your contribution 😄

borisjota commented 11 months ago

Hello @madisonb

  1. I defined a new decode_dict() because the headers contain byte data, which is not serializable.

  2. I checked CircleCI, and there is a build error because I am denied access to push to the repository on Docker Hub.

  3. OK, I updated the Scrapy library to 2.10.0. 😄

Regards.
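For readers following the first point, here is a minimal sketch of what such a helper might look like. It is not the code from this pull request; the function body, the `encoding` parameter, and the sample `headers` mapping are all assumptions made for illustration. It assumes the headers are a mapping with bytes keys and lists of bytes values, which `json.dumps` rejects.

```python
import json


def decode_dict(data, encoding="utf-8"):
    """Recursively convert bytes keys and values to str (hypothetical helper)."""
    if isinstance(data, bytes):
        # Undecodable bytes are replaced rather than raising an error.
        return data.decode(encoding, errors="replace")
    if isinstance(data, dict):
        return {
            decode_dict(key, encoding): decode_dict(value, encoding)
            for key, value in data.items()
        }
    if isinstance(data, (list, tuple)):
        return [decode_dict(item, encoding) for item in data]
    # Anything else (str, int, None, ...) is returned unchanged.
    return data


# Assumed sample data: bytes keys mapped to lists of bytes values.
headers = {b"Content-Type": [b"text/html; charset=utf-8"], b"Server": [b"nginx"]}

try:
    json.dumps(headers)
    raw_is_serializable = True
except TypeError:
    # json.dumps does not accept bytes keys or bytes values.
    raw_is_serializable = False

decoded = decode_dict(headers)
serialized = json.dumps(decoded)
```

Decoding before serialization keeps the rest of the pipeline working with plain strings, at the cost of losing the exact original bytes when a header is not valid in the chosen encoding.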

BeamoINT commented 11 months ago

Have you tried out Scrapy with these new changes? I did, and it seems that the imports are not compatible with Python 3.10. I get this error when running this command:

root@crawler:~/scrapy-cluster/crawler# scrapy runspider crawling/spiders/link_spider.py
Usage
scrapy runspider [options]
runspider: error: Unable to load 'crawling/spiders/link_spider.py': attempted relative import with no known parent package

borisjota commented 11 months ago

> Have you tried out Scrapy with these new changes? I did, and it seems that the imports are not compatible with Python 3.10. I get this error when running this command:
>
> root@crawler:~/scrapy-cluster/crawler# scrapy runspider crawling/spiders/link_spider.py
> Usage
> scrapy runspider [options]
> runspider: error: Unable to load 'crawling/spiders/link_spider.py': attempted relative import with no known parent package

Run the spider with this command instead: scrapy crawl link
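The error message in this exchange is Python's standard complaint when a file that uses a relative import is loaded on its own rather than as part of a package. The sketch below reproduces that behavior in isolation, without Scrapy. The package name `demo_spiders`, the `helper` module, and the `NAME` constant are hypothetical and exist only for this demonstration; loading by file path here is an analogy for running a spider file standalone, not a copy of Scrapy's internals.

```python
import importlib
import importlib.util
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Build a tiny throwaway package: demo_spiders/{__init__,helper,link_spider}.py
    pkg = Path(tmp) / "demo_spiders"
    pkg.mkdir()
    (pkg / "__init__.py").write_text("")
    (pkg / "helper.py").write_text("NAME = 'link'\n")
    (pkg / "link_spider.py").write_text("from .helper import NAME\n")

    # 1) Load the file by path as a top-level module: it has no parent package,
    #    so the relative import cannot be resolved.
    spec = importlib.util.spec_from_file_location(
        "link_spider", str(pkg / "link_spider.py")
    )
    module = importlib.util.module_from_spec(spec)
    try:
        spec.loader.exec_module(module)
        standalone_error = None
    except ImportError as exc:
        standalone_error = str(exc)

    # 2) Import the same file through its package: the relative import resolves.
    sys.path.insert(0, tmp)
    try:
        packaged = importlib.import_module("demo_spiders.link_spider")
        packaged_name = packaged.NAME
    finally:
        sys.path.remove(tmp)

print(standalone_error)
print(packaged_name)
```

This suggests why running the spider by name inside the project can succeed where pointing at the file directly fails: in the first case the module is imported with its package context intact.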