rmax / scrapy-redis

Redis-based components for Scrapy.
http://scrapy-redis.readthedocs.io
MIT License

[rollback] Rollback `request_fingerprint` #278

Closed LuckyPigeon closed 1 year ago

LuckyPigeon commented 1 year ago

Description

`request_fingerprint` is about to be deprecated, but the new fingerprint function has a different implementation. So I think we should roll back to `request_fingerprint` for the moment and implement our own `request_fingerprint` later, following the old `request_fingerprint` settings.

Fixes #275

devfox-se commented 1 year ago

Why is the new implementation not sufficient for scrapy-redis?

LuckyPigeon commented 1 year ago

First of all, the new implementation returns bytes instead of a string. Second, I hadn't yet checked the differences between the new and old implementations. To protect the master branch, I had to roll back to `request_fingerprint` for the moment.

LuckyPigeon commented 1 year ago

I've now checked the implementation and opened a new PR; please refer to #280. The new fingerprint implementation from Scrapy contains a lot of checks and processing for headers and keep_fragments, which isn't what we need. We only need to process the request and generate a fingerprint, so I think we should implement our own fingerprint function. A simple one would be just fine.
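To make "a simple one" concrete, here is a minimal sketch of what such a function could look like. It is not the code from #280 and it does not reproduce Scrapy's fingerprint values. The name `simple_fingerprint` and its parameters are hypothetical, and it takes plain values rather than a Scrapy `Request` so that it runs with the standard library alone.

```python
import hashlib
from urllib.parse import urlsplit, urlunsplit


def simple_fingerprint(method: str, url: str, body: bytes = b"") -> str:
    """Hypothetical helper: hash the method, URL (fragment dropped) and body.

    Returns a 40-character hex string, like the old string-based fingerprint,
    but the value itself is NOT compatible with Scrapy's own output.
    """
    parts = urlsplit(url)
    # Drop the fragment so "page#a" and "page#b" share one fingerprint.
    url_no_fragment = urlunsplit(parts._replace(fragment=""))

    digest = hashlib.sha1()
    digest.update(method.upper().encode("utf-8"))
    digest.update(b"\n")  # separator so the fields cannot run together
    digest.update(url_no_fragment.encode("utf-8"))
    digest.update(b"\n")
    digest.update(body)
    return digest.hexdigest()
```

With a real request this would presumably be called as `simple_fingerprint(request.method, request.url, request.body)`. Headers are ignored on purpose, which matches the "we only need to process the request" point above.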

You can still pass the request to the new fingerprint function and call hex() on the result to get a string in your own implementation. In that case, you can pass headers and keep_fragments as parameters as well.
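A rough sketch of that wrapper idea follows. To keep it runnable without Scrapy installed, `fake_fingerprint` is a stand-in I made up that only mimics the "returns bytes" behaviour. In a real project you would presumably pass `scrapy.utils.request.fingerprint` instead, whose keyword arguments are `include_headers` and `keep_fragments`. The wrapper name `as_hex_fingerprint` is also hypothetical.

```python
import hashlib
from typing import Any, Callable


def as_hex_fingerprint(fingerprint_fn: Callable[..., bytes], request: Any, **kwargs: Any) -> str:
    """Call a bytes-returning fingerprint function and return a hex string.

    Extra keyword arguments (e.g. include_headers, keep_fragments) are
    forwarded unchanged to fingerprint_fn.
    """
    return fingerprint_fn(request, **kwargs).hex()


def fake_fingerprint(url: str, *, include_headers=None, keep_fragments: bool = False) -> bytes:
    """Stand-in for a bytes-returning fingerprint function (assumption).

    Takes a URL string instead of a Request and ignores include_headers.
    """
    if not keep_fragments:
        url = url.split("#", 1)[0]
    return hashlib.sha1(url.encode("utf-8")).digest()


raw = fake_fingerprint("http://example.com/a#x")  # bytes, 20 long
text = as_hex_fingerprint(fake_fingerprint, "http://example.com/a#x")  # str, 40 long
```

`bytes.hex()` is a plain lossless encoding, so the string form can be stored in Redis as before while the underlying digest stays whatever the wrapped function produces.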