scrapy / scurl

Performance-focused replacement for Python urllib
Apache License 2.0

Make sure that canonicalize_url is not different from that of w3lib #30

Closed by malloxpb 6 years ago

malloxpb commented 6 years ago

Right now, canonicalize_url lowercases all the letters in the path of the canonicalized URL. We need to preserve the original case of the path when canonicalizing. We can change the GURL source code to do this.
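To illustrate the expected behavior, here is a minimal sketch using only the standard library. It is not the scurl or w3lib implementation; `lowercase_host_only` is a hypothetical helper that shows only the case handling discussed here: the scheme and host are lowercased, while the path and query keep their original case.

```python
from urllib.parse import urlsplit, urlunsplit


def lowercase_host_only(url):
    """Hypothetical helper: lowercase scheme and netloc, leave the path untouched."""
    parts = urlsplit(url)
    # Note: lowercasing the whole netloc also lowercases any userinfo part;
    # this sketch ignores that detail.
    return urlunsplit((
        parts.scheme.lower(),
        parts.netloc.lower(),
        parts.path,      # path case is preserved
        parts.query,     # query case is preserved
        parts.fragment,
    ))


print(lowercase_host_only("HTTP://Example.COM/Some/Path?Q=1"))
```

A canonicalizer that returned `/some/path` for this input would show the problem described above.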

lopuhin commented 6 years ago

The goal here is to make it 100% compatible with canonicalize_url from w3lib: this is required for scrapy integration, or else this will be a backwards-incompatible change. Besides making it compatible now, it's important that we find out if compatibility breaks later. We discussed this with @kmike and it seems that this can be achieved in the following way:

malloxpb commented 6 years ago

This has been resolved in scrapy/w3lib#110