scrapy / scurl

Performance-focused replacement for Python urllib
Apache License 2.0

GURL fails to handle some urls #50

Closed malloxpb closed 6 years ago

malloxpb commented 6 years ago

The GURL container marks URLs such as http:/// as invalid. However, since we only use the parsing functions from the Chromium source (https://github.com/scrapy/scurl/blob/master/scurl/cgurl.pyx#L82), we do not mark those URLs as invalid. This could cause problems if we don't fix it :)

malloxpb commented 6 years ago

The solution is to make sure that every call to the parsing functions in GURL (https://github.com/scrapy/scurl/blob/master/scurl/cgurl.pyx#L82) succeeds (based on how they do it in gurl.cc, which can be seen here
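
To illustrate the idea of treating a URL as invalid when parsing does not fully succeed, here is a minimal sketch. It uses the standard library's urllib.parse rather than scurl or the Chromium code, and the names is_valid_url and SCHEMES_REQUIRING_HOST are hypothetical, chosen only for this example. The scheme list is an assumed subset and not Chromium's actual list.

```python
from urllib.parse import urlsplit

# Assumption: an illustrative subset of schemes that need a host.
# This is not the list Chromium uses.
SCHEMES_REQUIRING_HOST = {"http", "https", "ftp", "ws", "wss"}


def is_valid_url(url: str) -> bool:
    """Return False when splitting fails or a required component is missing."""
    try:
        parts = urlsplit(url)
    except ValueError:
        # urlsplit raises ValueError for some malformed inputs,
        # for example an unterminated IPv6 literal.
        return False
    if not parts.scheme:
        return False
    # A URL such as "http:///" splits without error but has no host.
    if parts.scheme in SCHEMES_REQUIRING_HOST and not parts.hostname:
        return False
    return True
```

With this check, is_valid_url("http:///") returns False while is_valid_url("http://example.com/") returns True. The point is only that a successful split is not enough on its own; each component the scheme requires has to be checked as well.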