scrapy / w3lib

Python library of web-related functions
BSD 3-Clause "New" or "Revised" License

Make safe_url_string safer #201

Closed: Gallaecio closed this pull request 1 year ago

Gallaecio commented 2 years ago

Tasks:
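
For context, `safe_url_string` (the function this pull request aims to harden) percent-escapes characters that are not safe to leave verbatim in a URL. A rough stdlib sketch of that general idea, as an illustration only and not w3lib's actual implementation (the `escape_unsafe` name and the exact safe-character set are assumptions):

```python
from urllib.parse import quote

def escape_unsafe(url: str) -> str:
    # Percent-escape characters that are unsafe in a URL, while
    # leaving URL syntax characters (:/?#[]@ etc.) and existing
    # percent-escapes untouched. Illustration only; w3lib's real
    # safe_url_string also handles encodings and more edge cases.
    return quote(url, safe=":/?#[]@!$&'()*+,;=%")

print(escape_unsafe("https://example.com/a b?q=café"))
# Spaces become %20 and non-ASCII bytes are UTF-8 percent-encoded.
```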

codecov[bot] commented 2 years ago

Codecov Report

Merging #201 (88f32ae) into master (fb70566) will decrease coverage by 0.99%. The diff coverage is 94.48%.

@@            Coverage Diff             @@
##           master     #201      +/-   ##
==========================================
- Coverage   95.98%   94.98%   -1.00%     
==========================================
  Files           6       14       +8     
  Lines         473     1377     +904     
  Branches       90      333     +243     
==========================================
+ Hits          454     1308     +854     
- Misses          9       40      +31     
- Partials       10       29      +19     
| Impacted Files | Coverage Δ |
| --- | --- |
| w3lib/_utr46.py | 75.73% <75.73%> (ø) |
| w3lib/_encoding.py | 88.23% <88.23%> (ø) |
| w3lib/_url.py | 98.00% <98.00%> (ø) |
| w3lib/_infra.py | 100.00% <100.00%> (ø) |
| w3lib/_rfc2396.py | 100.00% <100.00%> (ø) |
| w3lib/_rfc3986.py | 100.00% <100.00%> (ø) |
| w3lib/_rfc5892.py | 100.00% <100.00%> (ø) |
| w3lib/_util.py | 100.00% <100.00%> (ø) |
| w3lib/url.py | 98.69% <100.00%> (+0.06%) ⬆️ |
Gallaecio commented 2 years ago

The new implementation is currently up to 40 times slower than the previous one for test cases where both produce the same non-error outcome.

@kmike Shall I look into using Cython here?
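
A slowdown like this can be quantified with `timeit`. The sketch below shows the measurement approach only; `old_impl` and `new_impl` are hypothetical stand-ins (a single `urllib.parse.quote` call versus a slower character-by-character pass, which is the kind of per-code-point processing a spec-faithful URL parser tends to do), not the two actual `safe_url_string` implementations:

```python
import timeit
from urllib.parse import quote

SAFE = ":/?#[]@!$&'()*+,;=%"

def old_impl(url: str) -> str:
    # Fast path: one bulk percent-escaping call.
    return quote(url, safe=SAFE)

def new_impl(url: str) -> str:
    # Deliberately slower: escape one character at a time, to mimic
    # the per-code-point work of a spec-compliant reimplementation.
    return "".join(quote(ch, safe=SAFE) for ch in url)

URL = "https://example.com/path with spaces?q=café#frag"

# Both variants must agree on the non-error outcome before timing.
assert old_impl(URL) == new_impl(URL)

for impl in (old_impl, new_impl):
    elapsed = timeit.timeit(lambda: impl(URL), number=10_000)
    print(f"{impl.__name__}: {elapsed:.3f}s")
```

Running both variants on the same inputs and comparing the totals gives the kind of "up to N times slower" figure quoted above.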

Gallaecio commented 1 year ago

I have opened https://github.com/scrapy/w3lib/pull/203 as a simpler approach that is good enough to address #80 and similar issues without any major rework of safe_url_string.

I am closing this pull request, as I am stopping work on it. But I have created https://github.com/scrapy/w3lib/issues/204 so that someone, not necessarily me, can take over and finish the work.