Closed malloxpb closed 6 years ago
Merging #42 into master will decrease coverage by 0.21%. The diff coverage is 61.86%.
@@ Coverage Diff @@
## master #42 +/- ##
==========================================
- Coverage 61.58% 61.37% -0.22%
==========================================
Files 2 2
Lines 315 321 +6
==========================================
+ Hits 194 197 +3
- Misses 121 124 +3
Impacted Files | Coverage Δ | |
---|---|---|
scurl/canonicalize.pyx | 19.46% <12.5%> (-1.96%) | :arrow_down: |
scurl/cgurl.pyx | 84.13% <74.46%> (+4.41%) | :arrow_up: |
Hey @lopuhin, in this PR I have increased the performance of `urljoin` and `canonicalize_url` by doing the following:

- [...] `urljoin`
- Removing the `to_native_str` call in `canonicalize_url`, since we know the types of the variables
- Using `canonicalize_component` from Chromium instead of calling `quote` from `urllib.parse`

These changes have significantly improved the performance of `canonicalize_url` as well as `urljoin`. In particular, the rate of link extraction with `canonicalize_url` increased from 31k links/sec to 44k links/sec. But let me know if there's anything that concerns you :)
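For anyone who wants to sanity-check a links/sec figure like the one above, here is a minimal sketch of how such a rate could be measured. It is not the benchmark used in this PR: it only exercises the standard library's `urljoin` and `quote` as a stand-in for the pure-Python path, and `links_per_second`, `join_and_quote`, `BASE`, and `LINKS` are hypothetical names and sample data made up for illustration.

```python
import time
from urllib.parse import quote, urljoin


def links_per_second(func, base, links, repeats=5):
    """Return the best observed throughput of func(base, link) in links/sec."""
    best = 0.0
    for _ in range(repeats):
        start = time.perf_counter()
        for link in links:
            func(base, link)
        elapsed = time.perf_counter() - start
        if elapsed > 0:
            best = max(best, len(links) / elapsed)
    return best


def join_and_quote(base, link):
    # Stand-in for a pure-Python path: resolve the link, then percent-encode
    # it with urllib.parse.quote while leaving URL delimiters and existing
    # escapes untouched.
    return quote(urljoin(base, link), safe=":/?#[]@!$&'()*+,;=%")


# Hypothetical sample data; a real run would use links extracted from pages.
BASE = "http://example.com/a/b/index.html"
LINKS = ["../c/d.html?q=%d&x=é" % i for i in range(10000)]

if __name__ == "__main__":
    rate = links_per_second(join_and_quote, BASE, LINKS)
    print("%.0f links/sec" % rate)
```

Swapping `join_and_quote` for the function under test should give comparable before/after numbers on the same machine. For reference, going from 31k to 44k links/sec is roughly a 1.4x speedup.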
For some reason the cgurl.cpp is conflicted with the one from master branch...
Hey @lopuhin, I have cleaned up the code even further by moving all the canonicalize code to canonicalize.pyx :) I think this PR is ready!
Hey @lopuhin , yeah sorry about that... I did not notice yesterday 😄 I just fixed it and the build is green now
This PR aims to improve the performance of SCURL