seomoz / url-cpp

C++ bindings for url parsing and sanitization
MIT License

Don't aggressively replace ?'s in the query #6

Closed dlecocq closed 8 years ago

dlecocq commented 8 years ago

@b4hand @vadim-moz @tanglyh

It first came up in #3, but it turns out the current implementation of url-py does not remove ? characters from the query as aggressively as url-cpp does. In a test of URLs taken from the wild, this accounted for many of the differences between url-py and url-cpp.
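The thread does not spell out the exact rule, so the following is only an illustrative sketch. It assumes "aggressive" means dropping every ? found in the query, and that the milder behavior only trims ? and & from the ends. Both helper names are hypothetical and are not part of the url-cpp or url-py APIs. RFC 3986 allows a literal ? inside the query component, which is why removing interior ones can change a URL's meaning.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not url-cpp's API): the assumed "aggressive" behavior,
// which removes every '?' from the query, including ones that carry data.
std::string strip_all_question_marks(const std::string& query) {
    std::string out;
    out.reserve(query.size());
    for (char c : query) {
        if (c != '?') {
            out.push_back(c);
        }
    }
    return out;
}

// Hypothetical helper (not url-cpp's API): the assumed milder behavior,
// which trims only leading and trailing '?' and '&' and keeps interior ones.
std::string trim_query_edges(const std::string& query) {
    const std::string separators = "?&";
    const std::size_t first = query.find_first_not_of(separators);
    if (first == std::string::npos) {
        return "";  // the query held nothing but separators
    }
    const std::size_t last = query.find_last_not_of(separators);
    return query.substr(first, last - first + 1);
}
```

Under these assumptions, a query such as `?a=1?b=2&` keeps its interior ? with the trimming helper but loses it with the aggressive one, which is the kind of difference that would show up when comparing the two libraries on real URLs.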

b4hand commented 8 years ago

LGTM. If the long-term plan is to support all the canonicalization that Mozscape does, there are probably going to have to be some changes to url-py as well.

dlecocq commented 8 years ago

Yeah, I think we should view URLs through the same lens. I figure it's probably best to match url-py's current behavior as closely as possible, and only then make changes to match Mozscape. But agreed -- in the end, I want all interpretations to match.