seomoz / url-cpp

C++ bindings for url parsing and sanitization
MIT License
19 stars 11 forks source link

URL Parsing #2

Closed dlecocq closed 8 years ago

dlecocq commented 8 years ago

This is essentially a translation of urlparse.py

@b4hand @martin-seomoz @seomoz/bigdata

dlecocq commented 8 years ago

In case it's not painfully obvious, I'm C++ rusty. Please feel free to tear this apart. With that in mind, while building up functionality, the focus will be on correctness and cleanliness. Only once we've gotten it "correct" will we focus on improving performance (though it's already like 15-20x faster than url-py).

tammybailey commented 8 years ago

The linkscape code standards are here: https://github.com/seomoz/linkscape/blob/master/CodingStandardCPP.md. If you want to make the PR easier on yourself ;)

dlecocq commented 8 years ago

PTAL -- I think I addressed most of the comments, but I imagine this first one will take a few rounds.

dlecocq commented 8 years ago

@b4hand -- I updated this to not do any unnecessary copies / reassignments in the constructor and as you might expect, performance got better. That said, in general, the porting (and tests) have been the main focus in the work done so far (including PRs not yet submitted).

In a benchmark on my vagrant machine, parsing the string "http://moz.com/some/kind/../of/weird/path;with=parameters????????and=a&&&query=value&&#fragment bar" (and provided as a command-line argument to benchmark) 1M times:

dlecocq commented 8 years ago

@tammybailey -- is this more in line with our C++ conventions?

b4hand commented 8 years ago

LGTM