seomoz / url-cpp

C++ bindings for url parsing and sanitization
MIT License

Punycoding #13

Closed dlecocq closed 8 years ago

dlecocq commented 8 years ago

This adds support for Punycode encoding itself, which is separate from applying punycoding to internationalized domain names. I have done essentially nothing to look at how fast this is, and it is slow. However, 1) it's a port of the algorithm given in the Punycode RFC (RFC 3492) and it works, and 2) we optimize the common case, and URLs requiring punycoding are pretty rare (in a sample of 7.75M URLs, fewer than 1,000 did).
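For reviewers, here is a minimal sketch of what a common-case fast path could look like. It assumes the optimization amounts to skipping Punycode entirely when a hostname is pure ASCII; the comment above does not say how it is actually done, and `needsPunycoding` is a hypothetical name, not part of url-cpp.

```cpp
#include <string>

// Hypothetical helper (not url-cpp's API): returns true only if the host
// contains a non-ASCII byte, i.e. only if Punycode could be needed at all.
// Assumption: the host is a UTF-8 encoded byte string.
inline bool needsPunycoding(const std::string& host) {
    for (unsigned char c : host) {
        if (c >= 0x80) {
            return true;  // first byte of a multi-byte UTF-8 sequence
        }
    }
    return false;  // pure ASCII: the slow path can be skipped
}
```

With a check like this, the cost of the encoder only matters for the small fraction of URLs that actually contain non-ASCII hostnames.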

This also doesn't have complete code coverage yet, and I need to do some smoke testing to find cases that exercise the uncovered bits. Also, the 'overflow' detection needs a critical eye: I'm not exactly sure what the RFC meant there, and I was just hacking on it at the time. Point is, please scrutinize this one heavily.
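To make the overflow question concrete, here is a sketch of the encoder following the pseudocode and sample code in RFC 3492, with the two overflow checks marked. It is not the code in this PR: the `punycode_sketch` namespace, the function names, and the choice of `std::overflow_error` are all assumptions for illustration. It takes already-decoded code points rather than UTF-8.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <stdexcept>
#include <string>
#include <vector>

namespace punycode_sketch {  // hypothetical namespace, not url-cpp's

// Bootstring parameters for Punycode (RFC 3492, section 5).
constexpr uint32_t BASE = 36, TMIN = 1, TMAX = 26, SKEW = 38, DAMP = 700;
constexpr uint32_t INITIAL_BIAS = 72, INITIAL_N = 128;
constexpr uint32_t MAX_INT = std::numeric_limits<uint32_t>::max();

// Map a digit value 0..35 to 'a'..'z' then '0'..'9'.
inline char digit(uint32_t d) {
    return static_cast<char>(d < 26 ? 'a' + d : '0' + (d - 26));
}

// Bias adaptation (RFC 3492, section 6.1).
inline uint32_t adapt(uint32_t delta, uint32_t numpoints, bool first) {
    delta = first ? delta / DAMP : delta / 2;
    delta += delta / numpoints;
    uint32_t k = 0;
    while (delta > ((BASE - TMIN) * TMAX) / 2) {
        delta /= BASE - TMIN;
        k += BASE;
    }
    return k + (BASE - TMIN + 1) * delta / (delta + SKEW);
}

// Encode a sequence of Unicode code points; throws on overflow.
inline std::string encode(const std::vector<uint32_t>& input) {
    std::string output;
    for (uint32_t c : input) {
        if (c < 0x80) output.push_back(static_cast<char>(c));  // basic code points
    }
    const uint32_t b = static_cast<uint32_t>(output.size());
    uint32_t h = b;  // number of code points handled so far
    if (b > 0) output.push_back('-');

    uint32_t n = INITIAL_N, delta = 0, bias = INITIAL_BIAS;
    while (h < input.size()) {
        // Find the smallest unhandled code point (>= n).
        uint32_t m = MAX_INT;
        for (uint32_t c : input) {
            if (c >= n && c < m) m = c;
        }
        // Overflow check 1: delta + (m - n) * (h + 1) must fit in uint32_t.
        if (m - n > (MAX_INT - delta) / (h + 1)) {
            throw std::overflow_error("punycode: delta overflow");
        }
        delta += (m - n) * (h + 1);
        n = m;

        for (uint32_t c : input) {
            // Overflow check 2: the increment wrapping around to zero.
            if (c < n && ++delta == 0) {
                throw std::overflow_error("punycode: delta overflow");
            }
            if (c == n) {
                // Emit delta as a generalized variable-length integer.
                uint32_t q = delta;
                for (uint32_t k = BASE;; k += BASE) {
                    uint32_t t = k <= bias ? TMIN
                               : (k >= bias + TMAX ? TMAX : k - bias);
                    if (q < t) break;
                    output.push_back(digit(t + (q - t) % (BASE - t)));
                    q = (q - t) / (BASE - t);
                }
                output.push_back(digit(q));
                bias = adapt(delta, h + 1, h == b);
                delta = 0;
                ++h;
            }
        }
        ++delta;
        ++n;
    }
    return output;
}

}  // namespace punycode_sketch
```

The first check is written as a division so that the guard itself cannot overflow; multiplying first and comparing afterwards would defeat the purpose.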

@b4hand @tammybailey @tanglyh @martin-seomoz

dlecocq commented 8 years ago

Turns out that, had I just kept reading the RFC a little further, I would have found that it includes a sample C implementation. I'm not going to take it verbatim (it doesn't throw, obviously, and needs some interface tweaking), but there are some points that I want to go back and check against it. Hope to push updates shortly.

dlecocq commented 8 years ago

Finally figured out the coverage issue. Will merge when it passes.