technoweenie / guillotine

URL shortening hobby kit
http://techno-weenie.net/guillotine/
MIT License
486 stars, 54 forks

Doesn't work on ruby 1.8.x #3

Closed: technoweenie closed this issue 12 years ago

technoweenie commented 12 years ago

It has to do with the URL shortening code, which packs the full MD5 value as a 32-bit integer:

>> Digest::MD5.hexdigest(url).to_i(16)
=> 191415658344158766168031473277922803570
>> [Digest::MD5.hexdigest(url).to_i(16)].pack("N")
RangeError: bignum too big to convert into `unsigned long'
    from (irb):6:in `pack'

The most common URL shortening algorithms involve mapping a unique incrementing ID to some custom encoding:

How to convert the id to a shortened URL:

  • Think of an alphabet you want to use. In your case that's [a-zA-Z0-9]. It contains 62 letters.
  • Take the auto-generated unique numerical key (auto-incremented id): for example 125 (a decimal number)
  • Now you have to convert the 125 (base 10) to X (base 62). This will then be {2}{1} (2×62+1=125).
  • Now map the symbols {2} and {1} to your alphabet. Say {0} = 'a', {25} = 'z' and so on. We will have {2} = 'c' and {1} = 'b'. So '/cb' will be your shortened URL.

http://stackoverflow.com/questions/742013/how-to-code-a-url-shortener
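The steps quoted above can be sketched in Ruby roughly as follows. This is only an illustration of the Stack Overflow recipe, not guillotine's actual shortener: the names `ALPHABET`, `encode_id`, and `decode_code` are made up for the example, and the alphabet order (a-z, then A-Z, then 0-9) is an assumption chosen so that 0 maps to 'a' and 25 to 'z' as in the quote.

```ruby
# Hypothetical helper names; not part of guillotine.
# Alphabet order is an assumption: index 0 => 'a', 25 => 'z', then A-Z, then 0-9.
ALPHABET = [*'a'..'z', *'A'..'Z', *'0'..'9'].freeze

# Convert a non-negative integer ID into a string over the given alphabet.
def encode_id(id, alphabet = ALPHABET)
  unless id.is_a?(Integer) && id >= 0
    raise ArgumentError, "id must be a non-negative integer"
  end

  base = alphabet.length
  return alphabet[0] if id.zero?

  code = ""
  while id > 0
    # divmod returns [quotient, remainder]; the remainder is the next digit.
    id, digit = id.divmod(base)
    code = alphabet[digit] + code
  end
  code
end

# Reverse the mapping: turn a code back into the integer ID.
def decode_code(code, alphabet = ALPHABET)
  base = alphabet.length
  code.each_char.reduce(0) do |acc, ch|
    index = alphabet.index(ch)
    raise ArgumentError, "unknown character #{ch.inspect}" if index.nil?
    acc * base + index
  end
end

puts encode_id(125)     # 125 = 2*62 + 1, so digits {2}{1} => "cb"
puts decode_code("cb")  # => 125
```

Because each ID maps to exactly one code, this scheme cannot collide, but it does depend on the datastore handing out unique incrementing IDs.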

Riak has no way to get an auto-incrementing ID, though. I have some thoughts on doing host-based counters, but it's way out of scope. Maybe if someone gets really bored...

pfhayes commented 12 years ago

In Ruby 1.8.x, pack fails if the value is too large, but you can get the same behaviour by only considering the low-order 32 bits, by changing

[Digest::MD5.hexdigest(url).to_i(16)].pack("N")

to

[Digest::MD5.hexdigest(url).to_i(16) % 2**32].pack("N")

This does not get you the unique incrementing ID behaviour, but you do at least get behaviour that agrees with the current behaviour in 1.9.
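As a small sketch of that workaround (the method name `low_32_bits_packed` and the example URL are made up for illustration), reducing the 128-bit MD5 integer modulo 2**32 before packing always leaves a value that fits in "N", a 32-bit unsigned big-endian integer:

```ruby
require "digest/md5"

# Hypothetical helper, not guillotine's API: pack only the low-order
# 32 bits of the MD5 digest so the value always fits the "N" directive.
def low_32_bits_packed(url)
  digest_int = Digest::MD5.hexdigest(url).to_i(16) # 128-bit integer
  [digest_int % 2**32].pack("N")                   # ** binds tighter than %
end

packed = low_32_bits_packed("http://example.com/")
puts packed.bytesize # 4 bytes, regardless of the input URL

# The low 32 bits are the same as the last 8 hex digits of the digest.
low_bits = packed.unpack("N").first
puts low_bits == Digest::MD5.hexdigest("http://example.com/")[-8..-1].to_i(16)
```

Since only 32 of the 128 digest bits survive, distinct URLs can still collide; the masking fixes the RangeError, not uniqueness.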

technoweenie commented 12 years ago

Use the length/charset options. I have some ideas on refactoring the shortener code so you could plug your own scheme in if needed.

I'd really like DBs with auto-increment counters to use that somehow. That might have to be two queries, though (one to insert and get the ID, and another to set the code).