rsl / stringex

Some [hopefully] useful extensions to Ruby’s String class. It is made up of three libraries: ActsAsUrl [permalink solution with better character translation], Unidecoder [Unicode to Ascii transliteration], and StringExtensions [miscellaneous helper methods for the String class].
MIT License
984 stars 158 forks source link

Multiple dollar signs or non-price dollar signs pass through in to_url #197

Open abachman opened 7 years ago

abachman commented 7 years ago

We're using stringex 2.7.1 in a Rails project and noticed that dollar signs come through with the acts_as_url plugin and .to_url string method if they're multiples or not preceding numbers.

Two examples, one with $ signs and one with a list of reserved URI characters.

First, variations on the $ sign:

"1$ item".to_url    # => "1$-item"
"$1 item".to_url    # => "1-dollars-item"
"$1$ item".to_url   # => "$1$-item"
"$$1 item".to_url   # => "$$1-item"
"$1 $item".to_url   # => "1-dollars-$item"
"$ 1 item".to_url   # => "dollars-1-item"
"1 $ item".to_url   # => "1-dollars-item"
"1 $$ item".to_url  # => "1-$$-item"
"1 $$$ item".to_url # => "1-$$$-item"

Second, with a list of reserved characters taken from the URI RFC:

irb> %w(: / ?  #  [  ]  @ !  $  &  '  (  )  *  +  ,  ;  =).each {|c| puts "a #{c}#{c} b".to_url }
a-b
a-slash-slash-b
a-b
a-number-number-b
a-b
a-b
a-at-at-b
a-b
a-$$-b
a-and-and-b
a-b
a-b
a-b
a-star-star-b
a-plus-plus-b
a-b
a-b
a-equals-equals-b

Not sure if it's intentional but it breaks some of our routes that have been constraining the parameter provided by acts_as_url with /[0-9A-Za-z\-\.]+/, which worked fine up until this point.

If it's not a bug, can you recommend a better constraint or the set of characters that are allowed through .to_url?

Thanks!

FGoessler commented 1 year ago

We also just ran into this case when generating slugs using stringex. Our case was $100M Offers NOT being transformed into the expected 100-dollars-m-offers or 100m-dollars-offers.

Our workaround, for now, is to call ActiveSupport's .parameterize after calling .to_url to at least ensure any non-url safe symbols like $ are stripped from generated slug.

Would be lovely if a proper fix for this could be done here. But looking at the code also seemed non-trivial assuming you want a bit more logic than just stripping it fully.