sferik / twitter-ruby

A Ruby interface to the Twitter API.
http://www.rubydoc.info/gems/twitter
MIT License

Use heuristic_parse for untrusted URLs. #976

Closed benubois closed 1 year ago

benubois commented 4 years ago

Hello,

Thanks for this gem!

Addressable::URI.parse raises an exception for any URL it considers invalid. The problem is that Addressable and Twitter do not agree on what qualifies as a valid URL, so a tweet can contain URL entities that Addressable rejects. Addressable::URI.heuristic_parse is Addressable's more lenient parser.

This change parses any tainted URLs with heuristic_parse, which lowers the chance of hitting an Addressable::URI::InvalidURIError exception in the wild.

For example, the tweet below contains the URL "http://suspicio\\.us/URL", which Twitter recognizes as a URL, so it shows up as an entity.

Calling tweet.urls.first.expanded_url on that tweet raises an exception:

require "addressable"
Addressable::URI.parse("http://suspicio\\.us/URL")
Traceback (most recent call last):
        1: from (irb):7
Addressable::URI::InvalidURIError (Invalid character in host: 'suspicio\.us')

vs

require "addressable"
Addressable::URI.heuristic_parse("http://suspicio\\.us/URL")
=> #<Addressable::URI:0x3fc6cfc76c78 URI:http://suspicio/.us/URL>

Incidentally, it looks like Addressable was first used to help with this same type of issue: #487.

This should also fix #742 and #891.

{:created_at=>"Fri Aug 07 16:06:51 +0000 2020",
 :id=>1291767772754726914,
 :id_str=>"1291767772754726914",
 :full_text=>
  "curl -s -D- https://t.co/j30q2zQoYS |grep -iE \"^Location: |URL=|window.location|document.location\" # Try to check where a URL may redirect you.",
 :truncated=>false,
 :display_text_range=>[0, 143],
 :entities=>
  {:hashtags=>[],
   :symbols=>[],
   :user_mentions=>[],
   :urls=>
    [{:url=>"https://t.co/j30q2zQoYS",
      :expanded_url=>"http://suspicio\\.us/URL",
      :display_url=>"suspicio\\.us/URL",
      :indices=>[12, 35]}]},
 :source=>
  "<a href=\"http://suso.suso.org/xulu/Command_Line_Magic\" rel=\"nofollow\">CLI Magic poster</a>",
 :in_reply_to_status_id=>nil,
 :in_reply_to_status_id_str=>nil,
 :in_reply_to_user_id=>nil,
 :in_reply_to_user_id_str=>nil,
 :in_reply_to_screen_name=>nil,
 :user=>
  {:id=>91333167,
   :id_str=>"91333167",
   :name=>"Command Line Magic",
   :screen_name=>"climagic",
   :location=>"BASHLAND",
   :description=>
    "Cool Unix/Linux Command Line tricks you can use in $TWITTER_CHAR_LIMIT characters or less. Here mostly to inspire all to try more. Read docs first, run later.\\~",
   :url=>"https://t.co/eKoQFEZTLs",
   :entities=>
    {:url=>
      {:urls=>
        [{:url=>"https://t.co/eKoQFEZTLs",
          :expanded_url=>"http://www.climagic.org/",
          :display_url=>"climagic.org",
          :indices=>[0, 23]}]},
     :description=>{:urls=>[]}},
   :protected=>false,
   :followers_count=>198274,
   :friends_count=>12330,
   :listed_count=>3962,
   :created_at=>"Fri Nov 20 12:49:35 +0000 2009",
   :favourites_count=>1748,
   :utc_offset=>nil,
   :time_zone=>nil,
   :geo_enabled=>true,
   :verified=>false,
   :statuses_count=>13134,
   :lang=>nil,
   :contributors_enabled=>false,
   :is_translator=>false,
   :is_translation_enabled=>false,
   :profile_background_color=>"C0DEED",
   :profile_background_image_url=>
    "http://abs.twimg.com/images/themes/theme1/bg.png",
   :profile_background_image_url_https=>
    "https://abs.twimg.com/images/themes/theme1/bg.png",
   :profile_background_tile=>true,
   :profile_image_url=>
    "http://pbs.twimg.com/profile_images/535876218/climagic-icon_normal.png",
   :profile_image_url_https=>
    "https://pbs.twimg.com/profile_images/535876218/climagic-icon_normal.png",
   :profile_link_color=>"0084B4",
   :profile_sidebar_border_color=>"C0DEED",
   :profile_sidebar_fill_color=>"DDEEF6",
   :profile_text_color=>"333333",
   :profile_use_background_image=>true,
   :has_extended_profile=>false,
   :default_profile=>false,
   :default_profile_image=>false,
   :following=>false,
   :follow_request_sent=>false,
   :notifications=>false,
   :translator_type=>"none"},
 :geo=>nil,
 :coordinates=>nil,
 :place=>nil,
 :contributors=>nil,
 :is_quote_status=>false,
 :retweet_count=>35,
 :favorite_count=>128,
 :favorited=>false,
 :retweeted=>false,
 :possibly_sensitive=>false,
 :lang=>"en",
 :text=>
  "curl -s -D- https://t.co/j30q2zQoYS |grep -iE \"^Location: |URL=|window.location|document.location\" # Try to check where a URL may redirect you."}

dentarg commented 1 year ago

sferik force-pushed the master branch 4 times, most recently from c5e4814 to ccba161 yesterday

@sferik Was this reverted from master due to the force pushes? https://github.com/sferik/twitter/commit/5e090937a7953c7bd74dcf71f8377eb52daf80cf says "This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository."