huyha85 / opengraph_parser

Simple Ruby Parser library for parsing OpenGraph protocol (http://ogp.me)
MIT License

Handle URLs with anchors #8

Open jhass opened 10 years ago

jhass commented 10 years ago

Anchors (fragment identifiers) are a valid part of a URI, but they shouldn't be included in the request.

url = "http://www.cnet.com/news/pi-top-the-3d-printable-raspberry-pi-laptop-anyone-can-build/#ftag=CAD590a51e"

require 'uri'
require 'open-uri'
require 'nokogiri'

# URI#open (from open-uri) replaces Kernel#open, which no longer accepts URIs as of Ruby 3.0
p Nokogiri::HTML(URI.parse(url).open(&:read)).css('title').text #=> "Pi-Top: The 3D-printable Raspberry Pi laptop anyone can build - CNET"

require 'opengraph_parser'
p OpenGraph.new(url).title #=> "Page Not Found (404) - CNET"

jhass commented 9 years ago

To pinpoint the issue: the URI.escape call at https://github.com/huyha85/opengraph_parser/blob/master/lib/redirect_follower.rb#L21 is causing it. No commit introduces it with an explanation of why it is necessary; it was already there in https://github.com/huyha85/opengraph_parser/commit/f602681e8184fbbcb05db7f1f324efb5dc6bc1a8.
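
A minimal sketch of why the escaping breaks anchored URLs. It assumes the legacy RFC 2396 escaping rules that URI.escape used; since URI.escape was removed in Ruby 3.0, the sketch calls URI::RFC2396_Parser#escape instead. The helper name `legacy_escape` and the example.com URL are made up for illustration and are not part of the gem.

```ruby
require 'uri'

# Hypothetical helper standing in for the removed URI.escape.
# URI::RFC2396_Parser#escape applies the legacy escaping rules,
# under which '#' counts as an unsafe character.
def legacy_escape(url)
  URI::RFC2396_Parser.new.escape(url)
end

# Hypothetical URL, for illustration only.
escaped = legacy_escape("http://example.com/page/#section")
puts escaped

# After escaping, the URL has no fragment any more: the former anchor
# has become part of the path and is sent to the server.
parsed = URI.parse(escaped)
puts parsed.path
puts parsed.fragment.inspect
```

With the `#` percent-encoded as `%23`, the server is asked for a path that most likely doesn't exist, which would explain the 404 title above.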

@huyha85 would you mind explaining why it's there?

julien51 commented 8 years ago

We have the same issue... Seems like an easy fix?
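
One possible shape for such a fix, as a rough sketch only: drop the fragment before the URL is handed to the HTTP client. The helper name `strip_fragment` is hypothetical and is not part of opengraph_parser; whether the URI.escape call can simply be removed as well is left open here.

```ruby
require 'uri'

# Hypothetical helper: returns the URL without its fragment (anchor),
# leaving scheme, host, path and query untouched.
def strip_fragment(url)
  uri = URI.parse(url)
  uri.fragment = nil
  uri.to_s
end

url = "http://www.cnet.com/news/pi-top-the-3d-printable-raspberry-pi-laptop-anyone-can-build/#ftag=CAD590a51e"
puts strip_fragment(url)
```

URLs without an anchor pass through unchanged, so a helper like this could be applied to every input.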

jhass commented 8 years ago

I ended up writing my own gem due to issues with this one.